Introduction
A tree is a hierarchical data structure: nodes connected by edges. The root is the topmost node; leaves are nodes with no children. Trees model hierarchies: file systems, organizational charts, parse trees, inheritance hierarchies. Fundamental to computer science: they enable efficient searching, sorting, and organization of information.
Key property: acyclic (no cycles; the path from the root to any node is unique). Contrasts with general graphs, which allow cycles. Equivalently, a tree is a connected acyclic graph; with n nodes it has exactly n-1 edges.
Binary trees: each node has at most 2 children. Restricting the structure enables efficient algorithms. Binary search trees: left children < parent < right children. Enable O(log n) search if balanced.
"Trees are nature's solution to organization. Hierarchical structure enables efficient navigation and understanding of complex systems." -- Data structure design principles
Tree Terminology and Properties
Key Terms
- Root: Topmost node (no parent)
- Parent: Node with child
- Child: Node pointed to by parent
- Leaf: Node with no children
- Depth: Distance from root (root depth = 0)
- Height: Distance to farthest leaf
- Subtree: Node and all descendants
Properties
An n-node tree has exactly n-1 edges (from connectivity). The path from the root to any node is unique. Height bounds recursion depth: traversal uses O(height) stack space in the worst case.
Perfect, Complete, Full Binary Trees
Perfect: All leaves at same depth. Full binary tree with all levels filled. Height h: exactly 2^(h+1) - 1 nodes.
Complete: All levels filled except possibly last (packed left). Height approximately log n.
Full: Every node has 0 or 2 children (no nodes with 1 child).
Binary Trees
Node Structure
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
Properties
Maximum height of an n-node tree: n - 1 (a degenerate chain, effectively a linked list). Minimum height: floor(log2 n) (a balanced tree). Height determines algorithm efficiency.
Counting Nodes
def count_nodes(node):
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)
Time: O(n), Space: O(height)
Computing Height
def height(node):
    if node is None:
        return -1  # or 0, depending on convention
    return 1 + max(height(node.left), height(node.right))
Time: O(n), Space: O(height)
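The pieces above can be exercised together; a minimal sketch (repeating the TreeNode, count_nodes, and height definitions so it stands alone, on an arbitrary four-node sample tree):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def count_nodes(node):
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)

def height(node):
    if node is None:
        return -1  # empty-tree convention used here
    return 1 + max(height(node.left), height(node.right))

# Sample tree:   1
#               / \
#              2   3
#             /
#            4
root = TreeNode(1)
root.left = TreeNode(2)
root.right = TreeNode(3)
root.left.left = TreeNode(4)

print(count_nodes(root))  # 4
print(height(root))       # 2
```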
Binary Search Trees (BSTs)
BST Property
Left subtree all values < node. Right subtree all values > node. Enables binary search: decision at each node determines search direction.
Search Algorithm
def search(node, value):
    if node is None:
        return None
    if value == node.value:
        return node
    elif value < node.value:
        return search(node.left, value)
    else:
        return search(node.right, value)
Time: O(height), O(log n) if balanced, O(n) if linear
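A usage sketch of search on a small hand-built BST (definitions repeated so the snippet stands alone; the sample values are arbitrary):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def search(node, value):
    if node is None:
        return None
    if value == node.value:
        return node
    elif value < node.value:
        return search(node.left, value)
    else:
        return search(node.right, value)

# BST:    5
#        / \
#       2   8
#        \
#         3
root = TreeNode(5)
root.left = TreeNode(2)
root.right = TreeNode(8)
root.left.right = TreeNode(3)

print(search(root, 3).value)  # 3 -- found after two comparisons
print(search(root, 7))        # None -- falls off the tree
```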
Minimum and Maximum
def min_node(node):
    while node.left:
        node = node.left
    return node

def max_node(node):
    while node.right:
        node = node.right
    return node
In-Order Traversal Gives Sorted Sequence
Left-Node-Right traversal outputs values in sorted order. Useful for accessing BST in sorted order.
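A quick check of this property: insert a few unsorted values into a BST, then collect an in-order walk (helpers repeated here so the snippet is self-contained):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(node, value):
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node

def inorder(node, out):
    # Left-Node-Right, accumulating values instead of printing
    if node:
        inorder(node.left, out)
        out.append(node.value)
        inorder(node.right, out)
    return out

root = None
for v in [5, 2, 8, 1, 3]:   # arbitrary unsorted input
    root = insert(root, v)

print(inorder(root, []))  # [1, 2, 3, 5, 8] -- sorted
```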
BST Operations: Search, Insert, Delete
Insert
def insert(node, value):
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    # Skip if equal (no duplicates)
    return node
Time: O(height)
Delete (Complex)
Three cases: (1) Leaf: just remove. (2) One child: replace with child. (3) Two children: replace with successor (smallest in right subtree), then delete successor.
def delete(node, value):
    if node is None:
        return None
    if value < node.value:
        node.left = delete(node.left, value)
    elif value > node.value:
        node.right = delete(node.right, value)
    else:  # Found node to delete
        if node.left is None:
            return node.right
        elif node.right is None:
            return node.left
        else:  # Two children
            successor = min_node(node.right)
            node.value = successor.value
            node.right = delete(node.right, successor.value)
    return node
Time: O(height)
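All three delete cases can be exercised in one run; a self-contained sketch (helpers repeated, sample tree arbitrary):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(node, value):
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node

def min_node(node):
    while node.left:
        node = node.left
    return node

def delete(node, value):
    if node is None:
        return None
    if value < node.value:
        node.left = delete(node.left, value)
    elif value > node.value:
        node.right = delete(node.right, value)
    else:
        if node.left is None:
            return node.right
        elif node.right is None:
            return node.left
        successor = min_node(node.right)
        node.value = successor.value
        node.right = delete(node.right, successor.value)
    return node

def inorder(node, out=None):
    if out is None:
        out = []
    if node:
        inorder(node.left, out)
        out.append(node.value)
        inorder(node.right, out)
    return out

root = None
for v in [5, 2, 8, 1, 3, 9]:
    root = insert(root, v)

root = delete(root, 1)   # case 1: leaf, just removed
root = delete(root, 8)   # case 2: one child (9) takes its place
root = delete(root, 5)   # case 3: two children, replaced by successor
print(inorder(root))     # [2, 3, 9] -- still sorted, so still a BST
```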
Unbalanced Problem
Worst case: sorted input creates linear tree. Height = n. Operations O(n) instead of O(log n). Solution: keep tree balanced (AVL, Red-Black).
Balanced Trees: AVL and Red-Black
AVL Trees
Height-balanced: every node's left and right subtrees differ in height by at most 1. Guarantees O(log n) height. Requires rebalancing on insert/delete via rotations.
- Balance factor = height(left) - height(right)
- Valid: -1, 0, +1
- Outside this range: rotation required
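The balance factor can be computed directly from the earlier height function; a naive sketch (real AVL implementations cache each node's height rather than recomputing it on every check):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def height(node):
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # Naive O(n) recomputation -- fine for illustration only
    return height(node.left) - height(node.right)

# Left-heavy chain: 3 -> 2 -> 1
root = TreeNode(3)
root.left = TreeNode(2)
root.left.left = TreeNode(1)

print(balance_factor(root))       # 2 -- outside [-1, 1], rotation needed
print(balance_factor(root.left))  # 1 -- within bounds
```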
Rotations
Left and right rotations restructure the tree locally, maintain the BST property, and restore balance. Each rotation is O(1) pointer work; an AVL insert needs at most one rotation (single or double), while a delete may trigger rotations at up to O(log n) nodes along the path back to the root.
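A sketch of the right rotation for the left-heavy case (rotate_left is symmetric; the function names here are illustrative, not from a particular library):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def rotate_right(y):
    # y's left child x becomes the subtree root; x's old right
    # subtree (values between x and y) is reattached under y,
    # so BST ordering is preserved.
    x = y.left
    y.left = x.right
    x.right = y
    return x

def inorder(node, out):
    if node:
        inorder(node.left, out)
        out.append(node.value)
        inorder(node.right, out)
    return out

# Left-heavy chain 3 -> 2 -> 1 (balance factor +2 at the root)
root = TreeNode(3)
root.left = TreeNode(2)
root.left.left = TreeNode(1)

root = rotate_right(root)
print(root.value)         # 2 -- new subtree root
print(inorder(root, []))  # [1, 2, 3] -- BST order preserved
```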
Red-Black Trees
Alternative to AVL. Each node red or black. Rules: (1) Root black. (2) Leaves (None) black. (3) Red node has black children. (4) All paths to leaves same black count. Guarantees O(log n) height, fewer rotations than AVL.
Practical Performance
AVL: more rigidly balanced, slightly faster search. Red-Black: fewer rotations, faster insert/delete. Most systems use Red-Black (C++ std::map is typically implemented as a Red-Black tree).
B-Trees for Disk Storage
Motivation
Binary trees optimize for memory (all comparisons equally fast). B-trees optimize for disk: minimize disk accesses. Each node accessed once from disk (expensive). More children per node reduces tree height: more comparisons per node, but far fewer disk accesses.
B-Tree Properties
Order m: each node has at most m children. The root has at least 2 children (unless it is a leaf); internal nodes have at least ceil(m/2). All leaves at the same depth. Many keys per node (up to m-1 keys for m children).
Search and Insert
At each node, binary search among keys determines child to follow. Insert: add to leaf. If full, split (promote middle key, split children). Guarantees O(log n) disk accesses.
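A search-only sketch of this descent, using a hypothetical minimal BNode (sorted keys; internal nodes hold len(keys)+1 children) and the standard bisect module for the per-node binary search:

```python
import bisect

class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys              # sorted list of keys
        self.children = children or []  # empty for leaves

def btree_search(node, key):
    # Binary search among this node's keys
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node
    if not node.children:   # leaf reached: key absent
        return None
    # Descend into the child between the surrounding keys
    return btree_search(node.children[i], key)

# Tiny order-3 example:      [7]
#                           /    \
#                       [1, 4]  [9, 12]
leaf1 = BNode([1, 4])
leaf2 = BNode([9, 12])
root = BNode([7], [leaf1, leaf2])

print(btree_search(root, 9) is leaf2)  # True
print(btree_search(root, 5))           # None
```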
Databases
B-trees standard for database indexes (B+ trees variant stores data in leaves). Minimize disk I/O (limiting factor for large datasets).
Tree Traversal: In-Order, Pre-Order, Post-Order, Level-Order
Pre-Order (Node-Left-Right)
def preorder(node):
    if node is None:
        return
    print(node.value)       # Visit
    preorder(node.left)     # Left
    preorder(node.right)    # Right
Use: copy tree, create expression tree from prefix notation
In-Order (Left-Node-Right)
def inorder(node):
    if node is None:
        return
    inorder(node.left)
    print(node.value)
    inorder(node.right)
Use: BST in sorted order, evaluate expression tree
Post-Order (Left-Right-Node)
def postorder(node):
    if node is None:
        return
    postorder(node.left)
    postorder(node.right)
    print(node.value)
Use: delete tree, evaluate expression tree (postfix)
Level-Order (BFS)
from collections import deque

def level_order(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        node = queue.popleft()
        print(node.value)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
Use: tree by levels, breadth-first exploration
Iterative Traversal
Using explicit stack (for DFS) or queue (for BFS) avoids recursion, overcomes depth limitations.
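An iterative in-order traversal with an explicit stack, as described (the left-spine-push pattern used here is one common formulation):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def inorder_iterative(root):
    out, stack, node = [], [], root
    while stack or node:
        while node:              # walk down the left spine, saving nodes
            stack.append(node)
            node = node.left
        node = stack.pop()       # leftmost unvisited node
        out.append(node.value)
        node = node.right        # then handle its right subtree
    return out

#    2
#   / \
#  1   3
root = TreeNode(2)
root.left = TreeNode(1)
root.right = TreeNode(3)
print(inorder_iterative(root))  # [1, 2, 3]
```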
Heap Trees for Priority Queues
Heap Property
Min-heap: parent <= children (minimum at root). Max-heap: parent >= children. Complete binary tree.
Array Representation
Stored in an array: the node at index i has children at 2i+1 and 2i+2, and its parent at (i-1)//2. No pointers needed, cache-friendly.
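The index arithmetic can be checked directly on a small min-heap stored as a list (sample values arbitrary):

```python
heap = [1, 3, 2, 7, 4]   # a valid min-heap laid out in an array

def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2
def parent(i): return (i - 1) // 2

print(heap[left(0)], heap[right(0)])  # 3 2 -- children of the root
print(parent(4))                      # 1 -- index of 4's parent (value 3)

# Min-heap property: every non-root element >= its parent
print(all(heap[parent(i)] <= heap[i] for i in range(1, len(heap))))  # True
```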
Heap Operations
Insert: add to end, bubble up (O(log n))
Delete root: move last to root, bubble down (O(log n))
Heapify: build heap from array O(n)
Insert and delete-root are O(log n); building a heap from scratch via heapify is the O(n) exception (not O(n log n))
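Python's standard heapq module implements exactly this array-backed min-heap over a plain list, so the operations can be demonstrated without custom code:

```python
import heapq

nums = [5, 1, 4, 2]
heapq.heapify(nums)        # O(n) build in place
heapq.heappush(nums, 0)    # insert: append then bubble up, O(log n)
print(heapq.heappop(nums)) # 0 -- delete-min: last element to root, bubble down
print(heapq.heappop(nums)) # 1
```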
Applications
Priority queues (Dijkstra's algorithm), heap sort, load balancing (always process the highest-priority task).
Applications: Search, Indexes, Expression Parsing
Binary Search Trees
Database indexes, in-memory searching. Efficient if balanced. Sorted access (in-order traversal).
File Systems
Directory tree: hierarchical folders. Depth-first search traverses all files. Symbolic links break the pure tree structure (producing a DAG, or even cycles).
Expression Parsing
Expression tree: internal nodes operators, leaves operands. Pre-order: prefix notation. Post-order: postfix notation. Evaluate recursively.
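A minimal expression-tree evaluator illustrating the recursive (post-order) evaluation; the Expr class and the operator set here are illustrative choices:

```python
class Expr:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def evaluate(node):
    if node.left is None:        # leaf: an operand
        return node.value
    # Post-order: evaluate both subtrees before applying the operator
    a, b = evaluate(node.left), evaluate(node.right)
    if node.value == '+': return a + b
    if node.value == '-': return a - b
    if node.value == '*': return a * b
    return a / b                 # '/'

# (2 + 3) * 4
tree = Expr('*', Expr('+', Expr(2), Expr(3)), Expr(4))
print(evaluate(tree))  # 20
```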
Trie (Prefix Tree)
Store strings: each node represents a character. Enables fast prefix matching (autocomplete). O(m) search for a string of length m, independent of the number of stored strings; unlike hash tables, supports prefix queries and ordered iteration.
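A dict-based trie sketch, one nested dict per node (the '$' end-of-word sentinel is an arbitrary choice here):

```python
def trie_insert(trie, word):
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})  # descend, creating nodes as needed
    node['$'] = True                    # mark end of a complete word

def has_prefix(trie, prefix):
    # O(m) walk for a prefix of length m
    node = trie
    for ch in prefix:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = {}
for w in ["tree", "trie", "trip"]:
    trie_insert(trie, w)

print(has_prefix(trie, "tri"))   # True -- shared prefix of all three words
print(has_prefix(trie, "heap"))  # False
```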
Segment Trees and Fenwick Trees
Advanced: range queries (sum of range [i, j]). Build tree in O(n), query O(log n), update O(log n).
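A Fenwick (binary indexed) tree is the more compact of the two; a sketch supporting point updates and prefix-sum queries, both O(log n) via the classic i & -i low-bit trick:

```python
class Fenwick:
    def __init__(self, n):
        self.tree = [0] * (n + 1)   # 1-based internal array

    def update(self, i, delta):     # add delta at index i (1-based)
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i             # jump to the next responsible node

    def query(self, i):             # sum of elements 1..i
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i             # strip the lowest set bit
        return s

f = Fenwick(8)
for i, v in enumerate([3, 1, 4, 1, 5], start=1):
    f.update(i, v)

print(f.query(3))                # 8  -- 3 + 1 + 4
print(f.query(5) - f.query(2))   # 10 -- range sum [3, 5]: 4 + 1 + 5
```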
Challenges: Balancing, Performance
Balancing Overhead
Maintaining balance adds complexity (rotations, rebalancing). Worth it only if frequent searches (searches dominate). For insert-heavy, simpler trees acceptable.
Memory Overhead
Pointers per node (2 for binary). For small data (int: 4 bytes), pointers large overhead. Arrays compete: simpler, cache-friendly, less per-node overhead.
Cache Performance
Tree nodes scattered (each node separate allocation). Array-based trees (heaps) better cache locality. Linked trees suffer pointer chasing.
Balancing vs. Simplicity
Unbalanced BST simpler, faster inserts (no rebalancing). For adversarial input, becomes slow. Balanced trees guarantee performance, cost complexity. Choose based on input patterns.