Introduction

A tree is a hierarchical data structure: nodes connected by edges. The root is the topmost node; leaves are nodes with no children. Trees model hierarchies: file systems, organizational charts, parse trees, inheritance hierarchies. Fundamental to computer science: they enable efficient searching, sorting, and organization of information.

Key property: acyclic (no cycles; the path from the root to any node is unique). Contrasts with general graphs, which allow cycles. Equivalently, a tree is a connected acyclic graph; with n nodes it has exactly n-1 edges.

Binary trees: each node has at most 2 children. Restricting the structure enables efficient algorithms. Binary search trees: everything in the left subtree < node < everything in the right subtree. Enable O(log n) search if balanced.

"Trees are nature's solution to organization. Hierarchical structure enables efficient navigation and understanding of complex systems." -- Data structure design principles

Tree Terminology and Properties

Key Terms

  • Root: Topmost node (no parent)
  • Parent: Node with child
  • Child: Node pointed to by parent
  • Leaf: Node with no children
  • Depth: Distance from root (root depth = 0)
  • Height: Distance to farthest leaf
  • Subtree: Node and all descendants

Properties

An n-node tree has exactly n-1 edges (connectivity). The path from the root to any node is unique. Height determines recursion depth: traversal uses O(height) stack space in the worst case.

Perfect, Complete, Full Binary Trees

Perfect: All leaves at same depth. Full binary tree with all levels filled. Height h: exactly 2^(h+1) - 1 nodes.

Complete: All levels filled except possibly last (packed left). Height approximately log n.

Full: Every node has 0 or 2 children (no nodes with 1 child).
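These shape properties can be checked mechanically. A minimal sketch, with its own small TreeNode (constructor extended with optional children for convenience) and an illustrative `is_full` helper not in the text above:

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def is_full(node):
    # Full: every node has either 0 or 2 children.
    if node is None:
        return True
    if (node.left is None) != (node.right is None):
        return False
    return is_full(node.left) and is_full(node.right)

# A perfect tree of height 2: 2^(2+1) - 1 = 7 nodes; perfect implies full.
perfect = TreeNode(4,
                   TreeNode(2, TreeNode(1), TreeNode(3)),
                   TreeNode(6, TreeNode(5), TreeNode(7)))
```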

Binary Trees

Node Structure

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

Properties

Maximum height of an n-node tree: n-1 (a degenerate chain, effectively a linked list). Minimum height: floor(log2 n) (a balanced tree). Height determines algorithm efficiency.

Counting Nodes

def count_nodes(node):
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)

Time: O(n), Space: O(height)

Computing Height

def height(node):
    if node is None:
        return -1  # or 0, depending on convention
    return 1 + max(height(node.left), height(node.right))

Time: O(n), Space: O(height)

Binary Search Trees (BSTs)

BST Property

Left subtree all values < node. Right subtree all values > node. Enables binary search: decision at each node determines search direction.

Search Algorithm

def search(node, value):
    if node is None:
        return None
    if value == node.value:
        return node
    elif value < node.value:
        return search(node.left, value)
    else:
        return search(node.right, value)

Time: O(height), O(log n) if balanced, O(n) if linear

Minimum and Maximum

def min_node(node):
    # Leftmost node; assumes node is not None
    while node.left:
        node = node.left
    return node

def max_node(node):
    # Rightmost node; assumes node is not None
    while node.right:
        node = node.right
    return node

In-Order Traversal Gives Sorted Sequence

Left-Node-Right traversal outputs values in sorted order. Useful for accessing BST in sorted order.
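A small self-contained demonstration (reusing the node structure and insert routine shown elsewhere in these notes; `inorder` here collects values into a list instead of printing):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(node, value):
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node

def inorder(node, out):
    # Left-Node-Right: appends BST values in ascending order.
    if node is None:
        return out
    inorder(node.left, out)
    out.append(node.value)
    inorder(node.right, out)
    return out

root = None
for v in [5, 3, 8, 1, 4, 7, 9]:
    root = insert(root, v)
```

Whatever order the values were inserted in, the in-order walk recovers them sorted.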

BST Operations: Search, Insert, Delete

Insert

def insert(node, value):
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    # Skip if equal (no duplicates)
    return node

Time: O(height)

Delete (Complex)

Three cases: (1) Leaf: just remove. (2) One child: replace with child. (3) Two children: replace with successor (smallest in right subtree), then delete successor.

def delete(node, value):
    if node is None:
        return None
    if value < node.value:
        node.left = delete(node.left, value)
    elif value > node.value:
        node.right = delete(node.right, value)
    else:  # Found node to delete
        if node.left is None:
            return node.right
        elif node.right is None:
            return node.left
        else:  # Two children
            successor = min_node(node.right)
            node.value = successor.value
            node.right = delete(node.right, successor.value)
    return node

Time: O(height)

Unbalanced Problem

Worst case: sorted input creates linear tree. Height = n. Operations O(n) instead of O(log n). Solution: keep tree balanced (AVL, Red-Black).
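The degenerate case is easy to reproduce. A sketch using the insert and height routines from the sections above (duplicated here so the snippet is self-contained):

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(node, value):
    if node is None:
        return TreeNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return node

def height(node):
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

# Sorted input: every insert descends to the right, producing a chain.
chain = None
for v in range(10):
    chain = insert(chain, v)
```

Ten sorted inserts give height 9, i.e. n-1 rather than the roughly log2 n a balanced tree would have.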

Balanced Trees: AVL and Red-Black

AVL Trees

Height-balanced: every node's left and right subtrees differ in height by at most 1. Guarantees O(log n) height. Requires rebalancing on insert/delete via rotations.

Balance factor = height(left) - height(right)
Valid: -1, 0, +1
Outside: requires rotation

Rotations

Left and right rotations restructure the tree locally, maintain the BST property, and restore balance. Each rotation is O(1); an insert triggers at most a constant number of rotations, a delete at most O(log n), so operations stay O(log n) overall.
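The two rotations are a few pointer moves. A minimal sketch on plain nodes, with an illustrative `balance_factor` helper matching the definition above (full AVL rebalancing logic omitted):

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def rotate_left(x):
    # x's right child y becomes the subtree root;
    # y's old left subtree is reattached as x's right subtree.
    y = x.right
    x.right = y.left
    y.left = x
    return y

def rotate_right(y):
    # Mirror image of rotate_left.
    x = y.left
    y.left = x.right
    x.right = y
    return x

def height(node):
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)

# Right-leaning chain 1 -> 2 -> 3: balance factor -2 at the root.
chain = TreeNode(1, None, TreeNode(2, None, TreeNode(3)))
balanced = rotate_left(chain)
```

One left rotation turns the chain into a perfect tree rooted at 2, and an in-order walk would still yield 1, 2, 3: rotation preserves the BST ordering.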

Red-Black Trees

Alternative to AVL. Each node red or black. Rules: (1) Root black. (2) Leaves (None) black. (3) Red node has black children. (4) All paths to leaves same black count. Guarantees O(log n) height, fewer rotations than AVL.

Practical Performance

AVL: more rigidly balanced, faster search. Red-Black: fewer rotations, faster insert/delete. Most systems use Red-Black (C++ std::map is typically implemented as a Red-Black tree).

B-Trees for Disk Storage

Motivation

Binary trees optimize for memory (all comparisons equally fast). B-trees optimize for disk: minimize disk accesses. Each node accessed once from disk (expensive). More children per node reduces tree height (more comparisons, fewer disk accesses).

B-Tree Properties

Order m: each node has at most m children. The root has at least 2 children (unless it is a leaf); internal nodes have at least ceil(m/2). All leaves are at the same depth. Many keys per node (up to m-1 keys for m children).

Search and Insert

At each node, binary search among keys determines child to follow. Insert: add to leaf. If full, split (promote middle key, split children). Guarantees O(log n) disk accesses.
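The search step can be sketched in a few lines. The node layout here (a sorted `keys` list plus `len(keys) + 1` children for internal nodes) is a hypothetical in-memory simplification, not a disk format; splitting on insert is omitted:

```python
import bisect

class BTreeNode:
    # Hypothetical layout: sorted keys; internal nodes carry
    # len(keys) + 1 children, leaves carry none.
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def btree_search(node, key):
    # Binary search within the node, then descend into one child.
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node
    if not node.children:
        return None  # Reached a leaf without finding the key.
    return btree_search(node.children[i], key)

# A tiny order-3 example: root [20] with leaves [5, 10] and [30, 40].
root = BTreeNode([20], [BTreeNode([5, 10]), BTreeNode([30, 40])])
```

Each recursive step is one node visit; on disk, that is one block read, which is why the wide fan-out matters.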

Databases

B-trees standard for database indexes (B+ trees variant stores data in leaves). Minimize disk I/O (limiting factor for large datasets).

Tree Traversal: In-Order, Pre-Order, Post-Order, Level-Order

Pre-Order (Node-Left-Right)

def preorder(node):
    if node is None:
        return
    print(node.value)    # Visit
    preorder(node.left)  # Left
    preorder(node.right) # Right

Use: copy tree, create expression tree from prefix notation

In-Order (Left-Node-Right)

def inorder(node):
    if node is None:
        return
    inorder(node.left)
    print(node.value)
    inorder(node.right)

Use: BST in sorted order, evaluate expression tree

Post-Order (Left-Right-Node)

def postorder(node):
    if node is None:
        return
    postorder(node.left)
    postorder(node.right)
    print(node.value)

Use: delete tree, evaluate expression tree (postfix)

Level-Order (BFS)

from collections import deque

def level_order(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        node = queue.popleft()
        print(node.value)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)

Use: tree by levels, breadth-first exploration

Iterative Traversal

Using explicit stack (for DFS) or queue (for BFS) avoids recursion, overcomes depth limitations.
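A sketch of in-order traversal with an explicit stack, collecting values into a list (the helper name `inorder_iterative` is introduced here for illustration):

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def inorder_iterative(root):
    # The explicit stack replaces the call stack: push the whole
    # left spine, visit the popped node, then move to its right child.
    out, stack, node = [], [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.value)
        node = node.right
    return out

tree = TreeNode(2, TreeNode(1), TreeNode(3))
```

This sidesteps recursion-depth limits on deep (e.g., degenerate) trees.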

Heap Trees for Priority Queues

Heap Property

Min-heap: parent <= children (minimum at root). Max-heap: parent >= children. Complete binary tree.

Array Representation

Stored in an array: the node at index i has children at indices 2i+1 and 2i+2; its parent is at (i-1)//2. No pointers needed; cache-friendly.
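The index arithmetic can be written out directly; `is_min_heap` is a helper introduced here for illustration:

```python
def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

def is_min_heap(a):
    # Min-heap property: every node is >= its parent.
    return all(a[parent(i)] <= a[i] for i in range(1, len(a)))

# A min-heap laid out in an array: the minimum sits at index 0.
heap = [1, 3, 2, 7, 4]
```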

Heap Operations

Insert: add to end, bubble up (O(log n))
Delete root: move last to root, bubble down (O(log n))
Heapify: build heap from array O(n)

Insert and delete-root are O(log n); building a heap from an array is O(n)
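Python's standard library exposes these operations via heapq, which maintains a min-heap on a plain list:

```python
import heapq

data = [5, 1, 9, 3]
heapq.heapify(data)             # O(n): rearrange the list into a min-heap
heapq.heappush(data, 0)         # O(log n): append, then bubble up
smallest = heapq.heappop(data)  # O(log n): remove root, bubble down
```

After these calls `smallest` is 0, and repeated pops yield the remaining values in ascending order.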

Applications

Priority queues (e.g., Dijkstra's algorithm), heap sort, load balancing (always process the highest-priority task).

Applications: Search, Indexes, Expression Parsing

Binary Search Trees

Database indexes, in-memory searching. Efficient if balanced. Sorted access (in-order traversal).

File Systems

Directory tree: hierarchical folders. Depth-first search traverses all files. Symbolic links create DAGs (not trees).

Expression Parsing

Expression tree: internal nodes operators, leaves operands. Pre-order: prefix notation. Post-order: postfix notation. Evaluate recursively.
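A minimal sketch of both ideas (the `ExprNode` class and `OPS` table are illustrative names, not from the text above):

```python
class ExprNode:
    def __init__(self, value, left=None, right=None):
        self.value = value  # operator string, or numeric operand at a leaf
        self.left = left
        self.right = right

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a / b}

def evaluate(node):
    # Leaves hold operands; internal nodes apply their operator
    # to the recursively evaluated subtrees.
    if node.left is None and node.right is None:
        return node.value
    return OPS[node.value](evaluate(node.left), evaluate(node.right))

def postfix(node):
    # Post-order walk yields postfix (reverse Polish) notation.
    if node is None:
        return []
    return postfix(node.left) + postfix(node.right) + [str(node.value)]

# (3 + 4) * 2
tree = ExprNode('*', ExprNode('+', ExprNode(3), ExprNode(4)), ExprNode(2))
```

The tree structure encodes precedence and parentheses, so no operator-priority rules are needed during evaluation.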

Trie (Prefix Tree)

Stores strings: each node represents a character position. Enables fast prefix matching (autocomplete). O(m) search for a string of length m; unlike a hash table, it also answers prefix queries directly.
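A minimal trie sketch with dictionary children (method names here are illustrative choices, not a standard API):

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # marks the end of a stored string

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s):
        # Follow s character by character; None if the path breaks off.
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None

t = Trie()
for w in ["tree", "trie", "trim"]:
    t.insert(w)
```

Autocomplete is then a `_walk` to the prefix node followed by an enumeration of the subtree below it.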

Segment Trees and Fenwick Trees

Advanced: range queries (sum of range [i, j]). Build tree in O(n), query O(log n), update O(log n).
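A compact Fenwick (binary indexed) tree sketch for prefix and range sums; the class layout is one common formulation, 1-indexed as is conventional:

```python
class Fenwick:
    # 1-indexed binary indexed tree over n positions.
    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        # Add delta at position i (1-based); O(log n).
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & (-i)   # step to the next responsible node

    def prefix_sum(self, i):
        # Sum of positions 1..i; O(log n).
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)   # strip the lowest set bit
        return s

    def range_sum(self, lo, hi):
        # Sum of positions lo..hi, as a difference of prefix sums.
        return self.prefix_sum(hi) - self.prefix_sum(lo - 1)

f = Fenwick(8)
for i, v in enumerate([1, 2, 3, 4, 5, 6, 7, 8], start=1):
    f.update(i, v)
```

Compared with a full segment tree, a Fenwick tree supports only prefix-decomposable queries (like sums) but needs just one array and a few lines of bit tricks.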

Challenges: Balancing, Performance

Balancing Overhead

Maintaining balance adds complexity (rotations, rebalancing bookkeeping). It pays off when searches dominate; for insert-heavy workloads, simpler unbalanced trees may be acceptable.

Memory Overhead

Each node carries pointers (2 for binary trees). For small payloads (e.g., a 4-byte int), the pointers are a large relative overhead. Arrays compete: simpler, cache-friendly, less per-node overhead.

Cache Performance

Tree nodes scattered (each node separate allocation). Array-based trees (heaps) better cache locality. Linked trees suffer pointer chasing.

Balancing vs. Simplicity

Unbalanced BST simpler, faster inserts (no rebalancing). For adversarial input, becomes slow. Balanced trees guarantee performance, cost complexity. Choose based on input patterns.

References

  • Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, 3rd edition, 2009.
  • Knuth, D. E. "The Art of Computer Programming, Volume 1: Fundamental Algorithms." Addison-Wesley, 3rd edition, 1997.
  • Sedgewick, R., and Wayne, K. "Algorithms, Part I." Addison-Wesley, 4th edition, 2011.
  • Skiena, S. S. "The Algorithm Design Manual." Springer, 2nd edition, 2008.
  • Comer, D. "The Ubiquitous B-Tree." ACM Computing Surveys, vol. 11, no. 2, 1979.