Introduction

B-tree: self-balancing multi-way tree (Bayer and McCreight, 1972). Property: every node has multiple keys and children (not binary). Disk-optimized: minimizes disk I/O. Use: database indexes, file systems. Advantage: N-way branching reduces tree height. Cost: node complexity.

"B-trees revolutionized database design by recognizing that disk I/O dominates computation. By storing multiple keys per node, they minimize tree height and disk accesses, enabling efficient indexing of massive datasets." -- Database systems

B-Tree Definition

Formal Definition

B-tree of order m: tree where every node has at most m children. Keys: at most m-1 keys per node. Balanced: all leaves at same depth. Sorted: keys ordered (left < parent < right).

Key Distinction from Binary Trees

Binary: 2 children maximum. B-tree: m children maximum (hundreds possible). Keys per node: 1 key (binary), m-1 keys (B-tree). Branching: exponentially reduces height.

Example: B-Tree Order 3

Order m=3: max 2 keys, max 3 children

         [15 | 40]
        /    |    \
   [5 10] [20 25] [...]

Internal node: 2 keys (15, 40) and 3 pointers
Children: < 15, 15-40, > 40

Comparison to Binary Tree

Binary: height for 1 million nodes = ~20 (log2(10^6)). B-tree (order 512): height = ~3 (log512(10^6)). Disk reads: binary ~20, B-tree ~3 (order-of-magnitude difference).
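A quick back-of-the-envelope check of these heights (a rough sketch; `min_disk_reads` is an illustrative helper that ignores occupancy constants):

```python
import math

def min_disk_reads(n_keys: int, branching: int) -> int:
    """Approximate tree height, and hence disk reads per lookup,
    for n_keys with the given branching factor."""
    return math.ceil(math.log(n_keys, branching))

print(min_disk_reads(10**6, 2))    # binary tree: ~20 levels
print(min_disk_reads(10**6, 512))  # B-tree of order 512: ~3 levels
```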

B-Tree Properties

Core Properties

1. Every node has at most m children. 2. Every internal node (except root) has at least ceil(m/2) children. 3. Root has at least 2 children (if not leaf). 4. All leaves at same depth. 5. Data in leaves or distributed (variant-dependent).

Property Implications

Balanced: height always O(log n). Minimum occupancy: ceil(m/2) (prevents sparse trees). Dense packing: efficient space utilization. Order selection: balances node size vs tree depth.

Leaf vs Internal Nodes

Different variants: data in every node (classic B-trees) or only in leaves (B+ trees). B-tree: data anywhere. B+ tree: data in leaves only; internal nodes hold keys purely for routing.

Tree Structure and Nodes

Internal Node Structure

Node with k keys:

  [ptr0, key1, ptr1, key2, ptr2, ..., keyk, ptrk]

Example (k=2): [ptr0, 15, ptr1, 40, ptr2]
Pointers: ptr0 (< 15), ptr1 (15-40), ptr2 (> 40)
Keys: ordered (key_i < key_{i+1})
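This layout can be sketched in Python (an illustrative `BTreeNode`, not from any particular library; keys and child pointers sit in two parallel lists rather than interleaved, the common in-memory representation):

```python
class BTreeNode:
    """A node with k sorted keys and k+1 child pointers.
    children[i] holds keys < keys[i]; children[i+1] holds keys > keys[i].
    Leaves simply have an empty children list."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted: key_i < key_{i+1}
        self.children = children or []  # len(children) == len(keys) + 1, or 0

    def is_leaf(self) -> bool:
        return not self.children

# The k=2 example: keys 15 and 40 separate three subtrees.
node = BTreeNode(keys=[15, 40],
                 children=[BTreeNode([5, 10]),    # ptr0: keys < 15
                           BTreeNode([20, 25]),   # ptr1: keys in 15-40
                           BTreeNode([45, 50])])  # ptr2: keys > 40
```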

Node Splitting Threshold

Full node: m children (m-1 keys). Insert into a full node: split. Minimum occupancy: ceil(m/2) children per node (except root). Underflow after deletion: merge or rebalance.

Depth Analysis

Height h: at least 2 * ceil(m/2)^(h-1) leaves. Minimum keys per non-root node: ceil(m/2) - 1. For n keys: height <= log base ceil(m/2) of (n+1)/2. Low height: few disk accesses.

Example Levels

Order m=512 (node sized to a typical disk block)
Min children per node: 256
Max keys per node: 511
Level 0 (root): up to 511 keys
Level 1: up to 511 * 256 = ~130K keys
Level 2: up to 130K * 256 = ~33M keys
Level 3: up to 33M * 256 = ~8.5B keys
Typical database: needs only 3-4 levels (disk accesses)
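These level capacities can be reproduced with a short script (same rough model: the root holds up to order-1 keys, and each level below multiplies by the minimum fan-out of order // 2):

```python
def max_keys_by_level(order: int, levels: int) -> list:
    """Per-level key capacities for a B-tree of the given order,
    growing by the minimum child count (order // 2) below the root."""
    counts = [order - 1]                      # root: up to order-1 keys
    for _ in range(levels - 1):
        counts.append(counts[-1] * (order // 2))
    return counts

print(max_keys_by_level(512, 4))  # [511, 130816, 33488896, 8573157376]
```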

Insertion Algorithm

Basic Algorithm

1. Search for the leaf where the key belongs
2. Insert the key in its sorted position (if not a duplicate)
3. If the leaf now holds m keys (one over the maximum): split the leaf into two and move the middle key up to the parent
4. If the parent overflows: recursively split the parent
5. If the root splits: create a new root
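Steps 1-5 can be sketched as a recursive insert (a minimal illustration, assuming nodes are plain dicts with 'keys' and 'children' lists; max_keys = m - 1, so max_keys=3 models order m=4; the duplicate check is omitted):

```python
import bisect

def insert(node, key, max_keys=3):
    """Insert key below node; return (median, right_node) if node split."""
    if not node['children']:                       # step 1: reached the leaf
        bisect.insort(node['keys'], key)           # step 2: keep keys sorted
    else:
        i = bisect.bisect_left(node['keys'], key)  # descend to the right child
        promoted = insert(node['children'][i], key, max_keys)
        if promoted:                               # child split: absorb its median
            median, right = promoted
            node['keys'].insert(i, median)
            node['children'].insert(i + 1, right)
    if len(node['keys']) > max_keys:               # steps 3-4: overflow -> split
        mid = (len(node['keys']) - 1) // 2         # middle key moves up
        median = node['keys'][mid]
        right = {'keys': node['keys'][mid + 1:],
                 'children': node['children'][mid + 1:]}
        node['keys'] = node['keys'][:mid]
        node['children'] = node['children'][:mid + 1]
        return median, right
    return None

def btree_insert(root, key, max_keys=3):
    """Step 5: if the root itself splits, create a new root."""
    promoted = insert(root, key, max_keys)
    if promoted:
        median, right = promoted
        return {'keys': [median], 'children': [root, right]}
    return root
```

Inserting 17 and then 18 into the tree [10 20] over leaves [5 8], [12 15], [25 30] reproduces the worked example in this section: the root becomes [10 15 20] with leaves [5 8], [12], [17 18], [25 30].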

Example Insertion

Insert 17 into:

     [10 20]
    /   |   \
[5 8] [12 15] [25 30]

Find leaf for 17: goes to [12 15]
Insert 17: [12 15 17]

After insertion:

     [10 20]
    /   |   \
[5 8] [12 15 17] [25 30]

Split Operation

Insert 18 into leaf [12 15 17] (order m=4, max 3 keys)
After insertion the leaf overflows: [12 15 17 18] (4 keys, max 3): SPLIT
Before: [12 15 17 18]
Middle: 15
After: [12] and [17 18]
Move 15 up to the parent
Parent [10 20] becomes [10 15 20] with 4 children

Cascading Splits

If parent becomes full: split parent. If grandparent becomes full: split grandparent. Cascades up to root. Creates new root if needed (tree grows by 1 level).

Deletion Algorithm

Deletion Cases

Case 1: Key in leaf - delete directly (if no underflow). Case 2: Key in internal node - replace with predecessor/successor, delete from leaf. Case 3: After deletion, node underflows - rebalance (borrow or merge).

Underflow Handling

Node has fewer than ceil(m/2) - 1 keys: underflow
Solution 1: Borrow from a sibling
  If a sibling has a spare key: rotate it through the parent
Solution 2: Merge with a sibling
  Combine the two nodes, pulling the separator key down from the parent
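Both solutions can be sketched on dict-based leaf nodes (an illustrative fragment; only the right-sibling case is shown, the left-sibling case is symmetric, and min_keys=1 models a small order such as m=3 or m=4):

```python
def fix_underflow(parent, i, min_keys=1):
    """Repair the underflowing leaf parent['children'][i]:
    borrow through the parent if the right sibling has a spare key,
    otherwise merge with the right sibling."""
    child = parent['children'][i]
    right = parent['children'][i + 1]
    if len(right['keys']) > min_keys:             # Solution 1: borrow (rotation)
        child['keys'].append(parent['keys'][i])   # separator moves down
        parent['keys'][i] = right['keys'].pop(0)  # sibling's key moves up
        return 'borrowed'
    parent['children'].pop(i + 1)                 # Solution 2: merge
    separator = parent['keys'].pop(i)             # separator pulled down
    child['keys'] += [separator] + right['keys']
    return 'merged'
```

Run on the rotation and merge examples later in this section, it yields the same results: borrowing turns [10] | [30 40] under parent [20] into [10 20] | [40] under [30]; merging [10] and [30] under [20] yields [10 20 30] and empties the parent.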

Example Deletion

Delete 15 from leaf [10 15 20] (order m=4)
After deletion: [10 20] (2 keys)
Check: min keys = ceil(4/2) - 1 = 1
Status: [10 20] still valid (2 keys >= the minimum of 1)

Cascading Merges

If the parent underflows after giving up a key: merge at the parent level. Cascades up to the root. The root is exempt from the minimum and may hold just 1 key (special case). If the root is left empty after a merge: delete it and promote its lone child, shrinking the tree by 1 level.

Complexity

Find: O(log n). Removing the key from its node: O(m), constant per node. Rebalancing: at most O(log n) merges/rotations. Total: O(log n).

Node Splitting

Split Procedure

Split a node that has overflowed to m keys:
1. Find the middle key (at 1-based position ceil(m/2))
2. Create a new node with the upper half of the keys (and children)
3. Move the middle key up to the parent
4. The parent insertion may trigger another split
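The split can be sketched on its own (a minimal helper on dict-based nodes; the 0-based index (m - 1) // 2 is the 1-based position ceil(m/2)):

```python
def split_node(keys, children=None):
    """Split an overflowing node (m keys for order m): the middle key
    moves up; keys and child pointers divide around it."""
    children = children or []            # leaves pass no children
    mid = (len(keys) - 1) // 2           # 1-based position ceil(m/2)
    median = keys[mid]
    left = {'keys': keys[:mid], 'children': children[:mid + 1]}
    right = {'keys': keys[mid + 1:], 'children': children[mid + 1:]}
    return median, left, right

median, left, right = split_node([10, 20, 30, 40, 50])
print(median, left['keys'], right['keys'])  # 30 [10, 20] [40, 50]
```

The same helper reproduces the order-4 leaf split from the insertion section: split_node([12, 15, 17, 18]) promotes 15, leaving [12] and [17 18].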

Example: Order 5

Order-5 node overflows after an insertion (5 keys, max 4):

  [10 20 30 40 50]

Split (middle = 30, position ceil(5/2) = 3):
Left: [10 20]   Right: [40 50]
Move 30 to the parent
Parent gains a new child:

     ... 30 ...
        /  \
  [10 20]  [40 50]

Balancing Property

After split: both nodes have roughly m/2 children. Maintains balance property. Min/max constraints respected.

Node Merging

Merge Procedure

Merging when a node has too few keys:
1. Check whether a sibling has a spare key (can lend one)
   If yes: move the key through the parent (rotation)
   If no: merge the two nodes (steps 2-4)
2. Pull the separator key down from the parent
3. Combine the keys and children of both nodes
4. Discard the emptied node

Example Rotation

Node [10] (underflow) with right sibling [30 40], parent [20]:
Borrow from the right sibling:
Left: [10 20] (20 moved down from the parent)
Right: [40] (30 moved up to the parent)
Parent: [30] (updated separator)

Example Merge

Node [10] and sibling [30] (both minimal), parent [20]:
Merge:
Combined: [10 20 30] (20 pulled down from the parent)
Parent loses 20: may become empty or underflow

Cascading Effect

Parent underflow: triggers a merge at the parent level. Cascades up. Root underflow: special case - if the root is emptied, its single remaining child becomes the new root.

Order Parameter (m)

Order Selection

Order m determines: node capacity, tree shape, performance. Typical: m = (disk block size / entry size). Example: block 4KB, entry 8 bytes -> m = 512.
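The rule of thumb is one node per disk block (a rough estimate; real implementations also budget for child pointers and node headers):

```python
def btree_order(block_size_bytes: int, entry_size_bytes: int) -> int:
    """Order m so that one node (~m entries) fills one disk block."""
    return block_size_bytes // entry_size_bytes

print(btree_order(4096, 8))   # 4KB block, 8-byte entries -> 512
print(btree_order(8192, 16))  # 8KB block, 16-byte entries -> 512
```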

Effects of Order

Aspect           | Small Order (m=3)    | Large Order (m=512)
Tree height      | Large (many levels)  | Small (few levels)
Node size        | Small (2 keys)       | Large (511 keys)
Comparisons/node | Few (linear scan)    | More (binary search)
Disk I/O         | High (many accesses) | Low (few accesses)
Best for         | Memory-based (RAM)   | Disk-based (databases)

Optimal Order

Minimize cost = tree height + node search cost. For disk: high order (minimize height, I/O dominates). For memory: lower order (minimize comparisons, CPU matters).
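This trade-off can be made concrete with a toy cost model (illustrative only: `access_cost` and its unit costs are made up for this sketch; a node visit costs one fetch plus a binary search of m-1 keys):

```python
import math

def access_cost(n, m, io_cost, cmp_cost):
    """Rough lookup cost: height * (node fetch + binary search of m-1 keys)."""
    height = math.ceil(math.log(n, m))
    per_node = io_cost + cmp_cost * math.log2(max(m - 1, 2))
    return height * per_node

# Disk: fetches dominate, so a high order wins.
print(access_cost(10**9, 512, io_cost=1000, cmp_cost=1) <
      access_cost(10**9, 8, io_cost=1000, cmp_cost=1))    # True
# Memory: comparisons matter, so a lower order can win.
print(access_cost(10**9, 8, io_cost=1, cmp_cost=1) <
      access_cost(10**9, 512, io_cost=1, cmp_cost=1))     # True
```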

Time and Space Complexity

Time Complexity

Operation   | Complexity   | Notes
Search      | O(log n)     | Tree height
Insert      | O(log n)     | Find + split
Delete      | O(log n)     | Find + merge
Range query | O(log n + k) | k = results
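The O(log n) search that underlies these bounds can be sketched as a root-to-leaf descent with a binary search inside each node (dict-based nodes as elsewhere in this section):

```python
import bisect

def btree_search(node, key) -> bool:
    """Descend root-to-leaf; one binary search per visited node."""
    i = bisect.bisect_left(node['keys'], key)
    if i < len(node['keys']) and node['keys'][i] == key:
        return True
    if not node['children']:      # reached a leaf without finding the key
        return False
    return btree_search(node['children'][i], key)

root = {'keys': [10, 20], 'children': [
    {'keys': [5, 8], 'children': []},
    {'keys': [12, 15], 'children': []},
    {'keys': [25, 30], 'children': []}]}
print(btree_search(root, 15), btree_search(root, 17))  # True False
```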

Space Complexity

Storage: O(n) for n keys. Occupancy: 50% minimum (ceil(m/2) children). Effective storage: between 50%-100% full.

Practical Disk Analysis

Disk access dominates. For 1 billion keys with order 512: height ~4 levels = 4 disk reads. Memory reads: negligible compared to disk. B-trees designed specifically to minimize disk I/O.

Real-World Applications

Database Indexes

Primary use: relational databases (MySQL, PostgreSQL). Index: quickly locate rows. B+ tree variant: maintains sorted order, efficient range scans. Billions of records: practical with 3-4 tree levels.

File Systems

NTFS, ext4, HFS+: use B-trees for metadata. Fast directory lookup: few disk accesses. Efficient tree operations: insertion/deletion balanced.

Key-Value Stores

LSM-tree stores (Bigtable, LevelDB): pair sorted on-disk runs with B-tree-like indexes. Sorted storage: efficient range queries. Compaction: periodically merges runs into larger sorted, indexed structures.

Search Engines

Inverted indexes: search term to documents. B-tree variant: supports phrase queries. Index size: petabytes managed with B-tree structure.

NoSQL Databases

MongoDB, CouchDB: B-tree indexes for document lookup. Multi-key indexes: compound keys (multiple fields). Efficient: optimized for disk-based storage.

References

  • Bayer, R., and McCreight, E. "Organization and Maintenance of Large Ordered Indexes." Acta Informatica, vol. 1, 1972, pp. 173-189.
  • Knuth, D. E. "The Art of Computer Programming: Sorting and Searching." Vol. 3, Addison-Wesley, 2nd edition, 1998.
  • Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, 3rd edition, 2009.
  • Comer, D. "The Ubiquitous B-Tree." ACM Computing Surveys, vol. 11, no. 2, 1979, pp. 121-137.
  • Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.