Introduction
B-Tree: balanced tree structure optimized for disk-based storage. Keys sorted, enabling efficient searching, range queries, sequential access. Standard index structure: relational databases (PostgreSQL, MySQL, SQLite), file systems (NTFS, ext4).
Core advantage: logarithmic search (O(log n)), guaranteed balance, efficient disk I/O (groups keys in blocks). Solves the problem of unbalanced binary search trees, which degrade to linked lists in the worst case.
Design principle: minimize disk I/O operations. Single node access: read multiple keys (block size). Reduces tree height, speeds searches dramatically.
"B-Trees revolutionized database indexing by optimizing for disk I/O patterns. Balanced guarantee ensures predictable performance, making them foundational for modern databases." -- Index structures
B-Tree Properties
Definition
B-Tree of order m: each node has at most m children. Properties: (1) all leaves same depth, (2) all internal nodes (except root) have >= ⌈m/2⌉ children, (3) root has >= 2 children (if not leaf).
Balanced Guarantee
All leaves same depth: no unbalanced branches. Height O(log n) guaranteed. Search, insert, delete all O(log n). Predictable performance.
Key Distribution
Keys sorted within nodes and globally. Node with k children has k-1 keys. Enables binary search within node, guides traversal.
Fill Factor
Nodes at least half full (except root). Ensures efficient space usage. Prevents mostly-empty nodes wasting resources.
Example: B-Tree of Order 3
Max children per node: 3
Max keys: 2
Root:      [5, 15]
          /   |   \
       [3]  [8]  [18, 20]
Search 8: root[5,15] -> go middle child [8] found
B-Tree Structure and Organization
Node Structure
Each node contains: keys (sorted), pointers to children. Leaf nodes: no children, contain actual values. Internal nodes: guide traversal.
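A minimal in-memory sketch of this node layout; field names are illustrative, and a production engine would pack keys, child pointers, and metadata into fixed-size disk pages:

```python
from dataclasses import dataclass, field

@dataclass
class BTreeNode:
    keys: list = field(default_factory=list)      # keys, kept sorted
    children: list = field(default_factory=list)  # len(children) == len(keys) + 1 for internal nodes
    leaf: bool = True                             # leaves hold values, no children

# The order-3 example tree: root [5, 15] over leaves [3], [8], [18, 20].
root = BTreeNode(keys=[5, 15], leaf=False,
                 children=[BTreeNode([3]), BTreeNode([8]), BTreeNode([18, 20])])
print(len(root.children))  # 3: one more child than keys
```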
Key Organization
n keys divide the key range into n+1 subranges; the i-th key is the threshold between subranges i and i+1. Key k separates subtrees: all keys in the left subtree are < k, all keys in the right subtree are > k.
Node Capacity
Order m determines capacity. For m=3: max 2 keys, 3 children. For m=100: max 99 keys, 100 children. Large order: fewer levels, faster searches.
Disk Block Alignment
Choose order matching disk block size. Typical: 4KB block -> 100-400 keys per node (depending on key size). Single disk read: entire node.
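Back-of-the-envelope sizing, assuming 8-byte keys, 8-byte child pointers, and a small per-node header (all assumptions, not a real on-disk format):

```python
BLOCK = 4096   # typical disk/page block size in bytes (assumption)
KEY = 8        # 8-byte integer key (assumption)
PTR = 8        # 8-byte child pointer (assumption)
HEADER = 16    # per-node metadata (assumption)

# A node with k keys stores k keys and k+1 child pointers:
#   HEADER + k*KEY + (k+1)*PTR <= BLOCK
k = (BLOCK - HEADER - PTR) // (KEY + PTR)
print(k)  # 254 keys per node -> order ~255, within the 100-400 range above
```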
Tree Height
Height h: O(log_m n) where m is the order, n the number of entries. For m=100, n=1,000,000: height ~3. Few disk accesses: excellent performance.
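The height bound can be checked numerically; a rough sketch that counts how many levels of fanout m are needed to reach n entries:

```python
def btree_height(n, m):
    """Levels needed so that fanout m reaches n entries (rough bound)."""
    h, reach = 0, 1
    while reach < n:
        reach *= m   # each level multiplies reachable entries by m
        h += 1
    return h

print(btree_height(1_000_000, 100))  # 3: three levels cover a million keys
print(btree_height(1_000_000, 3))    # 13: small fanout means a much deeper tree
```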
Search Operation
Algorithm
Start at root: binary search within the node finds the key or the child subrange. Follow the child pointer. Repeat until the key is found or a leaf is reached. O(log n) comparisons, O(log_m n) disk accesses.
Example Search (k=8)
Root [5, 15]: 8 between 5 and 15, go middle child
Node [8]: found at index 0
Return value
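The search above can be sketched with nodes as (keys, children) tuples; `bisect` performs the within-node binary search. The example order-3 tree is reproduced:

```python
from bisect import bisect_left

# Order-3 example tree: root [5, 15] over leaves [3], [8], [18, 20].
LEAF3 = ([3], [])
LEAF8 = ([8], [])
LEAF18 = ([18, 20], [])
ROOT = ([5, 15], [LEAF3, LEAF8, LEAF18])

def search(node, key):
    keys, children = node
    i = bisect_left(keys, key)          # binary search within the node
    if i < len(keys) and keys[i] == key:
        return True                     # key found in this node
    if not children:
        return False                    # leaf reached: key absent
    return search(children[i], key)     # descend into the i-th subrange

print(search(ROOT, 8))   # True: root -> middle child -> found
print(search(ROOT, 9))   # False: root -> middle child -> leaf, absent
```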
Range Queries
Search lower bound (left), traverse sequentially to upper bound. Efficient: B-Tree preserves order, can scan range without jumping.
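A range query can be sketched as a pruned in-order traversal over the same (keys, children) tuple representation; real engines with linked leaves scan leaves sequentially instead of re-descending:

```python
LEAF3 = ([3], [])
LEAF8 = ([8], [])
LEAF18 = ([18, 20], [])
ROOT = ([5, 15], [LEAF3, LEAF8, LEAF18])

def range_query(node, lo, hi, out):
    """Collect all keys in [lo, hi], skipping subtrees outside the range."""
    keys, children = node
    n = len(keys)
    for i in range(n + 1):
        if children:
            # visit child i only if its subrange can intersect [lo, hi]
            left_ok = i == 0 or keys[i - 1] <= hi
            right_ok = i == n or keys[i] >= lo
            if left_ok and right_ok:
                range_query(children[i], lo, hi, out)
        if i < n and lo <= keys[i] <= hi:
            out.append(keys[i])   # in-order position of key i
    return out

print(range_query(ROOT, 6, 19, []))  # [8, 15, 18]
```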
Complexity
Comparisons: O(m log_m n) with linear scan within nodes; binary search within nodes reduces this to O(log₂ m · log_m n) = O(log₂ n). Disk accesses: O(log_m n). Both acceptable: m moderate (100-500), n large (millions).
Implementation Optimization
Caching: keep root and upper-level nodes in memory (hot paths). Bloom filters: fast negative queries (key doesn't exist). Optimization goal: minimize I/O impact.
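The node-caching idea can be sketched as an LRU cache keyed by page number; `read_page` here is a hypothetical stand-in for the disk layer, not a real API:

```python
from collections import OrderedDict

class NodeCache:
    """LRU cache of B-Tree nodes: hot paths stay resident, cold pages evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page_no, read_page):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)    # cache hit: mark most recently used
            return self.pages[page_no]
        node = read_page(page_no)              # cache miss: read from disk
        self.pages[page_no] = node
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)     # evict least recently used
        return node

reads = []
fake_disk = lambda p: (reads.append(p), f"node{p}")[1]  # records each "disk read"
cache = NodeCache(2)
cache.get(1, fake_disk); cache.get(2, fake_disk)
cache.get(1, fake_disk)            # hit: no disk read
cache.get(3, fake_disk)            # miss: evicts page 2
print(reads)  # [1, 2, 3]
```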
Insertion Algorithm
Overview
Search position, insert key. If node overflows (exceeds order): split node, promote middle key to parent. Cascades upward if parent also overflows.
Example Insertion
B-Tree order 3 (max 2 keys):
Root [5, 15]
Insert 10: [5, 10, 15] (3 keys, overflow!)
Split: promote 10 to root
Result:
     [10]
    /    \
 [5]     [15]
Node Splitting
Overflowing node with n keys: split into two nodes. Middle key (index ⌊n/2⌋+1, 1-based) promoted to parent. Left node: keys 1 to ⌊n/2⌋. Right node: keys ⌊n/2⌋+2 to n.
Cascading Splits
Parent may overflow: split recursively. Eventually reach root: root overflows, create new root. Tree grows in height (rare).
Complexity
Search position: O(log_m n). Insert + splits: O(log_m n). Rare: multiple splits. Amortized: O(log_m n) disk accesses.
Pre-Splitting Optimization
Preventive: split full nodes during descent. Avoid: cascade on insertion. One pass: simpler, faster.
Deletion Algorithm
Case 1: Key in Leaf
Remove directly. If node underfull (< ⌈m/2⌉ keys): borrow from sibling or merge. Maintain invariant.
Case 2: Key in Internal Node
Replace with predecessor (largest in left subtree) or successor (smallest in right). Recursively delete predecessor/successor.
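Locating the predecessor and successor can be sketched as follows, with nodes represented as (keys, children) tuples:

```python
def predecessor(node):
    """Largest key in a subtree: follow rightmost children down to a leaf."""
    keys, children = node
    while children:
        keys, children = children[-1]
    return keys[-1]

def successor(node):
    """Smallest key in a subtree: follow leftmost children down to a leaf."""
    keys, children = node
    while children:
        keys, children = children[0]
    return keys[0]

# Deleting 10 from the root: replace it with 5 (predecessor, from the
# left subtree) or 15 (successor, from the right subtree).
ROOT = ([10], [([3, 5], []), ([15, 20], [])])
print(predecessor(ROOT[1][0]))  # 5
print(successor(ROOT[1][1]))    # 15
```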
Rebalancing
Node too small: borrow from sibling (if possible) or merge. Merging: two underfull nodes plus the separator key pulled down from the parent become one node. Cascades upward if the parent becomes underfull.
Example Deletion
B-Tree: [10] with children [5], [15]
Delete 5: [5] becomes empty (underfull)
Merge [5] and [15]: [5, 15]
New root: [10, 15]
Complexity
Search key: O(log_m n). Rebalancing: O(log_m n). Disk accesses: O(log_m n). Similar to insertion.
Lazy Deletion
Mark deleted (don't remove): avoids rebalancing. Periodic cleanup: full reorganization. Trade-off: space for speed.
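A tombstone sketch of lazy deletion, illustrated on a single sorted node rather than a full tree; names are illustrative:

```python
class LazyNode:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.dead = set()             # tombstones: deleted but not yet removed

    def delete(self, key):
        self.dead.add(key)            # O(1): no rebalancing triggered

    def contains(self, key):
        return key in self.keys and key not in self.dead

    def compact(self):
        """Periodic cleanup: physically drop tombstoned keys."""
        self.keys = [k for k in self.keys if k not in self.dead]
        self.dead.clear()

node = LazyNode([5, 10, 15])
node.delete(10)
print(node.contains(10))   # False: masked by the tombstone
print(node.keys)           # [5, 10, 15]: still physically present
node.compact()
print(node.keys)           # [5, 15]
```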
Disk Optimization
Node Size Selection
Match disk block (4KB typical). Includes keys, pointers, metadata. Optimize: maximize keys per node (larger order), minimize waste.
Clustering
Index entries near keys they point to. Sequential scan: read index blocks, data blocks nearly in order. Reduces seeks.
B-Tree Density
Minimize space waste: nodes 50-100% full (depending on split strategy). Preemptive splitting: fuller nodes initially, more reorganization cost.
Caching Strategy
Root, popular nodes: keep in memory. LRU cache: evict least-used nodes. Fast: frequently accessed paths stay cached.
Sequential Access
Traverse linked leaves (B+ Tree variant). Scan range: leaves linked, no parent traversal. Cache-friendly: sequential block reads.
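Linked-leaf scanning can be sketched as below; for brevity the scan starts at the first leaf, whereas a real tree would first search down to the leaf containing the lower bound:

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys):
        self.keys = keys     # sorted keys in this leaf
        self.next = None     # pointer to the right sibling

def link(*leaves):
    """Chain leaves left to right; return the leftmost one."""
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves[0]

def scan(leaf, lo, hi):
    """Collect keys in [lo, hi] by walking sibling pointers only."""
    out = []
    while leaf:
        for k in leaf.keys[bisect_left(leaf.keys, lo):]:
            if k > hi:
                return out        # passed the upper bound: stop
            out.append(k)
        leaf = leaf.next          # sequential block read, no re-descent
    return out

head = link(Leaf([3, 5]), Leaf([8, 10]), Leaf([15, 20]))
print(scan(head, 5, 15))   # [5, 8, 10, 15]
```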
Complexity Analysis
Search
Comparisons: O(m log_m n). Disk I/O: O(log_m n). m typically 100-500. For 1M records: log_m n = 3-4 disk accesses.
Insert/Delete
Worst case: all levels split/merge. Amortized: O(log_m n). Practice: rebalancing rare (2-3 nodes per operation).
Range Query
Find start: O(log_m n). Scan k records: O(log_m n + k/m) disk accesses. Efficient: sequential leaf traversal.
Space
O(n) storage: all records indexed. No extra space (vs. hashing). Utilization: ~70% typical under random inserts; near 100% possible with bulk loading.
Comparison Table
| Operation | Complexity | Disk I/O |
|---|---|---|
| Search | O(m log_m n) | O(log_m n) |
| Insert | O(m log_m n) | O(log_m n) |
| Delete | O(m log_m n) | O(log_m n) |
| Range Query | O(log_m n + k) | O(log_m n + k/m) |
B-Tree Variants (B+, B*)
B+ Trees
All data in leaves. Internal nodes: keys only (guides search). Advantages: leaves linked (sequential scan efficient), cleaner separation.
Structure Difference
B-Tree: data in internal and leaf nodes. B+ Tree: data only in leaves. Internal nodes hold more keys (no values): higher fanout, fewer levels, faster search.
B* Trees
Stricter balance: 2/3 full minimum. Fewer splits/merges. More complex, marginal benefits. Rarely used.
When to Use Each
B+ Tree: databases (MySQL, PostgreSQL). Better sequential access, standard. B-Tree: filesystems, simpler implementation. B*: specialized.
Practical Consideration
B+ almost always preferred: better I/O patterns, sequential scan efficiency. Standard choice for modern databases.
Practical Database Usage
PostgreSQL
btree index type (default access method); internally a Lehman-Yao B-link tree variant. Supports: equality, range queries, sorting. Efficient: handles millions of records.
MySQL/InnoDB
B+ trees: primary and secondary indexes. Clustered index (B+ containing data). Secondary indexes point to primary key.
SQLite
B-tree: page-oriented (4KB default). Efficient: fits database model. Simple, reliable, widely used.
File Systems
NTFS, ext4, HFS+: B-tree variants for directory/inode lookups. Efficient filesystem operations depend on B-tree performance.
Tuning
Create indexes: high-cardinality columns, frequent filters. Monitor: index usage, query plans. Maintenance: periodic rebuild, reindex.
Comparison with Other Indexes
B-Tree vs. Hash
B-Tree: range queries, sorted. Hash: fast equality, no range. Choice: query pattern determines.
B-Tree vs. Bitmap
B-Tree: general purpose, any data type. Bitmap: low cardinality (few values), fast for specific queries.
B-Tree vs. LSM Tree
B-Tree: balanced, predictable. LSM: write-optimized (accumulate, batch write). Choice: read vs. write heavy.
Performance Summary
B-Tree: versatile, good all operations. LSM: excellent writes, good reads. Hash: fastest equality. Choice: workload.
Applications and Use Cases
Database Indexing
Primary use: fast lookups, range queries. Essential: production databases rely heavily. Tuning critical.
File Systems
Directory structures: efficient lookups. Inode management: B-tree provides fast access.
Key-Value Stores
LMDB, BoltDB: B+-tree storage engines (LevelDB, by contrast, uses LSM trees). Fast, ordered keys. Efficient: practical systems depend on the structure.
Search Engines
Inverted indexes: B-trees for fast term lookup. Efficiency: enables large-scale search.
References
- Bayer, R., and McCreight, E. "Organization and Maintenance of Large Ordered Indexes." Acta Informatica, vol. 1, no. 3, 1972, pp. 173-189.
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. "Introduction to Algorithms." MIT Press, 3rd edition, 2009.
- Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
- Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.