Introduction

B-Tree: balanced tree structure optimized for disk-based storage. Keys sorted, enabling efficient searching, range queries, sequential access. Standard index structure: relational databases (PostgreSQL, MySQL, SQLite), file systems (NTFS, ext4).

Core advantage: logarithmic search (O(log n)), guaranteed balance, efficient disk I/O (groups keys in blocks). Avoids the worst case of unbalanced binary search trees (degeneration into a linked list).

Design principle: minimize disk I/O operations. A single node access reads many keys (one disk block). Large fanout reduces tree height, speeding searches dramatically.

B-Trees revolutionized database indexing by optimizing for disk I/O patterns. The balance guarantee ensures predictable performance, making them foundational for modern databases.

B-Tree Properties

Definition

B-Tree of order m: each node has at most m children (and thus at most m-1 keys). Properties: (1) all leaves at the same depth, (2) all internal nodes (except root) have >= ⌈m/2⌉ children, (3) root has >= 2 children (if not a leaf).
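These bounds can be spot-checked mechanically; a minimal Python sketch (the class and field names are assumptions of this illustration, not a reference implementation):

```python
# Illustrative sketch: verify the order-m invariants on a small tree.
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted keys held in this node
        self.children = children or []     # empty list for leaf nodes

def check_order(node, m, is_root=True):
    """Assert the order-m bounds for node and all its descendants."""
    if node.children:                      # internal node
        min_children = 2 if is_root else -(-m // 2)   # ceil(m/2)
        assert min_children <= len(node.children) <= m
        assert len(node.keys) == len(node.children) - 1
        for child in node.children:
            check_order(child, m, is_root=False)

# The order-3 example tree used throughout this section passes the checks.
root = Node([5, 15], [Node([3]), Node([8]), Node([18, 20])])
check_order(root, m=3)
```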

Balanced Guarantee

All leaves same depth: no unbalanced branches. Height O(log n) guaranteed. Search, insert, delete all O(log n). Predictable performance.

Key Distribution

Keys sorted within nodes and globally. Node with k children has k-1 keys. Enables binary search within node, guides traversal.

Fill Factor

Nodes at least half full (except root). Ensures efficient space usage. Prevents mostly-empty nodes wasting resources.

Example: B-Tree of Order 3

Max children per node: 3
Max keys: 2

Root:    [5, 15]
        /   |   \
     [3]  [8]  [18, 20]

Search 8: root [5, 15] -> 5 < 8 < 15 -> go to middle child -> [8], found

B-Tree Structure and Organization

Node Structure

Each node contains: keys (sorted), pointers to children. Leaf nodes: no children, contain actual values. Internal nodes: guide traversal.

Key Organization

n keys divide the key space into n+1 subranges. Key k separates its neighboring subtrees: all keys in the left subtree < k, all keys in the right subtree > k.

Node Capacity

Order m determines capacity. For m=3: max 2 keys, 3 children. For m=100: max 99 keys, 100 children. Large order: fewer levels, faster searches.

Disk Block Alignment

Choose order matching disk block size. Typical: 4KB block -> 100-400 keys per node (depending on key size). Single disk read: entire node.
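A back-of-envelope sizing under assumed key and pointer widths (both values are illustrative, not prescriptive):

```python
# Illustrative node sizing: how many keys fit in one 4 KB disk block?
BLOCK_BYTES = 4096     # typical disk / filesystem block
KEY_BYTES = 8          # e.g. a 64-bit integer key (assumption)
PTR_BYTES = 8          # e.g. a 64-bit child page number (assumption)

# A node with k keys stores k keys plus k+1 child pointers:
#   k*KEY_BYTES + (k+1)*PTR_BYTES <= BLOCK_BYTES
keys_per_node = (BLOCK_BYTES - PTR_BYTES) // (KEY_BYTES + PTR_BYTES)
print(keys_per_node)   # 255 -> roughly an order-256 tree for these sizes
```

Larger keys (strings, composite keys) shrink this number, which is why the text gives a 100-400 range.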

Tree Height

Height h: O(log_m n) where m order, n entries. For m=100, n=1,000,000: height ~3-4. Few disk accesses: excellent performance.
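The height bound can be confirmed by repeated multiplication instead of floating-point logarithms; a small sketch:

```python
# How many levels does it take to index n entries at fanout m?
# Multiply capacity by m per level until it covers n.
def min_height(n, m):
    height, capacity = 0, 1
    while capacity < n:
        capacity *= m
        height += 1
    return height

print(min_height(1_000_000, 100))   # 3 levels for a million entries
```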

Search Operation

Algorithm

Start at the root: binary search within the node finds the key or the child range. Follow the child pointer. Repeat until a leaf. O(log n) comparisons, O(log_m n) disk accesses.

Example Search (k=8)

Root [5, 15]: 8 between 5 and 15, go middle child
Node [8]: found at index 0
Return value
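The lookup just traced can be sketched in Python (the list-based node layout is an assumption of this example):

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(node, key):
    """Descend from the root; binary search inside each node."""
    while True:
        i = bisect_left(node.keys, key)           # position within the node
        if i < len(node.keys) and node.keys[i] == key:
            return True                            # key found in this node
        if not node.children:
            return False                           # reached a leaf: absent
        node = node.children[i]                    # follow the i-th child

root = Node([5, 15], [Node([3]), Node([8]), Node([18, 20])])
print(search(root, 8))    # True: root -> middle child -> found
```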

Range Queries

Search lower bound (left), traverse sequentially to upper bound. Efficient: B-Tree preserves order, can scan range without jumping.
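A sketch of a range scan by ordered traversal, assuming the same list-based nodes as the other illustrations here; subtrees whose key range cannot intersect [lo, hi] are skipped:

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def range_query(node, lo, hi, out=None):
    """Collect keys in [lo, hi] in sorted order, pruning subtrees
    whose key range cannot intersect the query."""
    if out is None:
        out = []
    for i, key in enumerate(node.keys):
        left_min = node.keys[i - 1] if i else None   # lower bound of subtree i
        if node.children and lo < key and (left_min is None or hi > left_min):
            range_query(node.children[i], lo, hi, out)  # visit left subtree
        if lo <= key <= hi:
            out.append(key)
    if node.children and hi > node.keys[-1]:
        range_query(node.children[-1], lo, hi, out)     # rightmost subtree
    return out

root = Node([5, 15], [Node([3]), Node([8]), Node([18, 20])])
print(range_query(root, 5, 18))   # [5, 8, 15, 18]
```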

Complexity

Comparisons: O(m log_m n) with a linear scan per node (binary search within nodes brings this down to O(log n)). Disk accesses: O(log_m n). Both acceptable: m moderate (100-500), n large (millions).

Implementation Optimization

Caching: keep the root and upper-level nodes in memory (hot paths). Bloom filters: cheap negative lookups (key definitely absent). Both optimizations minimize I/O.

Insertion Algorithm

Overview

Search position, insert key. If node overflows (exceeds order): split node, promote middle key to parent. Cascades upward if parent also overflows.

Example Insertion

B-Tree order 3 (max 2 keys):
Root [5, 15]
Insert 10: [5, 10, 15] (3 keys, overflow!)
Split: promote 10 to root
Result:
 [10]
 / \
 [5] [15]

Node Splitting

Full node with n keys: split into two nodes. Middle key (position ⌊n/2⌋+1) promoted to parent. Left node: keys 1 to ⌊n/2⌋. Right node: keys ⌊n/2⌋+2 to n.
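The key arithmetic of a split, shown on bare key lists (child pointers omitted for brevity):

```python
# Splitting an overfull node: the middle key moves up to the parent,
# the remaining keys divide into a left and a right node.
def split(keys):
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]   # left, promoted, right

left, promoted, right = split([5, 10, 15])
print(left, promoted, right)   # [5] 10 [15]
```

This matches the order-3 insertion example above: inserting 10 into [5, 15] overflows, 10 is promoted, and [5] and [15] become its children.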

Cascading Splits

Parent may overflow: split recursively. If the split reaches the root and the root overflows, create a new root. This is the only way the tree grows in height (and it is rare).

Complexity

Search for the position: O(log_m n). Insert + splits: O(log_m n). Cascading splits across multiple levels are rare. Amortized: O(log_m n) disk accesses.

Pre-Splitting Optimization

Preventive: split any full node encountered during descent. Avoids upward cascades after insertion. One downward pass: simpler, faster.

Deletion Algorithm

Case 1: Key in Leaf

Remove directly. If node underfull (< ⌈m/2⌉ keys): borrow from sibling or merge. Maintain invariant.

Case 2: Key in Internal Node

Replace with predecessor (largest in left subtree) or successor (smallest in right). Recursively delete predecessor/successor.
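Finding the in-order predecessor is a walk down the rightmost spine of the left subtree; a minimal sketch with the same assumed list-based nodes:

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def predecessor(node, i):
    """Largest key in the subtree to the left of node.keys[i]."""
    cur = node.children[i]
    while cur.children:
        cur = cur.children[-1]    # keep descending rightward
    return cur.keys[-1]           # rightmost key of the deepest node

root = Node([5, 15], [Node([3]), Node([8]), Node([18, 20])])
print(predecessor(root, 1))   # 8: would replace 15 if 15 were deleted
```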

Rebalancing

Node too small: borrow from sibling (if possible) or merge. Merging: two half-full nodes become one full node. Cascades upward if parent underfull.

Example Deletion

B-Tree: [10] with children [5], [15]
Delete 5: left leaf becomes empty (underfull)
Merge: pull separator 10 down into the sibling [15]
New root: [10, 15] (height shrinks by one)

Complexity

Search key: O(log_m n). Rebalancing: O(log_m n). Disk accesses: O(log_m n). Similar to insertion.

Lazy Deletion

Mark deleted (don't remove): avoids rebalancing. Periodic cleanup: full reorganization. Trade-off: space for speed.

Disk Optimization

Node Size Selection

Match disk block (4KB typical). Includes keys, pointers, metadata. Optimize: maximize keys per node (larger order), minimize waste.

Clustering

Index entries near keys they point to. Sequential scan: read index blocks, data blocks nearly in order. Reduces seeks.

B-Tree Density

Minimize space waste: nodes 50-100% full (depending on split strategy). Waiting until nodes are completely full keeps them denser but costs more reorganization; preemptive splitting trades some density for cheaper inserts.

Caching Strategy

Root, popular nodes: keep in memory. LRU cache: evict least-used nodes. Fast: frequently accessed paths stay cached.
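An LRU page cache along these lines can be sketched with an ordered dictionary (page ids, values, and the capacity are illustrative):

```python
from collections import OrderedDict

# Tiny LRU cache for B-tree pages (sketch, not a production buffer pool).
class PageCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()          # page_id -> node, oldest first

    def get(self, page_id):
        if page_id not in self.pages:
            return None                     # cache miss: caller reads disk
        self.pages.move_to_end(page_id)     # mark as recently used
        return self.pages[page_id]

    def put(self, page_id, node):
        self.pages[page_id] = node
        self.pages.move_to_end(page_id)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used

cache = PageCache(2)
cache.put(1, "root"); cache.put(2, "leaf-a")
cache.get(1)                  # touch page 1 so it stays hot
cache.put(3, "leaf-b")        # evicts page 2, the least recently used
print(cache.get(2))           # None
```

Because every search starts at the root, the root page is touched on every operation and in practice never leaves such a cache.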

Sequential Access

Traverse linked leaves (B+ tree variant). Scan range: leaves are linked, no parent traversal needed. Cache-friendly: sequential block reads.

Complexity Analysis

Search

Comparisons: O(m log_m n). Disk I/O: O(log_m n). m typically 100-500. For 1M records: log_m n = 3-4 disk accesses.

Insert/Delete

Worst case: split/merge at every level. Amortized: O(log_m n). In practice rebalancing is rare and touches only 2-3 nodes per operation.

Range Query

Find start: O(log_m n). Scan k records: O(log_m n + k/m) disk accesses. Efficient: sequential leaf traversal.

Space

O(n) storage: all records indexed. No large slack space required (unlike hashing). Nodes guaranteed at least 50% full (except root); near 100% utilization possible.

Comparison Table

Operation     Comparisons        Disk I/O
Search        O(m log_m n)       O(log_m n)
Insert        O(m log_m n)       O(log_m n)
Delete        O(m log_m n)       O(log_m n)
Range Query   O(m log_m n + k)   O(log_m n + k/m)

B-Tree Variants (B+, B*)

B+ Trees

All data in leaves. Internal nodes: keys only (guides search). Advantages: leaves linked (sequential scan efficient), cleaner separation.
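The linked-leaf scan can be sketched as follows (the `Leaf` class and `next` field are assumptions of this illustration):

```python
# B+ tree leaf chain: once the first leaf of a range is located,
# a scan follows sibling links only, never re-visiting the parents.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # link to the right sibling leaf

a, b, c = Leaf([3, 5]), Leaf([8, 15]), Leaf([18, 20])
a.next, b.next = b, c

def scan_from(leaf, lo, hi):
    out = []
    while leaf:
        for k in leaf.keys:
            if k > hi:
                return out        # past the upper bound: stop early
            if k >= lo:
                out.append(k)
        leaf = leaf.next          # sequential hop, no parent traversal
    return out

print(scan_from(a, 5, 18))   # [5, 8, 15, 18]
```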

Structure Difference

B-Tree: data in internal and leaf nodes. B+ Tree: data only in leaves, so internal nodes hold more keys per block. Higher fanout, fewer levels, faster search.

B* Trees

Stricter balance: 2/3 full minimum. Fewer splits/merges. More complex, marginal benefits. Rarely used.

When to Use Each

B+ Tree: databases (MySQL, PostgreSQL). Better sequential access, standard. B-Tree: filesystems, simpler implementation. B*: specialized.

Practical Consideration

B+ almost always preferred: better I/O patterns, sequential scan efficiency. Standard choice for modern databases.

Practical Database Usage

PostgreSQL

BTREE index type (default). Supports: equality, range queries, sort. Efficient: handles millions of records.

MySQL/InnoDB

B+ trees: primary and secondary indexes. Clustered index (B+ containing data). Secondary indexes point to primary key.

SQLite

B-tree: page-oriented (4KB default). Efficient: fits database model. Simple, reliable, widely used.

File Systems

NTFS, ext4, HFS+: B-tree variants for directory/inode lookups. Efficient filesystem operations depend on B-tree performance.

Tuning

Create indexes: high-cardinality columns, frequent filters. Monitor: index usage, query plans. Maintenance: periodic rebuild, reindex.

Comparison with Other Indexes

B-Tree vs. Hash

B-Tree: range queries, sorted. Hash: fast equality, no range. Choice: query pattern determines.

B-Tree vs. Bitmap

B-Tree: general purpose, any data type. Bitmap: low cardinality (few values), fast for specific queries.

B-Tree vs. LSM Tree

B-Tree: balanced, predictable. LSM: write-optimized (accumulate, batch write). Choice: read vs. write heavy.

Performance Summary

B-Tree: versatile, good all operations. LSM: excellent writes, good reads. Hash: fastest equality. Choice: workload.

Applications and Use Cases

Database Indexing

Primary use: fast lookups, range queries. Essential: production databases rely heavily. Tuning critical.

File Systems

Directory structures: efficient lookups. Inode management: B-tree provides fast access.

Key-Value Stores

LMDB, BoltDB: B+-tree storage engines. Fast, ordered key access. (LevelDB takes the LSM-tree approach instead; Redis is primarily in-memory.)

Search Engines

Inverted indexes: B-trees for fast term lookup. Efficiency: enables large-scale search.
