Introduction
Bitmap index: specialized index structure for low-cardinality columns (few distinct values). Represents data as bit arrays: one bit per row per value. Extremely space-efficient and query-optimized for specific workloads.
Core insight: if column has few values (e.g., gender: M/F, status: active/inactive), bitmap representation far more efficient than B-Trees. Perfect for data warehousing, analytical queries, OLAP systems.
Usage: primarily in data warehouses (Oracle, SQL Server, Vertica). Less common in OLTP systems (cardinality usually higher). Strategic choice: understand column distribution first.
"Bitmap indexes excel on low-cardinality columns, trading specificity for space efficiency and query speed. Dominant in analytical systems where cardinality patterns predictable." -- Index structures
Bitmap Index Concept
Basic Structure
Bitmap index: one bitmap per distinct value. Each bitmap: one bit per row. Bit = 1: row has that value. Bit = 0: row doesn't have value. Example: gender column (M, F) -> 2 bitmaps (one for each).
Simple Example
Table: users (id, gender)
id: 1 2 3 4 5
gender: M F M F M
Bitmap for gender=M: [1, 0, 1, 0, 1]
Bitmap for gender=F: [0, 1, 0, 1, 0]
Query: WHERE gender='M'
Result: positions 1, 3, 5 (where bitmap=1)
Space Representation
Bitmap: array of bits (not bytes). 1000 rows with 5 values: 5 bitmaps * 1000 bits = 5000 bits = 625 bytes. Compare: B-Tree with full index values, much larger.
Query Processing
Search: load relevant bitmap(s), scan for 1-bits. No tree traversal. Direct bit operations. Simple, fast.
Multi-Value Predicates
AND query: bitmap1 AND bitmap2 (bitwise AND operation). OR query: bitmap1 OR bitmap2. NOT: bitmap complement. Complex queries: combine bitmaps with logical operations.
Bitmap Representation
Dense Bitmaps
Full bitmap: one bit per row. Simple, straightforward. Memory: n bits (n rows). Efficient when rows numerous but sparsely distributed (many 0s possible with compression).
Sparse Bitmaps
Only store positions with 1-bits. Example: row list [1, 3, 5] for bitmap [1, 0, 1, 0, 1]. Memory-efficient if few 1-bits. Compression: achieves smaller size.
Inverted Lists
Alternative: store row IDs for each value. Value=M: [1, 3, 5]. Value=F: [2, 4]. Query: intersect/union lists instead of bitmap operations. Trade-off: different performance characteristics.
Row Ordering
Rows sorted by value: consecutive 1-bits in bitmap. Improves compression (runs of 1s/0s). Example: M, M, M, F, F -> bitmap [1,1,1,0,0] (run length 3, then 2).
Variable-Width Bitmaps
Different column values: different bitmap sizes. Store contiguously or separately. Organization impacts cache locality and access patterns.
Low-Cardinality Columns
Definition
Low-cardinality: few distinct values relative to row count. Examples: gender (2-3 values), status (5-10 values), region (50 values). High-cardinality: many distinct values. Examples: user_id (millions), timestamp (unique per row).
Bitmap Efficiency
Bitmaps excel when cardinality c << n (rows). Space: c * n bits regardless of value size. B-Tree: space proportional to (n * value_size). For c=10, n=1M: bitmap more efficient.
Cardinality Threshold
Rule of thumb: bitmap beneficial if cardinality less than 100-1000 values. Beyond: B-Tree, hash index more efficient. Sweet spot: 2-50 distinct values.
Practical Examples
Ideal: gender (2), status (5), region (10), department (20). Not ideal: user_name (millions), product_id (thousands), timestamp (high cardinality).
Mixed Workloads
Analytical queries: benefit from bitmap. Point lookups: B-Tree faster. Solution: multiple indexes. Tradeoff: maintenance cost vs. query benefits.
Bit Operations and Logical Queries
Bitmap Operations
AND: intersection (bitmap1 AND bitmap2). OR: union. NOT: complement. XOR: exclusive or. Operations: CPU-optimized (bit manipulation). Very fast.
Example Query
WHERE gender='M' AND status='active'
Bitmap_M: [1, 0, 1, 0, 1]
Bitmap_active: [1, 1, 0, 0, 1]
Result (AND): [1, 0, 0, 0, 1]
Positions: 1, 5 (rows matching both conditions)
Complex Predicates
DNF (Disjunctive Normal Form): (A AND B) OR (C AND D). Bitmap: compute each AND, then OR. Efficient: few operations. Cost: linear in clause count.
Range Predicates
Bitmap index on sorted column: bitmap for each value or range. Ranges: union bitmaps. Example: status in (active, pending) -> bitmap_active OR bitmap_pending.
CPU Efficiency
Modern CPUs: fast bit operations. SSE, AVX instructions: process multiple bits in parallel. Bitmap scanning: throughput 1GB/sec (typical). Very efficient.
Bitmap Compression
Run-Length Encoding (RLE)
Compress runs of identical bits. Example: [1,1,1,0,0,0,1] -> (3:1, 3:0, 1:1). Efficient when long runs (sorted column). Decompresses on access.
Compression Ratios
Typical: 10-100x compression with RLE. Sparse bitmaps (many 0s): high compression. Dense (many 1s): less compression. Depends on value distribution and sorting.
Dictionary Compression
Store distinct values once (dictionary). Bitmaps reference dictionary entries. Reduces storage of value metadata. Combined with bitmap compression.
Trade-offs
Compression reduces storage but increases query cost (decompression). Choose: depends on I/O vs. CPU cost. Typical: trade small CPU cost for I/O savings (worthwhile).
Practical Compression
Column-family stores (HBase), data warehouses (Vertica, Parquet): heavy compression. OLTP systems: less aggressive. Strategy: OLAP systems prioritize compression.
Space Efficiency
Calculation Example
Scenario: 1M rows, 10 distinct values per column
Bitmap Index: 10 bitmaps * 1M bits / 8 = 1.25 MB
B-Tree Index: 1M entries * 8 bytes (pointer) + overhead = 8+ MB
Saving: ~6x compression with bitmaps
With RLE compression on sorted data:
Bitmap: ~100-200 KB (50-75x compression)
Advantage: massive space savings
Multi-Dimensional Indexes
Combine multiple low-cardinality columns. Star schema: fact table with dimension tables. Bitmap: efficient for dimension filtering. Reduces data scanned.
Memory Footprint
Entire index fits in memory (even with millions rows). Cache-friendly: sequential bitmap access. No random I/O. Predictable performance.
Index Maintenance
Insertion: append bit, update bitmaps. Deletion: expensive (shift bits). Solution: mark as deleted, rebuild periodically. Cost: maintenance complexity.
Scalability
Scales excellently with row count (linear space). Scales poorly with cardinality (O(c * n)). Design: keep cardinality low, many rows.
Performance Characteristics
Query Performance
Simple equality: O(n/w) where w word size (64 bits). Compare: B-Tree O(log n). For large tables: bitmap much faster. Scan entire bitmap faster than tree traversal.
Complex Queries
Multiple conditions: bitmap AND/OR operations. Fast: CPU operations, no I/O. Scale: proportional to number of conditions (not table size).
Insertion Cost
Add row: extend bitmaps, update affected bits. Amortized O(1) per row (bitmap resized occasionally). Reasonable cost.
Deletion Cost
Mark as deleted: cheap (set bit). Reclaim space: rebuild bitmaps (expensive). Amortized: acceptable if deletions rare.
Index Size Impact
| Metric | Bitmap | B-Tree | Winner |
|---|---|---|---|
| Index Space | 0.1-1 MB (1M rows, 10 values) | 8-20 MB | Bitmap |
| Lookup Speed | O(n/w) | O(log n) | Bitmap |
| Range Query | Moderate | O(log n + k) | B-Tree |
| Deletion | Expensive rebuild | O(log n) | B-Tree |
Query Optimization with Bitmaps
Bitmap Filtering
Use bitmap to identify candidate rows. Retrieve only matching rows. Reduces I/O. Example: WHERE status='active' AND region='NA' uses two bitmaps, identifies rows before accessing data.
Early Termination
Bitmap result: set of matching rows. Access only necessary rows. No table scan. Significant speedup for selective queries.
Bitmap Join Index
Index on join condition. Example: fact_id -> dimension bitmap. Pre-computed join information. Avoids join at query time. Specialized optimization.
Materialized Views
Pre-compute query results (aggregate with bitmaps). Store bitmap results. Reuse for similar queries. Cache-like optimization for OLAP.
Cost-Based Optimization
Query optimizer: decide use bitmap vs. table scan. Estimate selectivity: high selectivity favors bitmap, low selectivity may favor scan (depending on implementation).
Comparison with Other Indexes
vs. B-Tree Index
B-Tree: ordered, supports range queries. Bitmap: fast equality, low space. Choose: B-Tree for mixed queries, bitmap for low-cardinality equality.
vs. Hash Index
Hash: O(1) equality but requires complete hash. Bitmap: O(n/w) but works for complex predicates naturally. Bitmap better for analytical queries.
vs. Full Table Scan
Table scan: O(n) with selectivity (return subset). Bitmap: O(n/w) + data access. For low selectivity: bitmap wins. For high selectivity: scan may be faster.
vs. Inverted List
Inverted: store row IDs per value. Bitmap: bits for rows. Inverted better for very sparse (few 1-bits). Bitmap better for dense (many 1-bits).
Selection Matrix
Bitmap: low-cardinality, analytical queries, space-sensitive. B-Tree: mixed workload, range queries, ordered access. Hash: pure equality, in-memory caches. Bitmap: specialized but powerful.
Real-World Applications
Data Warehouses
Star schema: dimensions (low cardinality). Bitmap indexes: fact table dimension columns. Queries: filter by multiple dimensions fast. Standard practice in OLAP.
SQL Server Columnstore
Columnstore index: column-oriented storage. Bitmap optimization: segment elimination (skip segments based on bitmaps). Dramatically reduces I/O.
Oracle Bitmap Indexes
Oracle: native bitmap index support. Creates bitmap for each distinct value. Efficient for gender, status, region columns. Common in data warehouse configurations.
Analytics Databases
Vertica, Presto, Hive: bitmap-like optimizations. Low-cardinality column handling. Query: fast filtering. Cost-based optimizer chooses bitmap strategies.
Search Engines
Inverted indexes with bitmaps: document flags (indexed, cached, ranked). Bitmap operations: fast filtering. Reduce candidates before scoring.
Time-Series Databases
Tags (low-cardinality): bitmap indexes. Queries: filter by tag combinations fast. Example: environment=prod AND region=us-east (bitmap operations).
Limitations and Trade-offs
High-Cardinality Inefficiency
Cardinality > 1000: bitmaps space-inefficient. One bitmap per value: too many. Space overhead outweighs benefits. B-Tree preferred.
Deletion Complexity
Deletion: mark as deleted, bitmaps grow. Space reclamation: requires rebuild. Workaround: bulk updates (batch deletes, rebuild periodically).
Value Distribution Dependency
Compression depends on distribution. Sorted column: great compression. Random order: poor compression. Pre-sort: adds cost.
Limited Query Types
Excellent for: exact match, AND/OR predicates. Weak for: LIKE (pattern), substring, mathematical operations. B-Tree better for complex conditions.
Memory Requirements
Load entire bitmap for query: memory must accommodate. Large number of bitmaps: exceeds cache. Careful planning: memory constraints.
When Not to Use
High-cardinality columns: avoid bitmap. OLTP (many updates): overhead excessive. Range queries dominant: B-Tree better. Unique values per row: prohibitive.
References
- O'Neil, P., and O'Neil, E. E. "Database Principles, Programming, and Performance." Morgan Kaufmann, 2nd edition, 2001.
- Abadi, D., Madden, S. R., and Ferreira, M. C. "Integrating Compression and Execution in Column-Oriented Database Systems." Proceedings of SIGMOD, 2006.
- Chan, C. Y., and Ioannidis, Y. E. "Bitmap Index Design and Evaluation." Proceedings of ACM SIGMOD, 1998.
- Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
- Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.