Introduction

Bitmap index: specialized index structure for low-cardinality columns (few distinct values). Represents data as bit arrays: one bit per row per value. Extremely space-efficient and query-optimized for specific workloads.

Core insight: if column has few values (e.g., gender: M/F, status: active/inactive), bitmap representation far more efficient than B-Trees. Perfect for data warehousing, analytical queries, OLAP systems.

Usage: primarily in data warehouses (Oracle, SQL Server, Vertica). Less common in OLTP systems (cardinality usually higher). Strategic choice: understand column distribution first.

"Bitmap indexes excel on low-cardinality columns, trading specificity for space efficiency and query speed. Dominant in analytical systems where cardinality patterns predictable." -- Index structures

Bitmap Index Concept

Basic Structure

Bitmap index: one bitmap per distinct value. Each bitmap: one bit per row. Bit = 1: row has that value. Bit = 0: row doesn't have value. Example: gender column (M, F) -> 2 bitmaps (one for each).

Simple Example

Table: users (id, gender)
id: 1 2 3 4 5
gender: M F M F M

Bitmap for gender=M: [1, 0, 1, 0, 1]
Bitmap for gender=F: [0, 1, 0, 1, 0]

Query: WHERE gender='M'
Result: positions 1, 3, 5 (where bitmap=1)

Space Representation

Bitmap: array of bits (not bytes). 1000 rows with 5 values: 5 bitmaps * 1000 bits = 5000 bits = 625 bytes. Compare: B-Tree with full index values, much larger.

Query Processing

Search: load relevant bitmap(s), scan for 1-bits. No tree traversal. Direct bit operations. Simple, fast.

Multi-Value Predicates

AND query: bitmap1 AND bitmap2 (bitwise AND operation). OR query: bitmap1 OR bitmap2. NOT: bitmap complement. Complex queries: combine bitmaps with logical operations.

Bitmap Representation

Dense Bitmaps

Full bitmap: one bit per row. Simple, straightforward. Memory: n bits (n rows). Efficient when rows numerous but sparsely distributed (many 0s possible with compression).

Sparse Bitmaps

Only store positions with 1-bits. Example: row list [1, 3, 5] for bitmap [1, 0, 1, 0, 1]. Memory-efficient if few 1-bits. Compression: achieves smaller size.

Inverted Lists

Alternative: store row IDs for each value. Value=M: [1, 3, 5]. Value=F: [2, 4]. Query: intersect/union lists instead of bitmap operations. Trade-off: different performance characteristics.

Row Ordering

Rows sorted by value: consecutive 1-bits in bitmap. Improves compression (runs of 1s/0s). Example: M, M, M, F, F -> bitmap [1,1,1,0,0] (run length 3, then 2).

Variable-Width Bitmaps

Different column values: different bitmap sizes. Store contiguously or separately. Organization impacts cache locality and access patterns.

Low-Cardinality Columns

Definition

Low-cardinality: few distinct values relative to row count. Examples: gender (2-3 values), status (5-10 values), region (50 values). High-cardinality: many distinct values. Examples: user_id (millions), timestamp (unique per row).

Bitmap Efficiency

Bitmaps excel when cardinality c << n (rows). Space: c * n bits regardless of value size. B-Tree: space proportional to (n * value_size). For c=10, n=1M: bitmap more efficient.

Cardinality Threshold

Rule of thumb: bitmap beneficial if cardinality less than 100-1000 values. Beyond: B-Tree, hash index more efficient. Sweet spot: 2-50 distinct values.

Practical Examples

Ideal: gender (2), status (5), region (10), department (20). Not ideal: user_name (millions), product_id (thousands), timestamp (high cardinality).

Mixed Workloads

Analytical queries: benefit from bitmap. Point lookups: B-Tree faster. Solution: multiple indexes. Tradeoff: maintenance cost vs. query benefits.

Bit Operations and Logical Queries

Bitmap Operations

AND: intersection (bitmap1 AND bitmap2). OR: union. NOT: complement. XOR: exclusive or. Operations: CPU-optimized (bit manipulation). Very fast.

Example Query

WHERE gender='M' AND status='active'
Bitmap_M: [1, 0, 1, 0, 1]
Bitmap_active: [1, 1, 0, 0, 1]
Result (AND): [1, 0, 0, 0, 1]
Positions: 1, 5 (rows matching both conditions)

Complex Predicates

DNF (Disjunctive Normal Form): (A AND B) OR (C AND D). Bitmap: compute each AND, then OR. Efficient: few operations. Cost: linear in clause count.

Range Predicates

Bitmap index on sorted column: bitmap for each value or range. Ranges: union bitmaps. Example: status in (active, pending) -> bitmap_active OR bitmap_pending.

CPU Efficiency

Modern CPUs: fast bit operations. SSE, AVX instructions: process multiple bits in parallel. Bitmap scanning: throughput 1GB/sec (typical). Very efficient.

Bitmap Compression

Run-Length Encoding (RLE)

Compress runs of identical bits. Example: [1,1,1,0,0,0,1] -> (3:1, 3:0, 1:1). Efficient when long runs (sorted column). Decompresses on access.

Compression Ratios

Typical: 10-100x compression with RLE. Sparse bitmaps (many 0s): high compression. Dense (many 1s): less compression. Depends on value distribution and sorting.

Dictionary Compression

Store distinct values once (dictionary). Bitmaps reference dictionary entries. Reduces storage of value metadata. Combined with bitmap compression.

Trade-offs

Compression reduces storage but increases query cost (decompression). Choose: depends on I/O vs. CPU cost. Typical: trade small CPU cost for I/O savings (worthwhile).

Practical Compression

Column-family stores (HBase), data warehouses (Vertica, Parquet): heavy compression. OLTP systems: less aggressive. Strategy: OLAP systems prioritize compression.

Space Efficiency

Calculation Example

Scenario: 1M rows, 10 distinct values per column
Bitmap Index: 10 bitmaps * 1M bits / 8 = 1.25 MB
B-Tree Index: 1M entries * 8 bytes (pointer) + overhead = 8+ MB
Saving: ~6x compression with bitmaps

With RLE compression on sorted data:
Bitmap: ~100-200 KB (50-75x compression)
Advantage: massive space savings

Multi-Dimensional Indexes

Combine multiple low-cardinality columns. Star schema: fact table with dimension tables. Bitmap: efficient for dimension filtering. Reduces data scanned.

Memory Footprint

Entire index fits in memory (even with millions rows). Cache-friendly: sequential bitmap access. No random I/O. Predictable performance.

Index Maintenance

Insertion: append bit, update bitmaps. Deletion: expensive (shift bits). Solution: mark as deleted, rebuild periodically. Cost: maintenance complexity.

Scalability

Scales excellently with row count (linear space). Scales poorly with cardinality (O(c * n)). Design: keep cardinality low, many rows.

Performance Characteristics

Query Performance

Simple equality: O(n/w) where w word size (64 bits). Compare: B-Tree O(log n). For large tables: bitmap much faster. Scan entire bitmap faster than tree traversal.

Complex Queries

Multiple conditions: bitmap AND/OR operations. Fast: CPU operations, no I/O. Scale: proportional to number of conditions (not table size).

Insertion Cost

Add row: extend bitmaps, update affected bits. Amortized O(1) per row (bitmap resized occasionally). Reasonable cost.

Deletion Cost

Mark as deleted: cheap (set bit). Reclaim space: rebuild bitmaps (expensive). Amortized: acceptable if deletions rare.

Index Size Impact

Metric Bitmap B-Tree Winner
Index Space 0.1-1 MB (1M rows, 10 values) 8-20 MB Bitmap
Lookup Speed O(n/w) O(log n) Bitmap
Range Query Moderate O(log n + k) B-Tree
Deletion Expensive rebuild O(log n) B-Tree

Query Optimization with Bitmaps

Bitmap Filtering

Use bitmap to identify candidate rows. Retrieve only matching rows. Reduces I/O. Example: WHERE status='active' AND region='NA' uses two bitmaps, identifies rows before accessing data.

Early Termination

Bitmap result: set of matching rows. Access only necessary rows. No table scan. Significant speedup for selective queries.

Bitmap Join Index

Index on join condition. Example: fact_id -> dimension bitmap. Pre-computed join information. Avoids join at query time. Specialized optimization.

Materialized Views

Pre-compute query results (aggregate with bitmaps). Store bitmap results. Reuse for similar queries. Cache-like optimization for OLAP.

Cost-Based Optimization

Query optimizer: decide use bitmap vs. table scan. Estimate selectivity: high selectivity favors bitmap, low selectivity may favor scan (depending on implementation).

Comparison with Other Indexes

vs. B-Tree Index

B-Tree: ordered, supports range queries. Bitmap: fast equality, low space. Choose: B-Tree for mixed queries, bitmap for low-cardinality equality.

vs. Hash Index

Hash: O(1) equality but requires complete hash. Bitmap: O(n/w) but works for complex predicates naturally. Bitmap better for analytical queries.

vs. Full Table Scan

Table scan: O(n) with selectivity (return subset). Bitmap: O(n/w) + data access. For low selectivity: bitmap wins. For high selectivity: scan may be faster.

vs. Inverted List

Inverted: store row IDs per value. Bitmap: bits for rows. Inverted better for very sparse (few 1-bits). Bitmap better for dense (many 1-bits).

Selection Matrix

Bitmap: low-cardinality, analytical queries, space-sensitive. B-Tree: mixed workload, range queries, ordered access. Hash: pure equality, in-memory caches. Bitmap: specialized but powerful.

Real-World Applications

Data Warehouses

Star schema: dimensions (low cardinality). Bitmap indexes: fact table dimension columns. Queries: filter by multiple dimensions fast. Standard practice in OLAP.

SQL Server Columnstore

Columnstore index: column-oriented storage. Bitmap optimization: segment elimination (skip segments based on bitmaps). Dramatically reduces I/O.

Oracle Bitmap Indexes

Oracle: native bitmap index support. Creates bitmap for each distinct value. Efficient for gender, status, region columns. Common in data warehouse configurations.

Analytics Databases

Vertica, Presto, Hive: bitmap-like optimizations. Low-cardinality column handling. Query: fast filtering. Cost-based optimizer chooses bitmap strategies.

Search Engines

Inverted indexes with bitmaps: document flags (indexed, cached, ranked). Bitmap operations: fast filtering. Reduce candidates before scoring.

Time-Series Databases

Tags (low-cardinality): bitmap indexes. Queries: filter by tag combinations fast. Example: environment=prod AND region=us-east (bitmap operations).

Limitations and Trade-offs

High-Cardinality Inefficiency

Cardinality > 1000: bitmaps space-inefficient. One bitmap per value: too many. Space overhead outweighs benefits. B-Tree preferred.

Deletion Complexity

Deletion: mark as deleted, bitmaps grow. Space reclamation: requires rebuild. Workaround: bulk updates (batch deletes, rebuild periodically).

Value Distribution Dependency

Compression depends on distribution. Sorted column: great compression. Random order: poor compression. Pre-sort: adds cost.

Limited Query Types

Excellent for: exact match, AND/OR predicates. Weak for: LIKE (pattern), substring, mathematical operations. B-Tree better for complex conditions.

Memory Requirements

Load entire bitmap for query: memory must accommodate. Large number of bitmaps: exceeds cache. Careful planning: memory constraints.

When Not to Use

High-cardinality columns: avoid bitmap. OLTP (many updates): overhead excessive. Range queries dominant: B-Tree better. Unique values per row: prohibitive.

References

  • O'Neil, P., and O'Neil, E. E. "Database Principles, Programming, and Performance." Morgan Kaufmann, 2nd edition, 2001.
  • Abadi, D., Madden, S. R., and Ferreira, M. C. "Integrating Compression and Execution in Column-Oriented Database Systems." Proceedings of SIGMOD, 2006.
  • Chan, C. Y., and Ioannidis, Y. E. "Bitmap Index Design and Evaluation." Proceedings of ACM SIGMOD, 1998.
  • Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
  • Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.