Introduction
Column-family stores: specialized NoSQL databases organizing data by column families (groups of related columns). Optimized: analytical workloads, time-series data, sparse data. Examples: HBase, Cassandra, Apache Kudu, InfluxDB. Advantage: efficient compression, fast analytical queries. Cost: slower for row-based access.
"Column-family stores turn databases inside-out: instead of storing rows, they store columns. This radical shift enables massive compression, efficient analytics, and optimal time-series performance at the cost of slower single-row retrieval." -- NoSQL architecture
Column-Family Definition
Concept
Column family: logical grouping of related columns. Stored together (same file, block). Row key: identifies rows (sparse, thousands of columns possible). Multiple column families per table. Wide rows: millions of columns possible.
Example Structure
    Table: metrics
    Row Key: server_1_2024-03-30
    Column Family: cpu
      cpu.usage: 45.2
      cpu.temp: 72.1
      cpu.cores: 8
    Column Family: memory
      memory.free: 8192
      memory.total: 16384
    Column Family: disk
      disk.used: 256000
      disk.free: 1000000
Compared to RDBMS
RDBMS: fixed schema, few columns, many rows. Column-family: flexible columns, potentially many columns per row, rows may be sparse and differ from one another. Model: opposite orientation.
Wide Rows
Thousands/millions of columns per row: supported. RDBMS: impractical (such data is normally normalized into child tables instead). Column-family: natural (columns are stored as sorted key-value cells within a row).
Row-Oriented vs Column-Oriented
Row-Oriented Storage
    Table: employees
    Row 1: | emp_id: 1 | name: Alice | salary: 50000 | dept: 10 |
    Row 2: | emp_id: 2 | name: Bob | salary: 60000 | dept: 20 |
    Storage: Row 1 stored together, Row 2 separately
    Query all columns for row 1: fast (sequential read)
    Query salary column only: slow (read entire rows)
Column-Oriented Storage
    Table: employees (column-family)
    Column emp_id: [1, 2, 3, ...]
    Column name: [Alice, Bob, Carol, ...]
    Column salary: [50000, 60000, 55000, ...]
    Column dept: [10, 20, 10, ...]
    Storage: All salaries together, all names together
    Query all columns for row 1: slower (multiple reads)
    Query salary column: very fast (single read)
Trade-offs
| Aspect | Row-Oriented | Column-Oriented |
|---|---|---|
| Full row read | Fast (sequential) | Slower (multiple reads) |
| Column subset | Slower (reads unneeded columns) | Fast (sequential) |
| Compression | Poor (diverse types) | Excellent (homogeneous) |
| Write efficiency | Good (single write) | Poor (multiple writes) |
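The trade-offs in the table above can be illustrated with a minimal in-memory sketch. The two layouts and the helper names (`to_columns`, `total_salary_row_layout`, and so on) are hypothetical and only mirror the employees example; real stores work on disk blocks, not Python lists.

```python
# Row layout: one dict per employee, all fields kept together.
rows = [
    {"emp_id": 1, "name": "Alice", "salary": 50000, "dept": 10},
    {"emp_id": 2, "name": "Bob", "salary": 60000, "dept": 20},
]


def to_columns(rows):
    """Pivot row dicts into one list per column (assumes identical keys)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def total_salary_row_layout(rows):
    # Every row object is visited even though only one field is needed.
    return sum(row["salary"] for row in rows)


def total_salary_column_layout(columns):
    # Only the salary list is touched.
    return sum(columns["salary"])


def read_row_column_layout(columns, index):
    # Rebuilding a full row costs one lookup per column.
    return {name: values[index] for name, values in columns.items()}
```

Reading one column touches a single list in the column layout, while rebuilding a row needs one access per column, which is the asymmetry the table summarizes.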
Data Structure and Organization
Hierarchy
Table: top-level container. Row key: unique identifier (sorted). Column family: logical grouping (stored separately). Column: attribute within family. Timestamp: versioning (every write timestamped).
Sparse Data Support
Columns optional: rows can have different columns. No NULL storage: missing columns don't consume space. Billions of columns: theoretically possible (practically thousands).
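A minimal sketch of sparse storage, assuming cells are kept as a map keyed by (family, qualifier). The `SparseTable` class is hypothetical and is not any store's actual API; it only shows that absent columns occupy no space.

```python
class SparseTable:
    """Toy sparse table: row key -> {(family, qualifier): value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        # Only cells that are written exist; there are no NULL placeholders.
        self.rows.setdefault(row_key, {})[(family, qualifier)] = value

    def get(self, row_key, family, qualifier, default=None):
        return self.rows.get(row_key, {}).get((family, qualifier), default)

    def cell_count(self):
        # Storage grows with the number of written cells, not rows x columns.
        return sum(len(cells) for cells in self.rows.values())


table = SparseTable()
table.put("server_1", "cpu", "usage", 45.2)
table.put("server_1", "cpu", "temp", 72.1)
table.put("server_2", "memory", "free", 8192)
```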
Row Key Design
    Key format: {entity}_{timestamp}
    Example: server_1_2024-03-30_12:00:00
    Good design: enables range queries, even distribution
    Bad design: hot spots (sequential keys)
    Salting: add prefix to distribute
    Example: {hash(entity)}_{entity}_{timestamp}
    Effect: distributes keys across nodes
Versioning
Every write: timestamped. Keep multiple versions (configurable). Read: specify timestamp or latest. Time-travel: queries at past timestamps possible.
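Timestamped versions can be sketched as a sorted list per cell. The `VersionedCell` class and its `max_versions` parameter are illustrative assumptions, not a real client API.

```python
import bisect


class VersionedCell:
    """Toy cell that keeps the newest `max_versions` timestamped values."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.timestamps = []  # kept sorted ascending
        self.values = []      # values[i] belongs to timestamps[i]

    def put(self, timestamp, value):
        i = bisect.bisect_right(self.timestamps, timestamp)
        self.timestamps.insert(i, timestamp)
        self.values.insert(i, value)
        # Evict the oldest versions beyond the configured limit.
        excess = len(self.timestamps) - self.max_versions
        if excess > 0:
            del self.timestamps[:excess]
            del self.values[:excess]

    def get(self, as_of=None):
        """Return the latest value, or the latest value at or before `as_of`."""
        if not self.timestamps:
            return None
        if as_of is None:
            return self.values[-1]
        i = bisect.bisect_right(self.timestamps, as_of)
        return self.values[i - 1] if i > 0 else None
```

A read with `as_of` set is the "time-travel" case; it can only see versions that have not yet been evicted.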
HBase Architecture
Components
Master: manages region assignment, cluster health. Region servers: store actual data. HDFS: underlying file system. ZooKeeper: coordination, failover.
Regions
Table divided: by row key ranges. Each region: contiguous rows. Region servers: hold multiple regions. Distribution: automatic balancing.
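Because each region holds a contiguous key range, sequential keys tend to land in one region. The salting scheme from Row Key Design, `{hash(entity)}_{entity}_{timestamp}`, can be sketched as follows; the bucket count of 16 and the use of SHA-256 are assumptions chosen for illustration.

```python
import hashlib


def salted_key(entity, timestamp, buckets=16):
    """Prefix the key with a stable hash bucket derived from the entity."""
    digest = hashlib.sha256(entity.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    # Zero-padded so the prefix sorts as text the same way it sorts as a number.
    return f"{bucket:02d}_{entity}_{timestamp}"
```

All keys for one entity share a prefix, so a time-range scan for that entity stays contiguous, while different entities spread across buckets.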
Column Families
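Defined: in the table schema, normally at table creation. Changing: adding or altering a family is an administrative schema change, unlike columns, which can be added on any write. Store type: configured per family (compression, bloom filters). Best practice: 2-3 families per table.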
Write Process
1. Write request: region server receives
2. WAL: appended to the write-ahead log (HDFS) for durability
3. Memstore: buffered in memory
4. Memstore flush: when full, written to HFile
5. Compaction: merges HFiles, optimizes
Read Process
1. Read request: specify row key
2. Memstore check: recent writes not yet flushed
3. Bloom filter: quick negative check that lets the read skip HFiles that cannot contain the row
4. HFile scan: persistent data (possibly multiple files)
5. Merge: combine results, return latest
Cassandra Design
Distributed Architecture
No master: peer-to-peer. All nodes equal. Replication: across nodes and, optionally, data centers. Consistency: tunable (per request). Availability: favored over consistency by default (usually classed as AP in CAP).
Column Families (Super Columns)
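    Keyspace: database-level
    Column Family: table-like
    Super Column: column grouping (optional; a legacy Thrift-era construct, since deprecated)
    Column: attribute
    Example:
    Keyspace: social_network
      Column Family: users
        Row: user_1
          Column: name = "Alice"
          Column: email = "alice@example.com"
Distributed Writes
Write: sent to any node, which acts as coordinator. Replicated: the coordinator forwards the write to all N replicas (N = replication factor); replicas that miss it catch up later. Tunable: W (replica acknowledgements required before the write is acked), with R as the read-side equivalent. W+R > N: read and write replica sets overlap, giving strong consistency.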
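The W+R > N rule can be checked with a few lines. This is a sketch of the arithmetic only; `is_strongly_consistent` and `quorum` are hypothetical helper names, not part of any driver.

```python
def is_strongly_consistent(w, r, n):
    """True when every read set must overlap every write set (W + R > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1
```

With N=3, quorum reads and writes (W=2, R=2) overlap in at least one replica, while W=1, R=1 does not guarantee overlap.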
Range Queries
Ordered: columns are sorted by name. Range scan: SELECT columns WHERE name > 'x' AND name < 'y'. Efficient: leverages sorting.
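Why sorting makes the range scan cheap can be sketched with a binary search over sorted column names. The `column_range` function is a hypothetical illustration using exclusive bounds, matching `name > 'x' AND name < 'y'`.

```python
import bisect


def column_range(sorted_names, values, start, end):
    """Return (name, value) pairs with start < name < end.

    `sorted_names` must already be sorted; `values` maps name -> value.
    """
    lo = bisect.bisect_right(sorted_names, start)  # first name > start
    hi = bisect.bisect_left(sorted_names, end)     # first name >= end
    return [(name, values[name]) for name in sorted_names[lo:hi]]
```

Two binary searches locate the slice boundaries, so the cost depends on the size of the result rather than on the total number of columns.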
Consistency Tuning
    Replication factor (RF): copies of data = 3
    Write quorum (W): acknowledged from 2 replicas
    Read quorum (R): consistency from 2 replicas
    Scenarios:
    W=1, R=1: fast, eventually consistent
    W=RF, R=RF: slow, strongly consistent
    W=2, R=2 (RF=3): balanced
Compression and Storage
Columnar Compression Benefits
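Same type: values similar, compress well. Delta encoding: store differences between consecutive values. Dictionary: replace repeated values with small codes. RLE: run-length encoding of consecutive repeats. Typical: around 10x compression is often cited, though the ratio depends heavily on the data.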
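The three encodings named above can be sketched in a few functions. These are simplified illustrations under the assumption of in-memory lists; production formats pack the results into bits and blocks.

```python
def delta_encode(values):
    """Keep the first value, then the difference to each predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def dictionary_encode(values):
    """Return (dictionary, codes): each distinct value gets a small integer."""
    dictionary = {}
    codes = [dictionary.setdefault(v, len(dictionary)) for v in values]
    return list(dictionary), codes


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]
```

Each encoding relies on neighbouring values in a column being alike, which is why grouping values of one column together helps.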
Example Compression
    Row-oriented (poor compression):
    Alice|50000|Denver|2024-01-15
    Bob|60000|NYC|2024-01-15
    Carol|55000|SF|2024-01-15
    Column-oriented (excellent compression):
    Names: Alice|Bob|Carol (string compression)
    Salaries: 50000|60000|55000 (delta encoding)
    Cities: Denver|NYC|SF (dictionary)
    Dates: 2024-01-15|2024-01-15|2024-01-15 (RLE)
Storage Tiers
Memory: hot data, fast access. SSD: warm data, good balance. HDD: cold data, slow but cheap. Tiering: automatically move based on age.
Block-Level Compression
Blocks: 64KB-1MB. Compressed: in bulk. Decompression: on read. Tradeoff: CPU (compression) vs I/O (size).
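A rough sketch of block-level compression using Python's standard `zlib` module. The 64 KB block size follows the range above; `compress_blocks` and `read_block` are hypothetical names, and real stores use their own codecs and block formats.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB, the low end of the range above


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Split bytes into fixed-size blocks and compress each independently."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_block(blocks, index):
    """Decompress only the block that is needed (CPU cost paid on read)."""
    return zlib.decompress(blocks[index])
```

Compressing blocks independently means a read decompresses one block rather than the whole file, which is where the CPU versus I/O trade-off shows up.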
Time-Series Optimization
Time-Series Model
    Row Key: {metric}_{device}_{date}
    Example: temperature_sensor_1_2024-03-30
    Columns: {timestamp}_{field}
    Example: 12_00_00_celsius, 12_00_01_celsius, ...
    Design: wide rows, many timestamp columns
    Natural fit: column-family excels at this
Compression Benefits
Timestamps ordered: delta encoding effective (regular intervals give constant deltas, which RLE then collapses). Values similar: delta encoding. Storage: efficient. Ratios as high as 100x are sometimes cited for regular time-series data; actual results depend on the data.
Query Patterns
Range: time window queries. Rollup: aggregate over time. Downsampling: lower resolution for history. All efficient: column-oriented design.
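The range and downsampling patterns can be sketched over a list of (time, value) points. The representation of time as seconds since midnight and the function names are assumptions for illustration; a real store would seek on sorted columns rather than filter a list.

```python
def time_range(points, start, end):
    """Points with start <= t < end; `points` is a list of (seconds, value)."""
    return [(t, v) for t, v in points if start <= t < end]


def downsample(points, bucket_seconds):
    """Average values per fixed-width time bucket (a simple rollup)."""
    buckets = {}
    for t, v in points:
        bucket_start = t - t % bucket_seconds
        buckets.setdefault(bucket_start, []).append(v)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]
```

For example, one hour of one-second readings starting at 12:00:00 yields 1800 points for a 30-minute window, and 30 points after downsampling that window to one-minute averages.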
Example: Temperature Monitoring
    Table: metrics
    Row: temperature_sensor_1_2024-03-30
    Columns:
      12_00_00: 20.5
      12_00_01: 20.6
      12_00_02: 20.7
      ... (thousands per day)
    Query: SELECT 12_00_00 to 12_30_00 (30 min range)
    Result: 1800 data points efficiently
Querying Column-Family Stores
Key-Based Access
    GET user_1                   -- retrieve entire row
    GET user_1.profile.name      -- get specific column
    SCAN {user_1 to user_100}    -- range scan
Limitations vs RDBMS
No JOINs: must denormalize. Aggregations: limited or absent, often handled in the client. No general GROUP BY: MapReduce-style processing. Complex queries: difficult/slow.
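When the store does not aggregate, the client scans rows and groups them itself. A minimal sketch, assuming rows arrive as dictionaries; `sum_by` is a hypothetical helper, not a driver function.

```python
from collections import defaultdict


def sum_by(rows, group_key, value_key):
    """Client-side equivalent of SELECT group_key, SUM(value_key) ... GROUP BY."""
    totals = defaultdict(int)
    for row in rows:
        # Sparse rows may lack either column; skip them instead of failing.
        if group_key in row and value_key in row:
            totals[row[group_key]] += row[value_key]
    return dict(totals)
```

The cost is that every scanned row crosses the network, which is one reason complex queries are slow on these stores.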
Query Languages
HBase: shell, Java API, Hive (SQL-like). Cassandra: CQL (Cassandra Query Language, SQL-like). InfluxDB: InfluxQL or Flux. Abstraction: SQL-ish interfaces available.
Secondary Indexes
HBase: manual denormalization (maintain index rows). Cassandra: built-in secondary indexes. Kudu: no secondary indexes; lookups rely on the primary key index. Trade-off: index maintenance cost.
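Manually maintained index rows can be sketched as a second map updated on every write. The `IndexedStore` class is hypothetical; it shows the maintenance cost, since each put touches both the data and the index.

```python
class IndexedStore:
    """Toy store that keeps an index from (column, value) to row keys."""

    def __init__(self):
        self.data = {}   # row_key -> {column: value}
        self.index = {}  # (column, value) -> set of row keys

    def put(self, row_key, column, value):
        row = self.data.setdefault(row_key, {})
        if column in row:
            # Remove the stale index entry before writing the new value.
            self.index[(column, row[column])].discard(row_key)
        row[column] = value
        self.index.setdefault((column, value), set()).add(row_key)

    def find(self, column, value):
        """Row keys whose `column` currently equals `value`."""
        return sorted(self.index.get((column, value), set()))
```

In a distributed store the data write and the index write are separate operations, so keeping them consistent is the application's responsibility.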
Design Patterns
Wide Row Pattern
One row per entity: many columns. Example: user profile (thousands of properties). Efficient: single row fetch. Denormalized: all data together.
Time Series Pattern
Row: entity + date. Columns: time + metric. Natural fit: analytics, monitoring. Compression: excellent. Scale: easily handles billions of data points.
Inverted Index Pattern
    Forward: user_id -> properties
    Inverted: property -> user_ids
    Example:
    Forward row: user_100 -> [name, email, location]
    Inverted row: name="Alice" -> [user_100, user_200]
    Enables: queries by property
Dimension Table Pattern
Slowly changing: reference data (cities, products). Cached: client-side. Updated: rarely. Denormalized: into fact rows.
Performance Characteristics
Read Patterns
Single row: very fast (direct key lookup). Column subset: fast (columnar). Full scan: moderate (all data must be read). Filtering: efficient (server-side).
Write Patterns
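Sequential writes: optimized (append-only). Random writes: acceptable (buffered). Bulk writes: excellent. Updates: written as new timestamped versions rather than in place; true read-modify-write operations (e.g., increments, conditional updates) are expensive.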
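The append-only, buffered write path described under Write Process can be sketched as a tiny log-structured store. This assumes the log is appended before the in-memory buffer is updated, as HBase does; the `TinyLSM` class, its threshold, and its dict-based "files" are illustrative simplifications.

```python
class TinyLSM:
    """Toy log-structured store: WAL + memstore + immutable flushed files."""

    def __init__(self, flush_threshold=2):
        self.wal = []        # append-only log of unflushed writes
        self.memstore = {}   # in-memory buffer of recent writes
        self.files = []      # flushed files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))  # durability first
        self.memstore[key] = value     # then the in-memory buffer
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memstore:
            # Write the buffer out as a sorted, immutable file.
            self.files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}
            self.wal.clear()  # flushed data no longer needs the log

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.files):  # newest file wins
            if key in f:
                return f[key]
        return None

    def compact(self):
        """Merge all files into one; later files override earlier ones."""
        merged = {}
        for f in self.files:
            merged.update(f)
        self.files = [dict(sorted(merged.items()))] if merged else []
```

An update here is just another append, so its cost shows up later, in the number of files a read must check and in compaction.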
Scalability
Horizontal: add nodes, data is redistributed. Petabyte-scale: proven. Millions of QPS: achievable on large clusters. Cost: operational complexity increases.
Comparison Table
| Operation | Performance | Notes |
|---|---|---|
| Single row read | Excellent | Direct access by key |
| Column range scan | Very Good | Leverages column ordering |
| Sequential write | Excellent | Append-only log |
| Random update | Good | Appended as a new version; read-modify-write costs more |
| Complex query | Poor | No joins/aggregates |
Use Cases and Applications
Time-Series Databases
IoT sensors: millions of data points. Monitoring: server metrics, application performance. Stock prices: high-frequency trading. Natural fit: column families excel.
Analytics and Data Warehousing
Column-oriented: optimal for analytics (sum salary by dept). Compression: stores massive datasets efficiently. Scale: petabyte datasets. Examples: Druid, ClickHouse.
Wide-Column Applications
User profiles: many attributes per user. Document stores: nested data in columns. Event logs: tracking attributes. Flexible: easy to add columns.
Real-Time Analytics
Live dashboards: sub-second queries. Stream aggregation: time-windowed. Cardinality: billions of unique values. Compression: enables in-memory datasets.
Not Suitable For
OLTP: transactional workloads (slow updates). Complex queries: many joins (not supported). Frequent schema changes: fixed families. Small datasets: overhead exceeds benefit.