Introduction

Column-family stores: specialized NoSQL databases organizing data by column families (groups of related columns). Optimized: analytical workloads, time-series data, sparse data. Examples: HBase, Cassandra, Apache Kudu, InfluxDB. Advantage: efficient compression, fast analytical queries. Cost: slower for row-based access.

"Column-family stores turn databases inside-out: instead of storing rows, they store columns. This radical shift enables massive compression, efficient analytics, and optimal time-series performance at the cost of slower single-row retrieval." -- NoSQL architecture

Column-Family Definition

Concept

Column family: logical grouping of related columns. Stored together (same file, block). Row key: identifies rows (sparse, thousands of columns possible). Multiple column families per table. Wide rows: millions of columns possible.

Example Structure

Table: metrics
Row Key: server_1_2024-03-30
  Column Family: cpu
    cpu.usage: 45.2
    cpu.temp: 72.1
    cpu.cores: 8
  Column Family: memory
    memory.free: 8192
    memory.total: 16384
  Column Family: disk
    disk.used: 256000
    disk.free: 1000000
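The structure above can be modeled, as a rough sketch, with nested dictionaries: row key -> column family -> column -> value. The second row and the helper names (get_cell, count_cells) are hypothetical additions for illustration; real stores keep this data in sorted files on disk, not in Python dicts.

```python
# Hypothetical in-memory model: row key -> column family -> column -> value.
table = {
    "server_1_2024-03-30": {
        "cpu": {"usage": 45.2, "temp": 72.1, "cores": 8},
        "memory": {"free": 8192, "total": 16384},
        "disk": {"used": 256000, "free": 1000000},
    },
    # A sparse row (made up for illustration): one family, one column.
    "server_2_2024-03-30": {
        "cpu": {"usage": 12.0},
    },
}


def get_cell(table, row_key, family, column, default=None):
    """Return one cell, or default when the row, family, or column is absent."""
    return table.get(row_key, {}).get(family, {}).get(column, default)


def count_cells(table, row_key):
    """Count the cells actually stored for a row; missing columns cost nothing."""
    return sum(len(columns) for columns in table.get(row_key, {}).values())
```

Absent columns simply do not appear in the structure, which is the property that makes sparse rows cheap.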

Compared to RDBMS

RDBMS: fixed schema, few columns, many rows. Column-family: flexible columns that can differ from row to row, potentially very many columns per row. Model: opposite orientation.

Wide Rows

Thousands/millions of columns per row: supported. RDBMS: impractical (denormalization needed). Column-family: natural (column-oriented storage).

Row-Oriented vs Column-Oriented

Row-Oriented Storage

Table: employees
Row 1: | emp_id: 1 | name: Alice | salary: 50000 | dept: 10 |
Row 2: | emp_id: 2 | name: Bob | salary: 60000 | dept: 20 |

Storage: Row 1 stored together, Row 2 separately
Query all columns for row 1: fast (sequential read)
Query salary column only: slow (read entire rows)

Column-Oriented Storage

Table: employees (column-family)
Column emp_id: [1, 2, 3, ...]
Column name: [Alice, Bob, Carol, ...]
Column salary: [50000, 60000, 55000, ...]
Column dept: [10, 20, 10, ...]

Storage: All salaries together, all names together
Query all columns for row 1: slower (multiple reads)
Query salary column: very fast (single read)

Trade-offs

Aspect            | Row-Oriented                    | Column-Oriented
Full row read     | Fast (sequential)               | Slower (multiple reads)
Column subset     | Slower (reads unneeded columns) | Fast (sequential)
Compression       | Poor (diverse types)            | Excellent (homogeneous)
Write efficiency  | Good (single write)             | Poor (multiple writes)
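A minimal sketch of the two layouts, using the employees example with a third row (Carol) taken from the column listing. The function names are hypothetical. It only illustrates which data each query has to touch; it does not measure real I/O.

```python
# Row-oriented layout: one record per row.
rows = [
    {"emp_id": 1, "name": "Alice", "salary": 50000, "dept": 10},
    {"emp_id": 2, "name": "Bob", "salary": 60000, "dept": 20},
    {"emp_id": 3, "name": "Carol", "salary": 55000, "dept": 10},
]

# Column-oriented layout: one list per column, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_salary_row_layout(rows):
    # Visits every record even though only one field is needed.
    return sum(row["salary"] for row in rows)


def total_salary_column_layout(columns):
    # Visits a single contiguous list.
    return sum(columns["salary"])


def fetch_row_column_layout(columns, index):
    # Reassembling a full row needs one lookup per column.
    return {name: values[index] for name, values in columns.items()}
```

Both layouts give the same answers; the difference is how much unrelated data sits between the values a query wants.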

Data Structure and Organization

Hierarchy

Table: top-level container. Row key: unique identifier (sorted). Column family: logical grouping (stored separately). Column: attribute within family. Timestamp: versioning (every write timestamped).

Sparse Data Support

Columns optional: rows can have different columns. No NULL storage: missing columns don't consume space. Billions of columns: theoretically possible (practically thousands).

Row Key Design

Key format: {entity}_{timestamp}
Example: server_1_2024-03-30_12:00:00
Good design: enables range queries, even distribution
Bad design: hot spots (sequential keys)

Salting: add prefix to distribute
Example: {hash(entity)}_{entity}_{timestamp}
Effect: distributes keys across nodes
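One possible way to build a salted key is sketched below. The bucket count, the use of MD5, and the function names are assumptions chosen for illustration, not something a particular store prescribes. Because the salt is derived from the entity alone, all keys for one entity share a prefix and stay sorted by timestamp.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical number of salt buckets


def salt_prefix(entity: str, buckets: int = NUM_BUCKETS) -> str:
    """Derive a stable two-digit bucket from the entity name."""
    digest = hashlib.md5(entity.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}"


def row_key(entity: str, timestamp: str) -> str:
    """Build {hash(entity)}_{entity}_{timestamp}."""
    return f"{salt_prefix(entity)}_{entity}_{timestamp}"
```

The trade-off of this design is that a scan across many entities has to visit every bucket, while a scan over one entity's time range stays within a single prefix.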

Versioning

Every write: timestamped. Keep multiple versions (configurable). Read: specify timestamp or latest. Time-travel: queries at past timestamps possible.
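The versioning behavior can be sketched with a small class that keeps a bounded, timestamp-ordered history for one cell. VersionedCell and its max_versions parameter are hypothetical names; actual stores configure version retention per column family or table.

```python
import bisect


class VersionedCell:
    """Keeps up to max_versions (timestamp, value) pairs, oldest first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._timestamps = []
        self._values = []

    def put(self, timestamp, value):
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            self._values[i] = value  # same timestamp overwrites
        else:
            self._timestamps.insert(i, timestamp)
            self._values.insert(i, value)
        # Drop the oldest versions beyond the configured limit.
        excess = len(self._timestamps) - self.max_versions
        if excess > 0:
            del self._timestamps[:excess]
            del self._values[:excess]

    def get(self, as_of=None):
        """Latest value, or the latest value written at or before as_of."""
        if not self._timestamps:
            return None
        if as_of is None:
            return self._values[-1]
        i = bisect.bisect_right(self._timestamps, as_of)
        return self._values[i - 1] if i > 0 else None
```

A read with as_of set is the time-travel case: it ignores any version written after that timestamp.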

HBase Architecture

Components

Master: manages region assignment, cluster health. Region servers: store actual data. HDFS: underlying file system. Zookeeper: coordination, failover.

Regions

Table divided: by row key ranges. Each region: contiguous rows. Region servers: hold multiple regions. Distribution: automatic balancing.
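Since each region covers a contiguous key range, finding the region for a row key reduces to a binary search over sorted start keys. The split points and server names below are made up for illustration.

```python
import bisect

# Hypothetical layout: region i covers keys from region_start_keys[i]
# up to (but not including) region_start_keys[i + 1].
region_start_keys = ["", "g", "n", "t"]
region_servers = ["rs1", "rs2", "rs1", "rs3"]  # server hosting each region


def locate_region(row_key: str) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1


def locate_server(row_key: str) -> str:
    return region_servers[locate_region(row_key)]
```

Here rs1 holds two regions, matching the point that one region server can hold multiple regions.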

Column Families

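Defined: at table creation. Changes: adding or altering a family is an explicit schema change, unlike columns within a family, which can be added on any write. Store type: configured per family (compression, bloom filters). Best practice: keep the count low (2-3 families per table).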

Write Process

1. Write request: region server receives
2. WAL: appended to the write-ahead log (HDFS) for durability
3. Memstore: buffered in memory
4. Memstore flush: when full, written to a new HFile
5. Compaction: merges HFiles, optimizes
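The write path can be sketched as a toy log-structured store. RegionSketch and flush_threshold are hypothetical names, Python lists stand in for the WAL and HFiles, and nothing here touches disk; it only shows the order of the steps.

```python
class RegionSketch:
    """Toy write path: log, buffer, flush, compact."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.wal = []        # stand-in for the write-ahead log
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable sorted files, oldest first

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # log first, for durability
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffer out as one sorted, immutable file."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def compact(self):
        """Merge all files into one; newer files override older ones."""
        merged = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        self.hfiles = [sorted(merged.items())] if merged else []
```

Compaction is what keeps reads from having to consult an ever-growing number of files.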

Read Process

1. Read request: specify row key
2. Memstore check: recent writes
3. Bloom filter: quick negative check that skips HFiles lacking the row
4. HFile scan: persistent data (multiple files)
5. Merge: combine results, return latest
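A rough sketch of the read path follows. A plain Python set stands in for each file's Bloom filter (an assumption: a real Bloom filter can return false positives, a set cannot), and the data layout and function name are hypothetical.

```python
def read_latest(row_key, memstore, hfiles):
    """Return (value, files_scanned) for the newest version of row_key.

    memstore: dict of row_key -> (timestamp, value)
    hfiles:   list of {"keys": set, "data": {row_key: (timestamp, value)}}
    """
    candidates = []
    if row_key in memstore:
        candidates.append(memstore[row_key])
    files_scanned = 0
    for hfile in hfiles:
        if row_key not in hfile["keys"]:  # negative check: skip this file
            continue
        files_scanned += 1
        candidates.append(hfile["data"][row_key])
    if not candidates:
        return None, files_scanned
    # Merge step: the highest timestamp wins.
    return max(candidates, key=lambda tv: tv[0])[1], files_scanned


# Example data (made up): one buffered write and two files on disk.
memstore = {"row2": (30, "new")}
hfiles = [
    {"keys": {"row1", "row2"}, "data": {"row1": (10, "a"), "row2": (10, "old")}},
    {"keys": {"row1"}, "data": {"row1": (20, "b")}},
]
```

The files_scanned count shows the effect of the negative check: files that cannot contain the row are never opened.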

Cassandra Design

Distributed Architecture

No master: peer-to-peer. All nodes equal. Replication: across nodes and, optionally, data centers. Consistency: tunable (per request). Availability: prioritized over consistency by default (AP in CAP terms).

Column Families (Super Columns)

Keyspace: database-level
Column Family: table-like
Super Column: column grouping (optional; a legacy feature of the older Thrift data model, not exposed in CQL)
Column: attribute

Example:
Keyspace: social_network
  Column Family: users
    Row: user_1
      Column: name = "Alice"
      Column: email = "alice@example.com"

Distributed Writes

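Write: sent to one node (coordinator), which forwards it to the replicas. Replicated: to N nodes, where N is the replication factor (eventually). Tunable: W (replica acknowledgments required before the write is acked) and R (replicas consulted on a read). W+R > N: strong consistency, because every read set overlaps every write set.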

Range Queries

Ordered: columns are sorted by name. Range scan: SELECT columns WHERE name > 'x' AND name < 'y'. Efficient: leverages sorting.

Consistency Tuning

Replication factor (RF): copies of data = 3
Write quorum (W): acknowledged from 2 replicas
Read quorum (R): read from 2 replicas

Scenarios:
W=1, R=1: fast, eventually consistent
W=RF, R=RF: slow, strongly consistent
W=2, R=2 (RF=3): balanced, still strongly consistent (W+R > RF)
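The overlap rule behind these scenarios is simple arithmetic and can be checked with a short sketch. The function names are hypothetical and not part of any Cassandra API.

```python
def is_strongly_consistent(w: int, r: int, rf: int) -> bool:
    """True when every read set must overlap every write set (W + R > RF)."""
    if not (1 <= w <= rf and 1 <= r <= rf):
        raise ValueError("w and r must be between 1 and rf")
    return w + r > rf


def quorum(rf: int) -> int:
    """Smallest majority of rf replicas."""
    return rf // 2 + 1
```

With RF=3, quorum reads and writes (2 and 2) satisfy the rule while tolerating one unavailable replica, which is why that setting is often described as the balanced choice.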

Compression and Storage

Columnar Compression Benefits

Same type: values similar, compress well. Delta encoding: store differences. Dictionary: repeated values. RLE: run-length encoding. Ratios: around 10x is often cited, but the result depends heavily on the data.

Example Compression

Row-oriented (poor compression):
Alice|50000|Denver|2024-01-15
Bob|60000|NYC|2024-01-15
Carol|55000|SF|2024-01-15

Column-oriented (excellent compression):
Names: Alice|Bob|Carol (string compression)
Salaries: 50000|60000|55000 (delta encoding)
Cities: Denver|NYC|SF (dictionary)
Dates: 2024-01-15|2024-01-15|2024-01-15 (RLE)
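The three encodings named in the example can be sketched in a few lines each. These are textbook versions with hypothetical function names; production formats add bit-packing and other refinements not shown here.

```python
def delta_encode(values):
    """Store the first value, then each difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def dictionary_encode(values):
    """Replace each value with a small integer code into a dictionary."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]
```

Each encoding only pays off because a column holds values of one type with similar magnitudes or many repeats; applied to interleaved row data, the same functions find little to exploit.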

Storage Tiers

Memory: hot data, fast access. SSD: warm data, good balance. HDD: cold data, slow but cheap. Tiering: automatically move based on age.

Block-Level Compression

Blocks: 64KB-1MB. Compressed: in bulk. Decompression: on read. Tradeoff: CPU (compression) vs I/O (size).
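A minimal sketch of block-level compression using zlib from the Python standard library. The 64 KB block size is the low end of the range above, and the function names are hypothetical; real stores typically offer several codecs.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks (assumed for this sketch)


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and compress each independently."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_block(blocks, index: int) -> bytes:
    """Decompress only the block that a read needs."""
    return zlib.decompress(blocks[index])


# Example: a highly repetitive column of dates.
column_bytes = b"2024-01-15," * 20000
blocks = compress_blocks(column_bytes)
```

Compressing blocks independently is the design choice that lets a read decompress one block instead of the whole file, at the price of a somewhat lower ratio than compressing everything at once.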

Time-Series Optimization

Time-Series Model

Row Key: {metric}_{device}_{date}
Example: temperature_sensor_1_2024-03-30
Columns: {timestamp}_{field}
Example: 12_00_00_celsius, 12_00_01_celsius, ...
Design: wide rows, many timestamp columns
Natural fit: column-family excels at this

Compression Benefits

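Timestamps ordered: delta encoding leaves small, regular gaps that RLE then compresses well. Values similar: delta encoding. Storage: efficient. Ratios: regular series can compress far better than general data (100x is sometimes cited), but the result depends on the data.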

Query Patterns

Range: time window queries. Rollup: aggregate over time. Downsampling: lower resolution for history. All efficient: column-oriented design.
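Rollup and downsampling can be sketched as grouping points into fixed time buckets and keeping one aggregate per bucket. The function name, the choice of the mean as the aggregate, and the use of seconds since midnight as timestamps are assumptions for illustration.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets.

    points: iterable of (seconds_since_midnight, value)
    Returns {bucket_start: mean_value}, ordered by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}
```

For example, four readings taken 30 seconds apart collapse to two one-minute averages.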

Example: Temperature Monitoring

Table: metrics
Row: temperature_sensor_1_2024-03-30
Columns:
  12_00_00: 20.5
  12_00_01: 20.6
  12_00_02: 20.7
  ... (thousands per day)

Query: SELECT 12_00_00 to 12_30_00 (30 min range)
Result: 1800 data points efficiently
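The range query above can be sketched as a binary search over sorted column names within one wide row. The readings here are synthetic (one per second) and the helper names are hypothetical; the point is that zero-padded names sort in time order, so a time window is a contiguous slice.

```python
import bisect


def column_name(seconds: int) -> str:
    """Format seconds since midnight as a zero-padded HH_MM_SS column name."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}_{minutes:02d}_{secs:02d}"


# One synthetic reading per second for a full day, stored as a wide row.
names = [column_name(t) for t in range(24 * 3600)]   # sorted by construction
values = [20.0 + (t % 600) / 100 for t in range(24 * 3600)]


def range_scan(start: str, stop: str):
    """Columns with start <= name < stop, in time order."""
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_left(names, stop)
    return list(zip(names[lo:hi], values[lo:hi]))
```

Scanning from 12_00_00 up to, but not including, 12_30_00 yields the 1800 points of the example without touching the rest of the day.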

Querying Column-Family Stores

Key-Based Access

GET user_1                  -- retrieve entire row
GET user_1.profile.name     -- get specific column
SCAN {user_1 to user_100}   -- range scan
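The three access styles can be sketched over a table that keeps its row keys sorted. SortedTable is a hypothetical class, not a client API. Keys compare as strings, so this sketch zero-pads the numeric part (user_001, user_010) to make key order match numeric order; without padding, user_2 would sort after user_100.

```python
import bisect


class SortedTable:
    """Toy table with sorted row keys: get by key, get a column, range scan."""

    def __init__(self):
        self._keys = []   # row keys, kept sorted
        self._rows = {}   # row key -> family -> column -> value

    def put(self, row_key, family, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].setdefault(family, {})[column] = value

    def get(self, row_key, family=None, column=None):
        row = self._rows.get(row_key)
        if row is None or family is None:
            return row
        columns = row.get(family, {})
        return columns if column is None else columns.get(column)

    def scan(self, start_key, stop_key):
        """Rows with start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]
```

Everything here is addressed by row key; there is no way to ask for rows by column value without scanning, which is the limitation that motivates secondary indexes.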

Limitations vs RDBMS

No JOINs: must denormalize. Aggregations: limited or absent, often handled in the client. No general GROUP BY: mapreduce-style processing. Complex queries: difficult/slow.
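Client-side aggregation usually means scanning the needed columns and grouping in application code. The sketch below assumes the scan has already returned rows as dictionaries; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def sum_by(rows, group_column, value_column):
    """Group scanned rows by one column and sum another, client-side."""
    totals = defaultdict(int)
    for row in rows:
        # Sparse rows may lack either column; skip them.
        if group_column in row and value_column in row:
            totals[row[group_column]] += row[value_column]
    return dict(totals)


# Rows as a scan might return them (made-up data, one row is sparse).
scanned = [
    {"dept": 10, "salary": 50000},
    {"dept": 20, "salary": 60000},
    {"dept": 10, "salary": 55000},
    {"dept": 30},
]
```

The cost of this approach is that every matching row crosses the network before it is reduced, which is why large aggregations are usually pushed to a batch or query engine.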

Query Languages

HBase: shell, Java API, Hive (SQL-like). Cassandra: CQL (Cassandra Query Language, SQL-like). InfluxDB: InfluxQL or Flux. Abstraction: SQL-ish interfaces available.

Secondary Indexes

HBase: manual denormalization (maintain index rows). Cassandra: built-in secondary indexes. Kudu: no secondary indexes; lookups rely on the primary key and column predicates. Trade-off: index maintenance cost.

Design Patterns

Wide Row Pattern

One row per entity: many columns. Example: user profile (thousands of properties). Efficient: single row fetch. Denormalized: all data together.

Time Series Pattern

Row: entity + date. Columns: time + metric. Natural fit: analytics, monitoring. Compression: excellent. Scale: easily handles billions of data points.

Inverted Index Pattern

Forward: user_id -> properties
Inverted: property -> user_ids

Example:
Forward row: user_100 -> [name, email, location]
Inverted row: name="Alice" -> [user_100, user_200]

Enables: queries by property
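Building the inverted rows from the forward rows can be sketched as follows. The sample users and the "property=value" key format are assumptions for illustration; in a real store the inverted rows would be written to a second table and kept in sync on every update.

```python
from collections import defaultdict

# Forward rows: user_id -> properties (made-up data).
forward = {
    "user_100": {"name": "Alice", "location": "Denver"},
    "user_200": {"name": "Alice", "location": "NYC"},
    "user_300": {"name": "Bob", "location": "Denver"},
}


def build_inverted(forward):
    """Return inverted rows: 'property=value' -> sorted list of user_ids."""
    inverted = defaultdict(list)
    for user_id in sorted(forward):
        for prop, value in forward[user_id].items():
            inverted[f"{prop}={value}"].append(user_id)
    return dict(inverted)


inverted = build_inverted(forward)
```

A query by property then becomes a single key lookup on the inverted table, at the cost of writing each property twice.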

Dimension Table Pattern

Slowly changing: reference data (cities, products). Cached: client-side. Updated: rarely. Denormalized: into fact rows.

Performance Characteristics

Read Patterns

Single row: very fast (direct lookup by key). Column subset: fast (columnar). Full scan: moderate (all data must be read). Filtering: efficient (server-side).

Write Patterns

Sequential writes: optimized (append-only). Random writes: acceptable (buffered). Bulk writes: excellent. Updates: cheap when blind (a new version is appended), expensive when the old value is needed (read-modify-write).

Scalability

Horizontal: add nodes, data is redistributed. Petabyte-scale: proven. Millions of QPS: achievable. Cost: operational complexity increases.

Comparison Table

Operation          | Performance | Notes
Single row read    | Excellent   | Direct access by key
Column range scan  | Very Good   | Leverages column ordering
Sequential write   | Excellent   | Append-only log
Random update      | Good        | Read-modify-write
Complex query      | Poor        | No joins/aggregates

Use Cases and Applications

Time-Series Databases

IoT sensors: millions of data points. Monitoring: server metrics, application performance. Stock prices: high-frequency trading. Natural fit: column families excel.

Analytics and Data Warehousing

Column-oriented: optimal for analytics (sum salary by dept). Compression: stores massive datasets efficiently. Scale: petabyte datasets. Examples: Druid, ClickHouse.

Wide-Column Applications

User profiles: many attributes per user. Document stores: nested data in columns. Event logs: tracking attributes. Flexible: easy to add columns.

Real-Time Analytics

Live dashboards: sub-second queries. Stream aggregation: time-windowed. Cardinality: billions of unique values. Compression: enables in-memory datasets.

Not Suitable For

OLTP: transactional workloads (slow updates). Complex queries: many joins (not supported). Frequent schema changes: fixed families. Small datasets: overhead exceeds benefit.

References

  • George, L. "HBase: The Definitive Guide: Random Access to Large-Scale Data." O'Reilly Media, 2011.
  • Hewitt, E. "Cassandra: The Definitive Guide: Distributed Data at Web Scale." O'Reilly Media, 2nd edition, 2016.
  • Abadi, D., et al. "Column-Stores vs. Row-Stores: How Different Are They Really?" Proceedings of SIGMOD, 2008.
  • Kleppmann, M. "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems." O'Reilly Media, 2017.
  • DeCandia, G., et al. "Dynamo: Amazon's Highly Available Key-Value Store." Proceedings of SOSP, 2007.