Introduction
Key-value store: simplest NoSQL database. Data model: key -> value mapping. Extreme simplicity: only get, set, delete operations. Powers social media, caching, real-time analytics.
Advantages: ultra-fast access (O(1) hash lookup), horizontal scaling, high throughput, simple operations. Disadvantages: no complex queries, limited secondary indexing, eventual consistency common.
Foundation: hash tables. Single-node: classic hash table. Distributed: partition across servers using consistent hashing. Examples: Redis, Memcached, DynamoDB, Cassandra.
"Key-value stores trade expressive power for performance. Simplicity enables massive scalability. Foundation for modern web services, caching, real-time systems." -- NoSQL systems
Key-Value Data Model
Simplicity
Data: key-value pairs. Key: unique identifier (string, binary). Value: any data (string, number, JSON, blob). Schema-less: values can change type.
Operations
GET(key): retrieve value. SET(key, value): store value. DELETE(key): remove. Optional: increment, append, TTL (expiration). Minimal API.
Example
Keys: user:1001, user:1002, product:A1, product:B2
Values: {name: "Alice", age: 30}, {name: "Bob", age: 25}
GET(user:1001) -> {name: "Alice", age: 30}
SET(user:1001, {name: "Alice", age: 31})
DELETE(user:1001)
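The minimal GET/SET/DELETE API above can be sketched with a plain dictionary. A hypothetical `KVStore` class, purely illustrative:

```python
class KVStore:
    """Minimal key-value store backed by a Python dict (illustrative sketch)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)      # None if key is absent

    def set(self, key, value):
        self._data[key] = value         # overwrite allowed, schema-less value

    def delete(self, key):
        self._data.pop(key, None)       # no error if key is absent


store = KVStore()
store.set("user:1001", {"name": "Alice", "age": 30})
print(store.get("user:1001"))   # {'name': 'Alice', 'age': 30}
store.set("user:1001", {"name": "Alice", "age": 31})
store.delete("user:1001")
print(store.get("user:1001"))   # None
```

Real stores add networking, persistence, and concurrency on top, but the core API is exactly this small.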
Value Types
Strings: simple values. Numbers: counters. Collections: lists, sets, hashes (Redis). Complex: JSON (often serialized as string).
Expiration (TTL)
Key expires after time T. Automatic cleanup. Useful: caching (auto-invalidate), sessions (auto-logout), rate limiting. Implementation: background cleanup.
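One common TTL implementation checks expiry lazily on read, with a background sweep for keys never read again. A minimal sketch of the lazy half (the `TTLStore` class is hypothetical):

```python
import time


class TTLStore:
    """KV store with per-key expiry, checked lazily on read (illustrative)."""

    def __init__(self):
        self._data = {}   # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]   # lazy cleanup; real stores also sweep in background
            return None
        return value


s = TTLStore()
s.set("session:42", "alice", ttl=0.05)   # expires after 50 ms
assert s.get("session:42") == "alice"
time.sleep(0.06)
assert s.get("session:42") is None       # auto-invalidated
```

Redis uses roughly this combination: expired keys are dropped when accessed, plus a periodic sampling sweep reclaims the rest.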
Hash Tables Foundation
Hash Function
Map key to index: h(key) -> index [0, n-1]. Fast: O(1) average. Good function: uniform distribution, deterministic, fast to compute.
Array Storage
Store pairs in array. Array[h(key)] = value. Direct access: given key, compute hash, retrieve value immediately. Ultra-fast.
Load Factor
Ratio: entries / array size. High load: many collisions, slow. Low load: wasted space. Typical: keep < 0.75. Resize (rehash) if exceeding.
Resizing
When load factor high: allocate larger array, rehash all entries. O(n) cost per resize, but amortized O(1) per insert. Happens rarely; in practice the pauses go unnoticed.
Complexity
Average: O(1) get/set/delete. Worst case: O(n) if hash poor or many collisions. Practice: excellent performance, rarely worst-case.
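The pieces above fit together in a short implementation: a bucket array indexed by hash, a load-factor check, and a doubling resize that rehashes every entry. A minimal sketch (collisions handled with per-bucket lists, as described in the next section):

```python
class HashTable:
    """Hash table with per-bucket lists and load-factor-driven resizing (sketch)."""

    def __init__(self, capacity=8, max_load=0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load

    def _index(self, key):
        return hash(key) % len(self._buckets)   # h(key) -> [0, n-1]

    def set(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)        # overwrite existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self._max_load:
            self._resize()

    def get(self, key):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def _resize(self):
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]   # O(n) rehash, amortized O(1)
        for bucket in old:
            for k, v in bucket:
                self._buckets[self._index(k)].append((k, v))
```

With a uniform hash and load kept below 0.75, buckets stay short and get/set remain O(1) on average.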
Collision Resolution Strategies
Separate Chaining
Each bucket: linked list of entries. Collision: append to list. Lookup: search list. Simple, but chains increase lookup time.
Open Addressing
Find another empty slot on collision. Linear probing: check next slot. Quadratic probing: offsets grow quadratically (1, 4, 9, ...). Double hashing: step by a second hash of the key. Fewer memory allocations than chaining.
Linear Probing
Collision at slot i: try i+1, i+2, ... until empty. Simple. Risk: clustering (long runs of full slots). Slow searches.
Quadratic Probing
Try i+1^2, i+2^2, i+3^2, ... Reduces clustering; better than linear. Caveat: the probe sequence can cycle and miss empty slots unless table size and load factor are chosen carefully (e.g., prime size, load < 0.5).
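A quick way to see the difference between the two strategies is to print the probe sequences themselves. A small helper (`probe_sequence` is an illustrative function, not a library API):

```python
def probe_sequence(h, capacity, strategy="linear", max_probes=8):
    """Slot indices tried after a collision at slot h (illustrative)."""
    if strategy == "linear":
        offsets = range(1, max_probes + 1)                   # i+1, i+2, i+3, ...
    else:  # quadratic
        offsets = (j * j for j in range(1, max_probes + 1))  # i+1, i+4, i+9, ...
    return [(h + off) % capacity for off in offsets]


print(probe_sequence(3, 16, "linear"))     # [4, 5, 6, 7, 8, 9, 10, 11]
print(probe_sequence(3, 16, "quadratic"))  # [4, 7, 12, 3, 12, 7, 4, 3]
```

Linear probing sweeps consecutive slots (the source of clustering), while the quadratic sequence jumps around — but note how it starts revisiting slots 4, 7, 12, 3 instead of covering new ones: the cycling problem mentioned above.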
Performance Impact
Good hash function: few collisions, near O(1). Poor function: many collisions, degraded. Choice: trade complexity vs. performance.
In-Memory Key-Value Stores
Examples
Redis: in-memory, persistent option, pub/sub. Memcached: pure in-memory cache. Fast: data in RAM, no disk I/O.
Speed Advantage
RAM orders of magnitude faster than disk (nanoseconds vs. milliseconds for HDD seeks). In-memory: microsecond operation latencies. Network round trip often dominates (~0.1-1 ms in a datacenter). Ultra-responsive.
Persistence Options
Durability: lose data on crash. Options: RDB (snapshots), AOF (append-only log), hybrid. Trade-off: speed vs. safety.
Memory Constraints
Data limited by RAM size. Scaling: cache subset of data. Eviction policies: LRU (least recently used), LFU (least frequently), TTL. Manage memory.
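LRU eviction can be sketched compactly with an ordered dictionary: reads move a key to the "most recent" end, and overflow pops from the "least recent" end. An illustrative sketch, not how Redis implements it internally (Redis uses an approximate sampled LRU):

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache evicting the least recently used key (sketch)."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)   # evict least recently used


cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")          # "a" is now most recently used
cache.set("c", 3)       # capacity exceeded: evicts "b", not "a"
print(cache.get("b"))   # None
```

LFU and TTL-based eviction follow the same shape, differing only in which key the overflow branch picks.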
Use Cases
Caching: speed up databases. Sessions: user state. Counters: page views, metrics. Real-time: leaderboards. Pub/sub: messaging.
Persistent Key-Value Stores
Disk-Based
Data written to disk. Survives crashes. Slower than in-memory but durable. Examples: LevelDB, RocksDB (both built on LSM trees).
Log-Structured Merge (LSM) Trees
Fast writes: write to memory first (memtable), later flush to disk (SSTables). Compaction: merge SSTables. Efficient for write-heavy workloads.
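The memtable/SSTable split can be shown in a toy sketch: writes land in an in-memory dict; when it fills, it is frozen into a sorted run, and reads check the memtable first, then runs newest-first. The `TinyLSM` class below is hypothetical and omits disk I/O, compaction, and the write-ahead log real engines need:

```python
class TinyLSM:
    """Toy LSM tree: writes go to a memtable; full memtables flush to sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []        # sorted (key, value) runs, newest last
        self._limit = memtable_limit

    def set(self, key, value):
        self.memtable[key] = value                   # fast: in-memory write
        if len(self.memtable) >= self._limit:
            # flush: the "SSTable" (here just a sorted in-memory list)
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:                     # freshest data first
            return self.memtable[key]
        for segment in reversed(self.segments):      # newest segment wins
            for k, v in segment:
                if k == key:
                    return v
        return None
```

Compaction would merge `self.segments` into fewer, larger sorted runs, discarding overwritten values — that merge step is why the structure stays read-efficient despite append-only writes.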
B-Trees
Traditional: balanced tree, ordered keys. Good for range queries. Disk-optimized: group keys in blocks (pages). Examples: SQLite, LMDB.
Durability Guarantees
Fsync: force write to disk. Slow but safe. No fsync: fast but risky. Batching: group writes, fsync periodically. Balance: performance vs. durability.
Performance Tuning
Compression: reduce disk space, trade CPU. Caching: RAM for frequently accessed data. Indexing: fast lookup. Optimization: crucial for performance.
Distributed Key-Value Stores
Scaling
Single server limited by RAM/CPU/disk. Distribute: partition keys across servers. Each server subset. Scale horizontally: add servers.
Consistent Hashing
Map key to server: h(key) mod N. Problem: adding a server remaps nearly all keys (mass rehashing). Consistent hashing: ring topology; adding a server moves only the keys in the segment it takes over. Minimal disruption.
Replication
Copy data to multiple servers. Availability: if one fails, others serve. Consistency: often eventual (a write may not be immediately visible on all replicas).
Quorum Reads/Writes
Write to W replicas, read from R replicas, N = replication factor. W+R > N: every read quorum overlaps the latest write quorum, so reads see the latest acknowledged write. Otherwise: eventual. Tune: consistency vs. availability and latency.
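The overlap condition is just arithmetic, which makes it easy to check per configuration. A trivial helper (the function name is my own):

```python
def read_overlaps_write(n, w, r):
    """True if every read quorum must intersect the latest write quorum (W+R > N)."""
    return w + r > n


# N=3 replicas: common configurations
print(read_overlaps_write(3, 2, 2))  # True  -> reads see the latest acknowledged write
print(read_overlaps_write(3, 3, 1))  # True  -> write everywhere, read anywhere
print(read_overlaps_write(3, 1, 1))  # False -> fast, but eventual consistency only
```

Dynamo-style stores expose exactly these knobs (e.g., Cassandra's per-query consistency levels map to choices of W and R).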
Examples
DynamoDB: AWS managed, distributed. Cassandra: distributed, eventually consistent. Riak: distributed, tunable consistency. All scale horizontally.
Hashing and Partitioning Strategies
Hash Functions
Properties: deterministic (same key -> same hash), uniform (all hashes equally likely), fast. Examples: MurmurHash, CityHash (fast, non-cryptographic); SHA-1 (cryptographic, slower than needed for partitioning).
Range Partitioning
Partition by key range (a-m, n-z). Simple, but unbalanced (some ranges more data). Rebalancing complex.
Hash Partitioning
h(key) determines server. Balanced: uniform distribution. Rebalancing: consistent hashing reduces movement. Standard: hash partitioning.
Consistent Hashing
Ring: servers placed on circle. Key: find clockwise nearest server. Add server: affects only keys between previous and new. Minimal movement.
Virtual Nodes
Each server: multiple points on ring. Balances load: servers with more capacity get more virtual nodes. Flexibility: accommodate heterogeneous clusters.
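Ring, clockwise lookup, and virtual nodes fit in a short sketch. The `HashRing` class below is illustrative (MD5 is used only as a convenient, deterministic hash; production systems typically use faster non-cryptographic hashes):

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hashing ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        # Each server contributes `vnodes` points on the ring.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Clockwise nearest point; wrap around past the largest point.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.server_for("user:1001")          # deterministic assignment

# Adding node-d only reassigns keys in the ring segments node-d takes over;
# the rest keep their previous owner.
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
```

Weighting heterogeneous servers is just a matter of giving bigger machines a larger `vnodes` count.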
Consistency and Replication
Replication Models
Leader-follower (master-slave): writes go to the leader, replicated to read-only followers. Multi-leader: any of several nodes accepts writes. Leaderless: quorum-based reads and writes.
Consistency Levels
Strong: all reads see latest write. Eventual: eventually consistent. Causal: preserve causality. Tune: consistency-availability trade-off.
Conflict Resolution
Concurrent writes diverge. Resolution: timestamp (last-write-wins), application-defined, vector clocks. Merge or pick.
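Last-write-wins is the simplest of these policies: attach a timestamp to each write and keep the highest. A sketch (the replica record format here is invented for illustration; note that LWW silently drops the losing write, which is why vector clocks exist):

```python
def last_write_wins(replicas):
    """Pick the value with the highest timestamp among divergent replicas (sketch)."""
    return max(replicas, key=lambda r: r["ts"])["value"]


# Two replicas diverged under concurrent writes:
replicas = [
    {"value": {"name": "Alice", "age": 30}, "ts": 1700000001},
    {"value": {"name": "Alice", "age": 31}, "ts": 1700000005},  # later write
]
print(last_write_wins(replicas))  # {'name': 'Alice', 'age': 31}
```

Vector-clock resolution would instead detect that neither version causally precedes the other and hand both back to the application to merge.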
Write Concerns
Unacknowledged: fast, risky. Acknowledged: one replica confirms. Majority: majority replicas confirm. Trade-off: speed vs. durability.
Consistency Tuning
Per-operation: choose consistency level. Strong: slower. Eventual: faster. Flexible: application requirements determine.
Performance Characteristics
Access Time
In-memory: microseconds. Disk-based: milliseconds. Network: milliseconds (remote). Overall: network dominates in distributed.
Throughput
Single-threaded server (e.g., Redis): on the order of 100K ops/sec. Multi-threaded or sharded: millions. Distributed: scales near-linearly with servers. Network bandwidth: bottleneck at scale.
Latency vs. Throughput
Latency: per-operation time. Throughput: operations per second. Different goals. Batch operations: improve throughput, increase latency. Trade-off.
Benchmarking
Measure: operations/sec, latency (p50, p99, p99.9). Under load: latency increases. Monitor: identify bottlenecks. Tune: optimize critical paths.
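Tail percentiles are the key benchmark output because averages hide slow outliers. A nearest-rank percentile sketch (real benchmarking tools use HDR histograms for accuracy at high percentiles; the rounding here is a simplification):

```python
def percentile(latencies_ms, p):
    """p-th percentile by nearest rank (sketch; real tools use HDR histograms)."""
    ranked = sorted(latencies_ms)
    idx = max(0, round(p / 100 * len(ranked)) - 1)
    return ranked[idx]


# Mostly-fast samples with one slow outlier (milliseconds):
samples = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.8, 1.2, 4.0, 25.0]
print(percentile(samples, 50))   # 0.5  -> median looks healthy
print(percentile(samples, 99))   # 25.0 -> tail latency the average hides
```

This is why load tests report p50/p99/p99.9 together: the median stays flat under load long after the tail has blown up.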
Scaling Limits
Network bandwidth: eventually saturated. Storage per server: RAM/disk full. Consistency: harder with more servers. Plan: understand scaling limits.
Popular Implementations
Redis
In-memory, rich data structures (lists, sets, hashes), pub/sub, persistence options (RDB, AOF). Fast, versatile. Standard caching/session store.
Memcached
Pure cache: in-memory, no persistence. Simple, fast, distributed. LRU eviction. Web applications: speed up databases.
DynamoDB
AWS managed, distributed, eventually consistent reads by default (strongly consistent reads optional). Pay-per-request. Global tables: multi-region. Enterprise standard.
Cassandra
Distributed, highly available, tunable consistency. LSM-tree storage, eventually consistent by default. Proven at very large scale. Operational complexity.
RocksDB
Embedded, high performance, LSM trees. Used inside MyRocks (MySQL), CockroachDB, TiKV. Library: integrate into applications. Not a standalone server.
Real-World Applications
Caching
Speed up databases: cache frequent queries. Reduces load, improves latency. Application-managed: decide what to cache, TTL. Critical for performance.
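The application-managed pattern described here is usually cache-aside: try the cache, fall back to the database on a miss, then populate the cache with a TTL. A sketch where `cache` and `db` are hypothetical objects with `get`/`set` and `query` methods:

```python
def get_user(user_id, cache, db, ttl=300):
    """Cache-aside read (sketch). `cache` and `db` are hypothetical interfaces:
    cache.get(key), cache.set(key, value, ttl=...), db.query(user_id)."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:
        return value                    # cache hit: no database round trip
    value = db.query(user_id)           # cache miss: hit the database
    cache.set(key, value, ttl=ttl)      # populate; TTL bounds staleness
    return value
```

On writes, the usual companion is to delete (invalidate) the cached key rather than update it, letting the next read repopulate — simpler and less prone to stale overwrites.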
Sessions
User login state: store in key-value. Fast access, shared across servers. TTL auto-logout. Distributed: multi-region sessions.
Real-Time Analytics
Counters, metrics: update frequently. Key-value fast increments. Leaderboards: sorted sets. Real-time dashboards depend on key-value speed.
Message Queues
Pub/sub, queues: Redis, Kafka. Decouple systems, handle spikes. Scalable, low-latency. Event-driven architectures.
Data Locality
Hot data in cache, cold on disk. Tier-based: fast/expensive (in-memory) for hot, slow/cheap (disk) for cold. Cost optimization.
References
- DeCandia, G., et al. "Dynamo: Amazon's Highly Available Key-Value Store." Proceedings of SOSP, 2007.
- Lakshman, A., and Malik, P. "Cassandra: A Decentralized Structured Storage System." ACM SIGOPS Operating Systems Review, 2010.
- Dong, S., et al. "RocksDB: a High-Performance Embedded Database." 2013.
- Kleppmann, M. "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems." O'Reilly Media, 2017.
- Karger, D., et al. "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web." Proceedings of ACM STOC, 1997.