Introduction

Eventual consistency: relaxed consistency model tolerating temporary inconsistencies. All replicas eventually converge to same state (given sufficient time, no new writes). Enables high availability in distributed systems (AP in CAP).

Trade-off: sacrifice strong consistency (immediate visibility) for availability (always responsive). Replicas may temporarily diverge. Acceptable for many modern applications (social media, analytics, caching).

Practical reality: most large-scale distributed systems use eventual consistency. Enables geographic distribution, fault tolerance, high scalability. Cost: application developers handle temporary inconsistencies.

"Eventual consistency enables global availability by accepting temporary divergence. Replicas converge through asynchronous synchronization, providing high availability at cost of immediate correctness." -- Distributed systems design

Eventual Consistency Definition

Formal Definition

System is eventually consistent if: given no new writes, all replicas converge to identical state. Reads may return stale data temporarily. No guarantee when convergence occurs.

Guarantees Provided

Liveness: system responds to requests (available). Durability: writes persisted (not lost). Convergence: replicas eventually identical (not diverged forever). Ordering: at best partial (causal, not total).

Key Insight

Not immediate consistency. Temporal window: write at node A; an immediate read at node B may still see the old value. Window closes once nodes synchronize; the read then sees the new value.

Comparison to Strong Consistency

Strong: immediate visibility across all nodes; expensive, requires synchronization on every operation. Eventual: delayed visibility; cheap, asynchronous replication. Trade-off: immediate correctness exchanged for cost and performance.

Consistency Models Spectrum

Strong Consistency (Linearizability)

Strongest: all operations appear to execute sequentially in real time. Immediate visibility. Expensive. Example: traditional databases with transactions.

Sequential Consistency

Weaker: all nodes observe operations in one agreed total order, consistent with each client's program order, but not necessarily in real-time order. Still expensive.

Causal Consistency

Intermediate: causally related operations ordered, unrelated may not be. Balance: partial ordering, efficiency.

Eventual Consistency

Weak: only guarantee convergence eventually. No ordering guarantees during temporary divergence. Cheapest, most scalable.

Spectrum

Strong Consistency      ----->      Eventual Consistency
(Expensive, slow)                   (Cheap, fast)
Immediate visibility                Delayed visibility
Tight synchronization               Loose synchronization

Weak Consistency

Definition

Weak consistency: reads may return outdated values. No guarantee immediate visibility. Replicas lag: asynchronous updates.

Staleness Metric

How stale can reads be? Bounded staleness: within time T or N updates of the latest write. Unbounded: could be arbitrarily stale. Trade-off: tighter freshness bounds cost more coordination.
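
A minimal Python sketch of a bounded-staleness read check, assuming each replica exposes illustrative freshness metadata (last_synced_at, updates_behind); the names and bounds are assumptions, not tied to any particular system.

import time
from dataclasses import dataclass

@dataclass
class ReplicaStatus:
    # Illustrative freshness metadata a replica might expose.
    last_synced_at: float    # wall-clock time of last successful sync
    updates_behind: int      # known number of updates not yet applied

MAX_STALENESS_SECONDS = 5.0   # bound "within time T"
MAX_UPDATES_BEHIND = 100      # bound "within N updates"

def within_staleness_bounds(status: ReplicaStatus) -> bool:
    """True if the replica is fresh enough to serve a bounded-staleness read."""
    lag = time.time() - status.last_synced_at
    return lag <= MAX_STALENESS_SECONDS and status.updates_behind <= MAX_UPDATES_BEHIND

# Caller: serve the read locally if within bounds; otherwise redirect to a
# fresher replica or wait for synchronization.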

Session Consistency

Session consistency: client reads its own writes (at least). Session-local guarantee; clients outside the session may still see stale data.

Read-Your-Writes

Single client: writes visible to subsequent reads from same client. Example: update profile, see changes immediately. Other users may lag.

Implementation

Track write time/version. Client read: check replica version before returning. If stale, wait or redirect. Ensures session consistency.
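
A minimal sketch of this version-tracking approach, assuming a toy Replica/Session model (illustrative names, not a real client library): the session remembers the highest version it wrote, and reads are served only by replicas that have caught up.

import time

class Session:
    """Per-client session state: highest version this client has written."""
    def __init__(self):
        self.last_written_version = 0

class Replica:
    """Toy replica: local data plus the highest write version applied."""
    def __init__(self):
        self.version = 0
        self.data = {}

    def apply_write(self, key, value, version):
        self.data[key] = value
        self.version = max(self.version, version)

def write(session, primary, key, value):
    """Write through one replica and record the version in the session."""
    version = primary.version + 1
    primary.apply_write(key, value, version)
    session.last_written_version = version

def read_your_writes(session, replicas, key, timeout=1.0):
    """Serve the read only from a replica that has caught up to this
    session's own writes; wait briefly for replication otherwise."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        for replica in replicas:
            if replica.version >= session.last_written_version:
                return replica.data.get(key)
        time.sleep(0.05)    # all replicas still behind this session
    raise TimeoutError("no replica caught up to this session's writes")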

Convergence Properties

Monotonic Convergence

Replicas move closer to consistency over time (monotonically). No oscillation. Eventually identical.

Convergence Time

How long until convergence? Variable: depends on network, load, replication mechanism. From milliseconds (LAN) to seconds (WAN). Unpredictable.

Partial Visibility

During convergence: different clients see different states. Some see write A, others don't. Eventually all see A. Temporal window.

Measurement

Replication lag: delay between write and visibility across all replicas. Monitoring: track lag, alert if exceeding bounds. Tuning: improve if critical.
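
A sketch of measuring replication lag with a marker write, assuming write_fn targets the write path and read_fn targets a replica; both are hypothetical callables supplied by the caller, and the alert threshold is an assumed example value.

import time

LAG_ALERT_THRESHOLD = 2.0   # seconds; tune to the application's tolerance

def measure_replication_lag(write_fn, read_fn, key, marker,
                            timeout=30.0, poll_interval=0.05):
    """Write a marker through the primary path, then poll a replica until the
    marker becomes visible; the elapsed time approximates replication lag."""
    start = time.time()
    write_fn(key, marker)
    while time.time() - start < timeout:
        if read_fn(key) == marker:
            lag = time.time() - start
            if lag > LAG_ALERT_THRESHOLD:
                print(f"ALERT: replication lag {lag:.2f}s exceeds bound")
            return lag
        time.sleep(poll_interval)
    raise TimeoutError("marker not visible within timeout; convergence too slow")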

Guarantees

Not bounded: convergence can take arbitrarily long. Risk: application must tolerate it. Design: assume lag exists, handle gracefully.

Ordering Guarantees

Partial Order

Causally related: A -> B (A causes B). B happens after A, same causal chain. Different chains: no guaranteed order.

Causal Ordering

Guarantee: if A causally precedes B, all replicas see A before B. Intuitive: preserves causality. Practical: enough for most apps.

Total Order

Strongest: all operations ordered globally, across all replicas. Expensive: requires consensus. Eventual consistency doesn't guarantee.

Last-Write-Wins (LWW)

Simplest conflict resolution: latest write wins (by timestamp). Fast, but silently discards the concurrent writes that lose the comparison. Risk: data loss.

Conflict Resolution Strategies

Definition

Conflicts: concurrent writes to same data. Different replicas see different values. Must resolve: merge, pick one, or application decides.

Last-Write-Wins (LWW)

Pick latest write (by timestamp). Fast, no synchronization. Risk: lose earlier concurrent writes. Acceptable where the newest value should simply win (e.g., a user overwriting their own profile field).
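
A minimal LWW sketch with a node-id tie-breaker so resolution is deterministic; the timestamp is writer-assigned, so clock skew can still pick a "wrong" winner, and the losing write is simply discarded.

import time
from dataclasses import dataclass

@dataclass
class LWWValue:
    value: object
    timestamp: float   # writer-assigned; clock skew can reorder "latest"
    node_id: str       # tie-breaker so resolution is deterministic

def lww_merge(a: LWWValue, b: LWWValue) -> LWWValue:
    """Pick the write with the higher (timestamp, node_id); the loser is
    dropped silently, which is exactly the data-loss risk of LWW."""
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b

# Two concurrent writes to the same key on different replicas:
left = LWWValue("alice@old.example", time.time(), "A")
right = LWWValue("alice@new.example", time.time() + 0.001, "B")
print(lww_merge(left, right).value)   # the later timestamp wins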

Vector Clocks

Track causality: which writes precede others. Detect conflicts: concurrent writes. Determine: apply both, pick one, or merge.

Merge Strategies

Application-defined: business logic determines the merge. Example: shopping-cart items are combined (union of both replicas), not replaced. Domain-specific.
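
A sketch of an application-defined merge for the cart example; the union-with-max-quantity rule is an illustrative choice, and note that a pure union cannot express removals (that needs tombstones or a CRDT).

def merge_carts(cart_a: dict, cart_b: dict) -> dict:
    """Merge two divergent cart replicas: union of items, taking the larger
    quantity when both replicas saw the same item."""
    merged = dict(cart_a)
    for item, qty in cart_b.items():
        merged[item] = max(merged.get(item, 0), qty)
    return merged

# Divergent replicas of the same cart:
replica_a = {"book": 1, "pen": 2}
replica_b = {"book": 1, "mug": 1}
print(merge_carts(replica_a, replica_b))   # {'book': 1, 'pen': 2, 'mug': 1}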

Three-Way Merge

Common (version control): original, version A, version B. Compute: changes from original to A and B. Merge intelligently. Complex but powerful.
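
A field-level three-way merge sketch for dictionary-shaped records, assuming the common ancestor is available; fields changed differently on both sides are reported as conflicts for the application to resolve.

def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Apply each side's changes relative to the common ancestor; fields
    both sides changed differently are flagged as conflicts."""
    merged, conflicts = {}, []
    for key in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:                 # same result on both sides (or unchanged)
            merged[key] = o
        elif o == b:               # only "theirs" changed this field
            merged[key] = t
        elif t == b:               # only "ours" changed this field
            merged[key] = o
        else:                      # both changed it differently: true conflict
            conflicts.append(key)
            merged[key] = o        # placeholder; application must decide
    return merged, conflicts

base   = {"name": "Ann",  "city": "Oslo"}
ours   = {"name": "Ann",  "city": "Bergen"}
theirs = {"name": "Anna", "city": "Oslo"}
print(three_way_merge(base, ours, theirs))   # both edits kept, no conflict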

Example Conflict

Write at A: balance = 100
Write at B: balance = 200
Both succeed (different nodes)

Conflict: which value is correct?
LWW: pick the newer timestamp
Merge: business logic decides (neither raw value alone is necessarily correct; merge the underlying operations instead)

Vector Clocks and Causal Ordering

Vector Clock Concept

Track causality: tuple of timestamps (one per node). (A:1, B:0) = one event at A. (A:1, B:1) = events at A and B. Detect: causality via comparison.

Comparison Rules

VC1 < VC2 if: all components VC1[i] <= VC2[i], and at least one strictly less. Causality: earlier VC precedes later. Concurrent: incomparable VCs.

Implementation

Increment own entry on each local operation. Send the clock with each message. Receiver: take the component-wise max of its own clock and the received clock, then increment its own entry. Tracks causality efficiently.
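
A minimal vector clock sketch following these rules (increment on local events, component-wise max on merge, comparison to distinguish causal order from concurrency); a sketch only, not a production implementation.

class VectorClock:
    """Minimal vector clock: one counter per node id."""
    def __init__(self, counters=None):
        self.counters = dict(counters or {})

    def increment(self, node_id):
        # Called by a node on each local event / write.
        self.counters[node_id] = self.counters.get(node_id, 0) + 1

    def merge(self, other):
        # Called on message receipt: component-wise max of both clocks.
        for node, count in other.counters.items():
            self.counters[node] = max(self.counters.get(node, 0), count)

    def __le__(self, other):
        # self happened-before-or-equals other: every component <=.
        return all(count <= other.counters.get(node, 0)
                   for node, count in self.counters.items())

def compare(a, b):
    """Return 'before', 'after', or 'concurrent' (equal or incomparable)."""
    if a <= b and not b <= a:
        return "before"
    if b <= a and not a <= b:
        return "after"
    return "concurrent"

a, b = VectorClock({"A": 1}), VectorClock({"A": 1, "B": 1})
print(compare(a, b))                        # 'before': (A:1) precedes (A:1, B:1)
print(compare(VectorClock({"A": 2}), b))    # 'concurrent': incomparable clocks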

Scalability Issue

Vector size = number of nodes. Large clusters: overhead grows. Mitigation: interval tree clocks, or falling back to simple last-write-wins timestamps as Dynamo-inspired systems such as Cassandra do. Trade precision for scalability.

Ordering Guarantee

With vector clocks: enforce causal ordering. If VC1 < VC2, deliver/apply in order. Prevents: reading future events before causes.

Anti-Entropy and Repair

Anti-Entropy Purpose

Repair divergence: synchronize replicas periodically. Detect differences, resolve conflicts, converge. Ensures eventual consistency property.

Mechanisms

Read repair: on read, check replicas, fix stale. Async repair: background process synchronizes periodically. Hybrid: combine both.
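
A read-repair sketch, assuming each replica exposes get(key) -> (value, version) and put(key, value, version); these method names are illustrative, not a real client API.

def read_with_repair(replicas, key):
    """Query all replicas, return the freshest value, and push it back to any
    replica holding an older version (the repair step)."""
    responses = []
    for replica in replicas:
        value, version = replica.get(key)
        responses.append((replica, value, version))
    _, best_value, best_version = max(responses, key=lambda r: r[2])
    for replica, _, version in responses:
        if version < best_version:
            replica.put(key, best_value, best_version)   # repair the stale copy
    return best_value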

Merkle Trees

Efficient comparison: hash tree of data. Compare trees: identify differences efficiently. Sync only diverged partitions. Reduces bandwidth.
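
A flat sketch of the idea: partition the key space into buckets, hash each bucket, and sync only the buckets whose digests differ. A real Merkle tree also hashes buckets pairwise up to a root (identical roots skip the comparison entirely) and uses a stable cross-node hash rather than Python's per-process hash().

import hashlib

def leaf_hash(bucket_items):
    """Digest of one bucket (partition) of key/value pairs."""
    h = hashlib.sha256()
    for key, value in sorted(bucket_items):
        h.update(f"{key}={value};".encode())
    return h.hexdigest()

def bucket_hashes(store, num_buckets=16):
    """Split a key/value store into buckets and digest each one.
    hash() is consistent within one process; real systems need a stable hash."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in store.items():
        buckets[hash(key) % num_buckets].append((key, value))
    return [leaf_hash(b) for b in buckets]

def diverged_buckets(store_a, store_b, num_buckets=16):
    """Bucket indices whose digests differ; only these need syncing."""
    ha = bucket_hashes(store_a, num_buckets)
    hb = bucket_hashes(store_b, num_buckets)
    return [i for i in range(num_buckets) if ha[i] != hb[i]]

a = {"user:1": "alice", "user:2": "bob"}
b = {"user:1": "alice", "user:2": "robert"}   # replica B diverged on one key
print(diverged_buckets(a, b))                  # only the bucket holding user:2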

Sync Strategy

Full sync: slow, transfers all data. Partial sync: smart comparison. Incremental: track changes, sync only new. Efficiency: partial/incremental.

Frequency

Aggressive: sync often, converge fast, more overhead. Lazy: sync rarely, less overhead, slower convergence. Tune: requirements-dependent.

Applications and Trade-offs

Suitable Applications

Social media: eventual consistency acceptable for likes and comments. Analytics: approximate data fine. Caching: stale entries acceptable. Recommendations: eventual fine.

Unsuitable Applications

Banking: strong consistency needed. Medical: consistency critical. Stock trading: immediate correctness necessary. Regulatory: compliance demands consistency.

Cost-Benefit

Benefit: high availability, scalability, geographic distribution. Cost: temporary inconsistency, complexity, application handling. Choose if benefit > cost.

Hybrid Approach

Some data: strong consistency. Other: eventual. Example: user profile (strong consistency), activity feed (eventual). Flexibility: choose per entity.

User Experience Implications

Perceived Consistency

Users tolerate eventual consistency if invisible (converges fast). Acceptable: delayed notifications. Unacceptable: lost data.

Staleness Tolerance

How stale acceptable? Seconds (chat): tolerable. Minutes (news): maybe. Hours (historical): fine. Domain-dependent.

Visibility Expectations

Users expect their own writes to be immediately visible (on their own device). Other users seeing them with a delay is acceptable. Design: ensure read-your-writes consistency.

Error Handling

Show data freshness: "updated 2 minutes ago". Inform users: data may not be latest. Transparency: reduces surprise.

Design Patterns

Optimistic updates: show change immediately (local), confirm from server. Graceful degradation: read stale if needed. Refresh: let users get latest.

Practical Systems Implementation

Cassandra

Tunable consistency: read/write quorum sizes determine strength. N = replication factor, W = replicas that must acknowledge a write, R = replicas contacted per read. W + R > N: read and write sets overlap, so reads see the latest acknowledged write. Otherwise: eventual.
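
The overlap rule in a few lines; a sketch of the arithmetic only, not a Cassandra client.

def read_write_overlap(n: int, w: int, r: int) -> bool:
    """With replication factor n, a write acknowledged by w replicas and a
    read contacting r replicas share at least one replica when w + r > n,
    so the read observes the latest acknowledged write."""
    return w + r > n

# Common settings for n = 3:
print(read_write_overlap(3, 2, 2))   # True: QUORUM writes + QUORUM reads
print(read_write_overlap(3, 1, 1))   # False: ONE/ONE is eventual only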

DynamoDB

AWS DynamoDB: eventually consistent by default (fast). Strong consistency option: slower. Application chooses: speed vs. consistency per request.
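
A short boto3 sketch of the per-request choice; the table name "profiles" and key "user_id" are hypothetical, and running it requires AWS credentials.

import boto3

table = boto3.resource("dynamodb").Table("profiles")

# Default read: eventually consistent (cheaper, lower latency).
eventual = table.get_item(Key={"user_id": "u123"})

# Per-request opt-in to a strongly consistent read (slower).
strong = table.get_item(Key={"user_id": "u123"}, ConsistentRead=True)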

Riak

Distributed key-value: N replicas, tunable W/R. Vector clocks: causal consistency. Merge operators: conflict resolution. Flexible.

CouchDB

Document database: replication-based. Multi-master: any node accepts writes. Eventual convergence. Conflicts: user-defined resolution.

Monitoring

Track replication lag: time between write and full visibility. Alert: if exceeding bounds. Metrics: convergence time, conflict rates. Essential.

Tuning

Adjust quorums: stronger consistency or faster. Replication factor: more replicas, slower writes, higher availability. Balance: requirements vs. trade-offs.

References

  • Vogels, W. "Eventually Consistent." ACM Queue, vol. 6, no. 6, 2008, pp. 14-19.
  • DeCandia, G., et al. "Dynamo: Amazon's Highly Available Key-Value Store." Proceedings of SOSP, 2007.
  • Lamport, L. "Time, Clocks, and the Ordering of Events in a Distributed System." Communications of the ACM, vol. 21, no. 7, 1978, pp. 558-565.
  • Mattern, F. "Virtual Time and Global States of Distributed Systems." Workshop on Parallel and Distributed Algorithms, 1988.
  • Kleppmann, M. "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems." O'Reilly Media, 2017.