!main_tags!Concurrency Control - Database Systems | What's Your IQ !main_header!

Introduction

Concurrency control: manage multiple transactions simultaneously. Challenges: race conditions, inconsistency, lost updates. Goal: maintain isolation (ACID I). Methods: locking, MVCC, timestamps, optimistic. Trade-off: correctness vs performance.

"Concurrency control is the balancing act of modern databases: allow many transactions to run together without sacrificing consistency. The wrong choice can mean deadlocks on one extreme or false conflicts on the other." -- Transaction management

Concurrency Problems

Lost Update

T1: READ balance = 100T2: READ balance = 100T1: balance = 100 + 50, WRITE balance = 150T2: balance = 100 - 30, WRITE balance = 70-- T1's update lost (should be 120)

Dirty Read

T1: READ data_a = 10T1: WRITE data_a = 20T2: READ data_a = 20 (uncommitted)T1: ROLLBACK (data_a back to 10)-- T2 read incorrect value (dirty read)

Non-Repeatable Read

T1: READ salary = 50000T2: UPDATE salary = 55000, COMMITT1: READ salary = 55000 (different from before)-- Same query, different results (non-repeatable)

Phantom Read

T1: SELECT * FROM employees WHERE dept_id = 10 (5 rows)T2: INSERT employee (dept_id = 10), COMMITT1: SELECT * FROM employees WHERE dept_id = 10 (6 rows)-- New row appeared (phantom read)

Pessimistic Concurrency

Locking Approach

Assumption: conflicts likely. Strategy: acquire locks before accessing. Wait: for locks (potential deadlock). Safe: prevents race conditions. Cost: reduced parallelism.

Lock Types

Lock Type Purpose Conflicts With
Shared (Read) Multiple readers Write lock
Exclusive (Write) Single writer Read/Write locks
Intent Hierarchical locking Other intent locks

Two-Phase Locking (2PL)

Growing phase: acquire locks. Shrinking phase: release locks. Rule: cannot acquire after releasing. Guarantee: serializability. Example: SELECT FOR UPDATE.

Drawbacks

Deadlock risk: circular wait. Performance: waiting reduces throughput. Contention: high-traffic resources block many. Overhead: lock management cost.

Multi-Version Concurrency Control (MVCC)

Concept

Multiple versions: each transaction sees consistent snapshot. No blocking: readers don't block writers. Isolation: transactions see committed data as of transaction start. Cost: version overhead.

How MVCC Works

Initial state: balance = 100 (version 1)T1 START (snapshot: version 1)T2 START (snapshot: version 1) T1: READ balance = 100 T2: UPDATE balance = 150 (creates version 2) T2: COMMIT T1: READ balance (still sees version 1 = 100) -- T1 sees snapshot at start time, not latestT1: COMMIT

Version Management

Garbage collection: remove old versions (no active transactions). Transaction ID: timestamped (ordering). Visibility: versions visible within transaction lifespan. Memory: versions consume space.

Advantages

Non-blocking reads: no locks needed. Better concurrency: parallel execution. No deadlock: different mechanism. Good for OLTP: many readers/writers.

Disadvantages

Version overhead: extra storage. Garbage collection: cleanup cost. Write conflicts: still serialized (write-write conflicts). Bloat: versions accumulate.

Timestamp Ordering

Timestamp Assignment

Each transaction: assigned unique timestamp. Older timestamp: logically earlier. Ordering: timestamps determine serialization order. Enforcement: reject operations violating order.

Basic Timestamp Protocol

For transaction T with timestamp TS(T):On READ: If TS(T) < W-timestamp (write timestamp) Reject (read after write) Else update R-timestampOn WRITE: If TS(T) < R-timestamp or W-timestamp Reject (write violates ordering) Else perform write, update W-timestamp

Thomas Write Rule

Optimization: allow certain writes. If transaction writes old version: ignore (later transaction already wrote). Prevents cascading aborts. Maintains serializability (but loses information).

Advantages

No deadlock: predetermined order. Simple: deterministic. Serialization: automatic.

Disadvantages

Aborts: frequent (conflicts with ordering). Cascading: aborted transaction aborts others. Performance: many restarts. Timestamp overhead: every operation checked.

Optimistic Concurrency

Assumption

Conflicts rare: most transactions succeed. No locking: execute freely. Validation: check at commit. Rollback: if conflict detected.

Three Phases

1. READ phase: transaction reads from DB, writes to local buffer No locks acquired, no conflicts possible2. VALIDATION phase: check if writes conflict with others Compare with concurrent transactions Check read/write sets3. WRITE phase: if valid, write to DB If invalid, rollback and retry

Validation Algorithm

For transaction T with timestamp TS(T):Check all earlier completed transactions Ti: If T.ReadSet intersects Ti.WriteSet ABORT (T read data Ti modified) If T.WriteSet intersects Ti.ReadSet ABORT (T modified data Ti read) If T.WriteSet intersects Ti.WriteSet ABORT (write conflict)If all checks pass: COMMIT

Advantages

No deadlock: no locks. Parallelism: free reading. Simple: no lock management. Good when conflicts rare.

Disadvantages

Aborts: under contention. Wasted work: computations may be discarded. Tuning difficulty: sensitive to parameters. Overhead: validation check expensive.

Concurrency Protocols

Serialization Graphs

Node: each transaction. Edge: conflict (T1 before T2). Cycle: non-serializable. Acyclic: serializable. Verification: topological sort possible.

Serializability Testing

Precedence graph:T1 -> T2: T1's write before T2's read (or write)T2 -> T3: T2's write before T3's readCycle T1 -> T2 -> T1: non-serializableNo cycle: serializable

Protocol Guarantees

2PL: guarantees serializability (but deadlock possible). Timestamp: guarantees serializability (aborts possible). Optimistic: guarantees serializability (validation phase). MVCC: guarantees snapshot isolation (slightly weaker).

Deadlock Detection and Resolution

Deadlock Scenario

T1: LOCK AT2: LOCK BT1: LOCK B (wait for T2)T2: LOCK A (wait for T1)-- Circular wait: deadlock

Detection Methods

Timeout: wait limit, then rollback. Deadlock graph: detect cycles, choose victim. Lock manager: tracks all locks, builds dependency graph.

Resolution

Victim selection: choose transaction to rollback. Criteria: cost (computation, I/O), progress (how far), priority (important transactions). Rollback: release locks, restart.

Prevention

-- Resource ordering: prevent circular waitAll transactions acquire locks in same order: Lock A, then B, then C-- No cycle possible

Avoidance

Wait-die: older transaction waits, younger dies. Wound-wait: younger transaction waits, older wounds (forces restart). Conservative: acquire all locks upfront.

Recovery from Conflicts

Abort and Retry

Transaction aborted: loses changes. Restart: transaction reruns. Retry strategy: exponential backoff (avoid spinning). Cascade: be careful (aborting affects others).

Cascading Aborts

T1: WRITE x, COMMITT2: READ x (from T1)T1: Later fails, ROLLBACKT2: Must abort (read dirty data)T3: Read from T2, must abort-- Cascade through dependent transactions

Avoiding Cascades

No dirty reads: transactions commit before reads. Strict 2PL: hold locks until commit. Isolation levels: control what's visible during transaction.

Idempotency

Retryable: transaction produces same result if rerun. Not idempotent: side effects (external calls, duplicate key insert). Design: make transactions idempotent when possible.

Performance Trade-offs

Locking Performance

Throughput: decreases with contention (waiting). Latency: affected by lock waits. Deadlock overhead: detection/resolution cost. Scalability: locks reduce parallel throughput.

MVCC Performance

Read throughput: high (no locks). Write throughput: varies (conflicts serialized). Storage: version overhead. Cleanup: garbage collection cost.

Timestamp Performance

Abort rate: increases with conflicts. Restart cost: recompute transaction. Timestamp overhead: checking every operation. Good for: read-heavy, few conflicts.

Comparison

Protocol Read Performance Write Performance Deadlock Risk
2PL Medium (waits) Medium High
MVCC High (no locks) Medium None
Timestamp Medium (aborts) Low (no locks) None
Optimistic High (free reads) Low None

DBMS Implementations

PostgreSQL

Default: MVCC (multiversion concurrency control). Snapshots: transactions see consistent snapshot. Isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE. Write-write conflicts: still serialized.

MySQL/InnoDB

Default: MVCC with locking. Undo logs: previous versions stored. Gap locks: phantom prevention. Next-key locking: locks range (index + gap). Deadlock detection: automatic.

SQL Server

Options: pessimistic locking or MVCC. Row versioning: READ COMMITTED SNAPSHOT isolation. Snapshot isolation: separate from serializability. Write conflicts: always detected.

Oracle

Primarily: MVCC (undo segments). Read consistency: statement-level or transaction-level. Lock levels: very granular. Write-write: first writer wins.

Choosing a Strategy

Decision Factors

Workload: OLTP (many short transactions) vs OLAP (few long queries). Contention: read-heavy vs write-heavy. Consistency: required isolation level. Hardware: CPU, RAM, I/O availability.

Guideline

Low contention: optimistic concurrency (fast). High contention: MVCC (no blocking). Write-heavy: timestamp ordering. General: MVCC is becoming standard (good balance).

Tuning Knobs

Lock timeout: how long to wait. Deadlock detection interval: check frequency. Version retention: how many to keep. Batch size: transaction granularity.

References

  • Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
  • Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
  • Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.
  • Date, C. J. "An Introduction to Database Systems." Pearson, 8th edition, 2003.
  • Bernstein, P. A., Hadzilacos, V., and Goodman, N. "Concurrency Control and Recovery in Database Systems." Addison-Wesley, 1987.
!main_footer!