Introduction
Concurrency control: manage multiple transactions simultaneously. Challenges: race conditions, inconsistency, lost updates. Goal: maintain isolation (ACID I). Methods: locking, MVCC, timestamps, optimistic. Trade-off: correctness vs performance.
"Concurrency control is the balancing act of modern databases: allow many transactions to run together without sacrificing consistency. The wrong choice can mean deadlocks on one extreme or false conflicts on the other." -- Transaction management
Concurrency Problems
Lost Update
T1: READ balance = 100T2: READ balance = 100T1: balance = 100 + 50, WRITE balance = 150T2: balance = 100 - 30, WRITE balance = 70-- T1's update lost (should be 120)Dirty Read
T1: READ data_a = 10T1: WRITE data_a = 20T2: READ data_a = 20 (uncommitted)T1: ROLLBACK (data_a back to 10)-- T2 read incorrect value (dirty read)Non-Repeatable Read
T1: READ salary = 50000T2: UPDATE salary = 55000, COMMITT1: READ salary = 55000 (different from before)-- Same query, different results (non-repeatable)Phantom Read
T1: SELECT * FROM employees WHERE dept_id = 10 (5 rows)T2: INSERT employee (dept_id = 10), COMMITT1: SELECT * FROM employees WHERE dept_id = 10 (6 rows)-- New row appeared (phantom read)Pessimistic Concurrency
Locking Approach
Assumption: conflicts likely. Strategy: acquire locks before accessing. Wait: for locks (potential deadlock). Safe: prevents race conditions. Cost: reduced parallelism.
Lock Types
| Lock Type | Purpose | Conflicts With |
|---|---|---|
| Shared (Read) | Multiple readers | Write lock |
| Exclusive (Write) | Single writer | Read/Write locks |
| Intent | Hierarchical locking | Other intent locks |
Two-Phase Locking (2PL)
Growing phase: acquire locks. Shrinking phase: release locks. Rule: cannot acquire after releasing. Guarantee: serializability. Example: SELECT FOR UPDATE.
Drawbacks
Deadlock risk: circular wait. Performance: waiting reduces throughput. Contention: high-traffic resources block many. Overhead: lock management cost.
Multi-Version Concurrency Control (MVCC)
Concept
Multiple versions: each transaction sees consistent snapshot. No blocking: readers don't block writers. Isolation: transactions see committed data as of transaction start. Cost: version overhead.
How MVCC Works
Initial state: balance = 100 (version 1)T1 START (snapshot: version 1)T2 START (snapshot: version 1) T1: READ balance = 100 T2: UPDATE balance = 150 (creates version 2) T2: COMMIT T1: READ balance (still sees version 1 = 100) -- T1 sees snapshot at start time, not latestT1: COMMITVersion Management
Garbage collection: remove old versions (no active transactions). Transaction ID: timestamped (ordering). Visibility: versions visible within transaction lifespan. Memory: versions consume space.
Advantages
Non-blocking reads: no locks needed. Better concurrency: parallel execution. No deadlock: different mechanism. Good for OLTP: many readers/writers.
Disadvantages
Version overhead: extra storage. Garbage collection: cleanup cost. Write conflicts: still serialized (write-write conflicts). Bloat: versions accumulate.
Timestamp Ordering
Timestamp Assignment
Each transaction: assigned unique timestamp. Older timestamp: logically earlier. Ordering: timestamps determine serialization order. Enforcement: reject operations violating order.
Basic Timestamp Protocol
For transaction T with timestamp TS(T):On READ: If TS(T) < W-timestamp (write timestamp) Reject (read after write) Else update R-timestampOn WRITE: If TS(T) < R-timestamp or W-timestamp Reject (write violates ordering) Else perform write, update W-timestampThomas Write Rule
Optimization: allow certain writes. If transaction writes old version: ignore (later transaction already wrote). Prevents cascading aborts. Maintains serializability (but loses information).
Advantages
No deadlock: predetermined order. Simple: deterministic. Serialization: automatic.
Disadvantages
Aborts: frequent (conflicts with ordering). Cascading: aborted transaction aborts others. Performance: many restarts. Timestamp overhead: every operation checked.
Optimistic Concurrency
Assumption
Conflicts rare: most transactions succeed. No locking: execute freely. Validation: check at commit. Rollback: if conflict detected.
Three Phases
1. READ phase: transaction reads from DB, writes to local buffer No locks acquired, no conflicts possible2. VALIDATION phase: check if writes conflict with others Compare with concurrent transactions Check read/write sets3. WRITE phase: if valid, write to DB If invalid, rollback and retryValidation Algorithm
For transaction T with timestamp TS(T):Check all earlier completed transactions Ti: If T.ReadSet intersects Ti.WriteSet ABORT (T read data Ti modified) If T.WriteSet intersects Ti.ReadSet ABORT (T modified data Ti read) If T.WriteSet intersects Ti.WriteSet ABORT (write conflict)If all checks pass: COMMITAdvantages
No deadlock: no locks. Parallelism: free reading. Simple: no lock management. Good when conflicts rare.
Disadvantages
Aborts: under contention. Wasted work: computations may be discarded. Tuning difficulty: sensitive to parameters. Overhead: validation check expensive.
Concurrency Protocols
Serialization Graphs
Node: each transaction. Edge: conflict (T1 before T2). Cycle: non-serializable. Acyclic: serializable. Verification: topological sort possible.
Serializability Testing
Precedence graph:T1 -> T2: T1's write before T2's read (or write)T2 -> T3: T2's write before T3's readCycle T1 -> T2 -> T1: non-serializableNo cycle: serializableProtocol Guarantees
2PL: guarantees serializability (but deadlock possible). Timestamp: guarantees serializability (aborts possible). Optimistic: guarantees serializability (validation phase). MVCC: guarantees snapshot isolation (slightly weaker).
Deadlock Detection and Resolution
Deadlock Scenario
T1: LOCK AT2: LOCK BT1: LOCK B (wait for T2)T2: LOCK A (wait for T1)-- Circular wait: deadlockDetection Methods
Timeout: wait limit, then rollback. Deadlock graph: detect cycles, choose victim. Lock manager: tracks all locks, builds dependency graph.
Resolution
Victim selection: choose transaction to rollback. Criteria: cost (computation, I/O), progress (how far), priority (important transactions). Rollback: release locks, restart.
Prevention
-- Resource ordering: prevent circular waitAll transactions acquire locks in same order: Lock A, then B, then C-- No cycle possibleAvoidance
Wait-die: older transaction waits, younger dies. Wound-wait: younger transaction waits, older wounds (forces restart). Conservative: acquire all locks upfront.
Recovery from Conflicts
Abort and Retry
Transaction aborted: loses changes. Restart: transaction reruns. Retry strategy: exponential backoff (avoid spinning). Cascade: be careful (aborting affects others).
Cascading Aborts
T1: WRITE x, COMMITT2: READ x (from T1)T1: Later fails, ROLLBACKT2: Must abort (read dirty data)T3: Read from T2, must abort-- Cascade through dependent transactionsAvoiding Cascades
No dirty reads: transactions commit before reads. Strict 2PL: hold locks until commit. Isolation levels: control what's visible during transaction.
Idempotency
Retryable: transaction produces same result if rerun. Not idempotent: side effects (external calls, duplicate key insert). Design: make transactions idempotent when possible.
Performance Trade-offs
Locking Performance
Throughput: decreases with contention (waiting). Latency: affected by lock waits. Deadlock overhead: detection/resolution cost. Scalability: locks reduce parallel throughput.
MVCC Performance
Read throughput: high (no locks). Write throughput: varies (conflicts serialized). Storage: version overhead. Cleanup: garbage collection cost.
Timestamp Performance
Abort rate: increases with conflicts. Restart cost: recompute transaction. Timestamp overhead: checking every operation. Good for: read-heavy, few conflicts.
Comparison
| Protocol | Read Performance | Write Performance | Deadlock Risk |
|---|---|---|---|
| 2PL | Medium (waits) | Medium | High |
| MVCC | High (no locks) | Medium | None |
| Timestamp | Medium (aborts) | Low (no locks) | None |
| Optimistic | High (free reads) | Low | None |
DBMS Implementations
PostgreSQL
Default: MVCC (multiversion concurrency control). Snapshots: transactions see consistent snapshot. Isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE. Write-write conflicts: still serialized.
MySQL/InnoDB
Default: MVCC with locking. Undo logs: previous versions stored. Gap locks: phantom prevention. Next-key locking: locks range (index + gap). Deadlock detection: automatic.
SQL Server
Options: pessimistic locking or MVCC. Row versioning: READ COMMITTED SNAPSHOT isolation. Snapshot isolation: separate from serializability. Write conflicts: always detected.
Oracle
Primarily: MVCC (undo segments). Read consistency: statement-level or transaction-level. Lock levels: very granular. Write-write: first writer wins.
Choosing a Strategy
Decision Factors
Workload: OLTP (many short transactions) vs OLAP (few long queries). Contention: read-heavy vs write-heavy. Consistency: required isolation level. Hardware: CPU, RAM, I/O availability.
Guideline
Low contention: optimistic concurrency (fast). High contention: MVCC (no blocking). Write-heavy: timestamp ordering. General: MVCC is becoming standard (good balance).
Tuning Knobs
Lock timeout: how long to wait. Deadlock detection interval: check frequency. Version retention: how many to keep. Batch size: transaction granularity.
References
- Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
- Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
- Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.
- Date, C. J. "An Introduction to Database Systems." Pearson, 8th edition, 2003.
- Bernstein, P. A., Hadzilacos, V., and Goodman, N. "Concurrency Control and Recovery in Database Systems." Addison-Wesley, 1987.