Introduction

Concurrency control: manage multiple transactions simultaneously. Challenges: race conditions, inconsistency, lost updates. Goal: maintain isolation (ACID I). Methods: locking, MVCC, timestamps, optimistic. Trade-off: correctness vs performance.

"Concurrency control is the balancing act of modern databases: allow many transactions to run together without sacrificing consistency. The wrong choice can mean deadlocks on one extreme or false conflicts on the other." -- Transaction management

Concurrency Problems

Lost Update

T1: READ balance = 100T2: READ balance = 100T1: balance = 100 + 50, WRITE balance = 150T2: balance = 100 - 30, WRITE balance = 70-- T1's update lost (should be 120)

Dirty Read

T1: READ data_a = 10T1: WRITE data_a = 20T2: READ data_a = 20 (uncommitted)T1: ROLLBACK (data_a back to 10)-- T2 read incorrect value (dirty read)

Non-Repeatable Read

T1: READ salary = 50000T2: UPDATE salary = 55000, COMMITT1: READ salary = 55000 (different from before)-- Same query, different results (non-repeatable)

Phantom Read

T1: SELECT * FROM employees WHERE dept_id = 10 (5 rows)T2: INSERT employee (dept_id = 10), COMMITT1: SELECT * FROM employees WHERE dept_id = 10 (6 rows)-- New row appeared (phantom read)

Pessimistic Concurrency

Locking Approach

Assumption: conflicts likely. Strategy: acquire locks before accessing. Wait: for locks (potential deadlock). Safe: prevents race conditions. Cost: reduced parallelism.

Lock Types

Lock TypePurposeConflicts With
Shared (Read)Multiple readersWrite lock
Exclusive (Write)Single writerRead/Write locks
IntentHierarchical lockingOther intent locks

Two-Phase Locking (2PL)

Growing phase: acquire locks. Shrinking phase: release locks. Rule: cannot acquire after releasing. Guarantee: serializability. Example: SELECT FOR UPDATE.

Drawbacks

Deadlock risk: circular wait. Performance: waiting reduces throughput. Contention: high-traffic resources block many. Overhead: lock management cost.

Multi-Version Concurrency Control (MVCC)

Concept

Multiple versions: each transaction sees consistent snapshot. No blocking: readers don't block writers. Isolation: transactions see committed data as of transaction start. Cost: version overhead.

How MVCC Works

Initial state: balance = 100 (version 1)T1 START (snapshot: version 1)T2 START (snapshot: version 1) T1: READ balance = 100 T2: UPDATE balance = 150 (creates version 2) T2: COMMIT T1: READ balance (still sees version 1 = 100) -- T1 sees snapshot at start time, not latestT1: COMMIT

Version Management

Garbage collection: remove old versions (no active transactions). Transaction ID: timestamped (ordering). Visibility: versions visible within transaction lifespan. Memory: versions consume space.

Advantages

Non-blocking reads: no locks needed. Better concurrency: parallel execution. No deadlock: different mechanism. Good for OLTP: many readers/writers.

Disadvantages

Version overhead: extra storage. Garbage collection: cleanup cost. Write conflicts: still serialized (write-write conflicts). Bloat: versions accumulate.

Timestamp Ordering

Timestamp Assignment

Each transaction: assigned unique timestamp. Older timestamp: logically earlier. Ordering: timestamps determine serialization order. Enforcement: reject operations violating order.

Basic Timestamp Protocol

For transaction T with timestamp TS(T):On READ: If TS(T) < W-timestamp (write timestamp) Reject (read after write) Else update R-timestampOn WRITE: If TS(T) < R-timestamp or W-timestamp Reject (write violates ordering) Else perform write, update W-timestamp

Thomas Write Rule

Optimization: allow certain writes. If transaction writes old version: ignore (later transaction already wrote). Prevents cascading aborts. Maintains serializability (but loses information).

Advantages

No deadlock: predetermined order. Simple: deterministic. Serialization: automatic.

Disadvantages

Aborts: frequent (conflicts with ordering). Cascading: aborted transaction aborts others. Performance: many restarts. Timestamp overhead: every operation checked.

Optimistic Concurrency

Assumption

Conflicts rare: most transactions succeed. No locking: execute freely. Validation: check at commit. Rollback: if conflict detected.

Three Phases

1. READ phase: transaction reads from DB, writes to local buffer No locks acquired, no conflicts possible2. VALIDATION phase: check if writes conflict with others Compare with concurrent transactions Check read/write sets3. WRITE phase: if valid, write to DB If invalid, rollback and retry

Validation Algorithm

For transaction T with timestamp TS(T):Check all earlier completed transactions Ti: If T.ReadSet intersects Ti.WriteSet ABORT (T read data Ti modified) If T.WriteSet intersects Ti.ReadSet ABORT (T modified data Ti read) If T.WriteSet intersects Ti.WriteSet ABORT (write conflict)If all checks pass: COMMIT

Advantages

No deadlock: no locks. Parallelism: free reading. Simple: no lock management. Good when conflicts rare.

Disadvantages

Aborts: under contention. Wasted work: computations may be discarded. Tuning difficulty: sensitive to parameters. Overhead: validation check expensive.

Concurrency Protocols

Serialization Graphs

Node: each transaction. Edge: conflict (T1 before T2). Cycle: non-serializable. Acyclic: serializable. Verification: topological sort possible.

Serializability Testing

Precedence graph:T1 -> T2: T1's write before T2's read (or write)T2 -> T3: T2's write before T3's readCycle T1 -> T2 -> T1: non-serializableNo cycle: serializable

Protocol Guarantees

2PL: guarantees serializability (but deadlock possible). Timestamp: guarantees serializability (aborts possible). Optimistic: guarantees serializability (validation phase). MVCC: guarantees snapshot isolation (slightly weaker).

Deadlock Detection and Resolution

Deadlock Scenario

T1: LOCK AT2: LOCK BT1: LOCK B (wait for T2)T2: LOCK A (wait for T1)-- Circular wait: deadlock

Detection Methods

Timeout: wait limit, then rollback. Deadlock graph: detect cycles, choose victim. Lock manager: tracks all locks, builds dependency graph.

Resolution

Victim selection: choose transaction to rollback. Criteria: cost (computation, I/O), progress (how far), priority (important transactions). Rollback: release locks, restart.

Prevention

-- Resource ordering: prevent circular waitAll transactions acquire locks in same order: Lock A, then B, then C-- No cycle possible

Avoidance

Wait-die: older transaction waits, younger dies. Wound-wait: younger transaction waits, older wounds (forces restart). Conservative: acquire all locks upfront.

Recovery from Conflicts

Abort and Retry

Transaction aborted: loses changes. Restart: transaction reruns. Retry strategy: exponential backoff (avoid spinning). Cascade: be careful (aborting affects others).

Cascading Aborts

T1: WRITE x, COMMITT2: READ x (from T1)T1: Later fails, ROLLBACKT2: Must abort (read dirty data)T3: Read from T2, must abort-- Cascade through dependent transactions

Avoiding Cascades

No dirty reads: transactions commit before reads. Strict 2PL: hold locks until commit. Isolation levels: control what's visible during transaction.

Idempotency

Retryable: transaction produces same result if rerun. Not idempotent: side effects (external calls, duplicate key insert). Design: make transactions idempotent when possible.

Performance Trade-offs

Locking Performance

Throughput: decreases with contention (waiting). Latency: affected by lock waits. Deadlock overhead: detection/resolution cost. Scalability: locks reduce parallel throughput.

MVCC Performance

Read throughput: high (no locks). Write throughput: varies (conflicts serialized). Storage: version overhead. Cleanup: garbage collection cost.

Timestamp Performance

Abort rate: increases with conflicts. Restart cost: recompute transaction. Timestamp overhead: checking every operation. Good for: read-heavy, few conflicts.

Comparison

ProtocolRead PerformanceWrite PerformanceDeadlock Risk
2PLMedium (waits)MediumHigh
MVCCHigh (no locks)MediumNone
TimestampMedium (aborts)Low (no locks)None
OptimisticHigh (free reads)LowNone

DBMS Implementations

PostgreSQL

Default: MVCC (multiversion concurrency control). Snapshots: transactions see consistent snapshot. Isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE. Write-write conflicts: still serialized.

MySQL/InnoDB

Default: MVCC with locking. Undo logs: previous versions stored. Gap locks: phantom prevention. Next-key locking: locks range (index + gap). Deadlock detection: automatic.

SQL Server

Options: pessimistic locking or MVCC. Row versioning: READ COMMITTED SNAPSHOT isolation. Snapshot isolation: separate from serializability. Write conflicts: always detected.

Oracle

Primarily: MVCC (undo segments). Read consistency: statement-level or transaction-level. Lock levels: very granular. Write-write: first writer wins.

Choosing a Strategy

Decision Factors

Workload: OLTP (many short transactions) vs OLAP (few long queries). Contention: read-heavy vs write-heavy. Consistency: required isolation level. Hardware: CPU, RAM, I/O availability.

Guideline

Low contention: optimistic concurrency (fast). High contention: MVCC (no blocking). Write-heavy: timestamp ordering. General: MVCC is becoming standard (good balance).

Tuning Knobs

Lock timeout: how long to wait. Deadlock detection interval: check frequency. Version retention: how many to keep. Batch size: transaction granularity.

References

  • Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
  • Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
  • Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.
  • Date, C. J. "An Introduction to Database Systems." Pearson, 8th edition, 2003.
  • Bernstein, P. A., Hadzilacos, V., and Goodman, N. "Concurrency Control and Recovery in Database Systems." Addison-Wesley, 1987.