Introduction
Two-Phase Locking (2PL): protocol ensuring serializability. Divides transaction into two phases: growing (acquire locks), shrinking (release locks). Simple protocol, powerful guarantee. Foundation: database concurrency control.
Theorem: if all transactions follow 2PL, execution serializable (transaction appear sequential). Proof: lock-based prevents cycles in dependency graph. Standard: SQL databases implement (usually transparently).
Variants: Strict 2PL (standard), Rigorous 2PL (stronger), Conservative 2PL (prevented deadlock). Trade-offs: simplicity, deadlock risk, performance.
"Two-Phase Locking guarantees serializability through simple discipline: acquire before access, release after complete. Foundation: decades of proven correctness, trusted by production systems worldwide." -- Concurrency control
Basic Concept
Definition
Two phases: Growing (Phase 1) and Shrinking (Phase 2). Growing: acquire locks, no releases. Shrinking: release locks, no acquires. Lock point: transition (maximum locks held).
Example Timeline
Transaction: BEGIN
Acquire lock on row 1 (growing phase)
Acquire lock on row 2 (growing phase)
Access rows 1, 2
Lock point (maximum locks here)
Release lock on row 1 (shrinking phase)
Release lock on row 2 (shrinking phase)
COMMIT
Enforcement
Automatic: system enforces (usually transparent). Implicit: acquires at access, releases at phase boundary. Explicit: application can request (rare).
Property
Necessary condition: serializability. Sufficient: also guaranteed (if acyclic). Proof: conflict serializable => 2PL schedule possible.
Key Insight
Locks prevent conflicts: cycle prevention. No cycles: serializable. 2PL structure: prevents cycles (mathematical guarantee).
Growing Phase
Activities
Acquire locks: before data access (read or write). Multiple locks possible: different rows. Nested: hierarchical locks (intention locks). No releases: strict growing.
Lock Acquisition
Transaction needs to read row X:
Acquire shared lock on X (if not already held)
Proceed with read
Lock retained for later release
Transaction needs to write row Y:
Acquire exclusive lock on Y (if not already held)
Proceed with write
Lock retained for later release
Nested Transactions
Subtransactions: acquire locks (contributing to parent). Parent: responsible (releases all child locks at end). Complexity: lock management.
Waiting
Lock unavailable: transaction waits (blocked). Queue: others waiting for same lock. Fair: FIFO typically. Eventually granted: when lock released.
Lock Upgrade
Shared -> exclusive: upgrade. Risky: deadlock (other readers waiting). Some systems: prevent (fail upgrade). Workaround: release shared, request exclusive (risky window).
Duration
Locks held: until shrinking phase. Cost: blocking other transactions. Long growing: reduces concurrency. Optimize: shorter growing phase.
Shrinking Phase
Activities
Release locks: after access complete (no more reads/writes). Sequential: order doesn't matter (for correctness). All locks released: by commit/rollback.
Lock Release
Done accessing data: release locks
Release lock on row 1
Release lock on row 2
(order irrelevant for correctness)
After all locks released: other transactions proceed
Waiting transactions granted (if locks now available)
Cleanup
Rollback: release locks (revert changes). Commit: release locks (persist changes). Both: return to initial lock-free state.
Deadlock Resolution
Victim transaction: aborted (releases all locks). Awakens waiters: can now proceed. Victim retried: application (usually automatic).
Lock Cascade
Release lock on row X: waiting transaction (Z) granted. Z may release locks: cascades. Wave: propagates through queue. Efficiency: system dependent.
Timestamp
Lock point: dividing boundary. Before: growing (acquiring). After: shrinking (releasing). Critical: all operations before lock point (view consistent at point).
Lock Point
Definition
Moment: transaction transitions from growing to shrinking. Before: can acquire locks. After: can only release. Critical: all data accessed must have locks acquired by this point.
Significance
Snapshot: lock point represents consistent view. All reads: reflect state at lock point. Updates: consistent relative to this moment. Serializability: based on lock points (order).
Example
Transaction A:
Acquire lock on account 1
Acquire lock on account 2
[LOCK POINT]
Read account 1: $100
Read account 2: $50
(snapshot at lock point: consistent)
Transaction B (overlapping):
Can update: only if no lock acquired by A
If A acquired lock: B waits
Early Release
Release before commit: risky (dirty read possible if not careful). Standard: hold until commit (strict 2PL). Early: requires careful handling.
Ordering
Lock point order: determines serializability. Earlier point: seen as executing first. Conflict serializable: order derivable from lock points.
Strict Two-Phase Locking
Definition
Exclusive locks held until commit/rollback (not released early). Shared locks: may be released earlier (read-only, safe). Prevents: dirty reads, cascading rollback.
Improvement over Basic 2PL
Basic: might release exclusive lock early. Risk: other transaction reads released data (dirty read if rollback). Strict: prevents (holds until end).
Implementation
Exclusive lock: SET lock_until_end=true
Shared lock: SET lock_until_end=false (can release early)
On COMMIT/ROLLBACK: release all exclusive locks
Deadlock Guarantee
Strict 2PL: reduces deadlock cycles (less overlap). Cascading prevented: transaction isolation stronger. Cost: lock held longer.
Standard Practice
SQL databases: typically implement strict 2PL. Transparent: automatic. Safest: default assumption. Trade: acceptable performance cost.
Cascading Avoidance
Cascading rollback: uncommitted data read. If reader rolls back: cascade. Strict 2PL: prevents (no uncommitted reads). Stronger: deserves the cost.
Rigorous Two-Phase Locking
Definition
All locks (shared and exclusive): held until commit/rollback. Strictest variant. All phantom reads prevented (range locks held). Highest safety.
Difference from Strict
Strict: exclusive until end, shared can release. Rigorous: all until end (more locks held, less concurrency). Trade: absolute safety vs. performance.
Use Cases
Critical transactions: serializability essential. Complex multi-step: safety paramount. High-value operations: bank transfers, medical records. Accept: performance cost.
Implementation Cost
Many locks: higher overhead. Longer hold: reduced concurrency. Benefit: absolute guarantee (no anomalies). Worth: depends on requirements.
Practical Adoption
Rare: explicit (usually implicit in highest isolation). Systems: offer but not default. Performance: prohibitive for high-throughput.
Serializability Guarantee
Theorem
If all transactions follow 2PL, execution serializable. Proof: lock-based prevents conflict cycles. Graph acyclic: serialization order exists.
Conflict Serializable
Conflicts: read-write, write-write between transactions. Precedence graph: nodes=transactions, edges=conflicts. Cycle: non-serializable. 2PL: prevents cycles (graph acyclic).
Example Serialization
Transaction A, B, C: concurrent execution
2PL enforces: lock points order (A before B before C, say)
Serialization: equivalent to sequential A->B->C
Result: same as running A, then B, then C sequentially
Isolation Achievement
Serializability: strongest isolation definition. ACID: consistency guaranteed (serializable). 2PL: achieves (automatically).
Edge Cases
Phantom reads: prevented if range locks used (rigorous). Predicate locks: complex, rarely used. Standard: row-level locks (good enough).
Practical Guarantee
Developers: can assume serializability (with 2PL). Complex transactions: safe (isolation guaranteed). Correctness: ensured by protocol.
Deadlock Handling with 2PL
Deadlock Risk
Circular wait: possible even with 2PL. Example: A locks X, waits for Y; B locks Y, waits for X. Cycle: both blocked. Resolution: required.
Detection Mechanism
Waits-for graph: periodically checked. Cycle detected: deadlock confirmed. Victim selection: which to abort. Fairness: avoid repeated victim.
Resolution Strategies
Victim selection: youngest (lose less work), minimal locks held (less cleanup), random (simple). Abort: release all locks, rollback. Victim retries: transparently (usually).
Prevention Techniques
Lock ordering: consistent order prevents cycles. A always locks X before Y. B same order. No cycles possible (acyclic). Cost: application discipline.
Timeout Alternative
Transaction timeout: after waiting duration, aborted. Simple: no cycle detection. Risk: false aborts (timeout too short). Pragmatic: used in practice.
Example Avoidance
Lock ordering discipline:
Transaction A: acquire A1, then A2
Transaction B: acquire B1, then B2
Ensure: A1 < B1 and A2 < B2 (consistent ordering)
Result: no cycles, deadlock-free
Conservative Two-Phase Locking
Concept
Acquire all locks upfront (predeclare). If any unavailable: abort immediately (not block). Serialization: lock point at start (all locks acquired). Prevents: deadlock (no waiting).
Implementation
Transaction start: declare all locks needed
Acquire lock on row 1: success
Acquire lock on row 2: success
Acquire lock on row 3: conflict, abort
Result: restart (don't wait)
Advantages
Deadlock-free: no waiting (no cycles). Lock point: at start (well-defined). Simple: straightforward logic.
Disadvantages
Over-locking: predeclare all, some unused. Aborts: likely if contention. Restarts: expensive. Practical: overhead high for typical applications.
Use Cases
Batch processing: know data in advance. Simple transactions: few rows. Low contention: predeclare acceptable. Rare: overhead usually prohibitive.
Comparison to Standard
Standard 2PL: acquire as needed (less over-locking). Conservative: upfront (deadlock-free). Trade: flexibility vs. guarantee.
Limitations and Trade-offs
Deadlock Possibility
2PL: permits deadlock (unlike conservative). Detection + resolution: required. Cost: overhead, victim loss (aborted work).
Reduced Concurrency
Locks block: limited parallelism. Contention: severe with many transactions. Throughput: reduced from maximum possible. Bottleneck: lock contention.
Long Transactions
Hold locks long: block others. Impact: cumulative (many transactions waiting). Optimization: keep transactions short (critical).
Cascading Rollback Risk
Basic 2PL: possible (strict variants prevent). Rollback: triggers others. Cascade: can be large. Cost: rework (expensive).
Performance Overhead
Lock management: CPU cost. Lock contention: wait time. Total: significant impact. Tuning: essential for high throughput.
When Alternatives Better
MVCC: high read/write mix, high concurrency. Optimistic locking: low contention. Lock-free: specialized data structures. Choose: based on workload.
2PL in Modern Systems
Explicit Use
SQL Server, Oracle: 2PL often used explicitly. SELECT ... FOR UPDATE: request locks. Transaction scope: automatic lock management.
Implicit Implementation
Transparent: system enforces (usually). Developer: unaware of 2PL (works automatically). Assumption: safe (isolation guaranteed).
MVCC Integration
Modern systems: MVCC + 2PL hybrid. Readers: MVCC (no locks, snapshots). Writers: 2PL (locks, serial). Benefit: read parallelism + write safety.
Performance Tuning
Monitor lock waits: identify contention. Optimize queries: reduce lock hold time. Indexing: reduce rows scanned. Transactions: keep short.
Practical Application
Developers: assume 2PL (correct). OLTP: default workload. OLAP: often disabled (snapshot isolation). Verify: system default (varies).
References
- Eswaran, K. P., et al. "The Notions of Consistency and Predicate Locks in a Database System." Communications of the ACM, vol. 19, no. 11, 1976, pp. 624-633.
- Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
- Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.
- Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
- Bernstein, P. A., Hadzilacos, V., and Goodman, N. "Concurrency Control and Recovery in Database Systems." Addison-Wesley, 1987.