Introduction
ACID: four properties ensuring reliable database transactions. Atomicity: all-or-nothing. Consistency: valid state preservation. Isolation: concurrent independence. Durability: permanent persistence. Foundation: trusted data systems.
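Origin: Jim Gray described the core transaction properties (1981); Härder and Reuter coined the acronym ACID (1983). Purpose: correctness under concurrency and failure. Trade-off: performance vs. guarantee. Modern systems: balance (tunable consistency). Understanding: prerequisite for system design.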
Banking example: transfer money (debit one account, credit another). ACID ensures: both succeed or both fail (not halfway). Data integrity: ACID guarantees.
"ACID properties protect data integrity through transactions. Guarantees critical for trust: all-or-nothing execution, consistency preservation, isolation from interference, permanent durability. Foundation of reliable databases." -- Transaction semantics
Transaction Concept
Definition
Transaction: sequence of operations (reads, writes) treated as single logical unit. Begins: BEGIN or START TRANSACTION (dialect-dependent). Ends: COMMIT (success) or ROLLBACK (failure). All operations within execute atomically (all or none).
Example Transaction
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
Result: both updates succeed or both fail (the transfer is atomic)
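The transfer above can be sketched as runnable code. This is a minimal illustration using Python's standard `sqlite3` module, not a production pattern; the `transfer` and `balances` helpers, the starting balances, and the "credit must touch exactly one row" rule are assumptions added for the example.

```python
import sqlite3

# isolation_level=None turns off the module's implicit BEGIN,
# so the explicit BEGIN/COMMIT/ROLLBACK below define the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 200)])


def transfer(conn, src, dst, amount):
    """Debit src and credit dst; commit both updates or neither."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        if cur.rowcount != 1:  # hypothetical rule: the credit must hit one row
            raise LookupError(f"no such account: {dst}")
        conn.execute("COMMIT")
        return True
    except (sqlite3.Error, LookupError):
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # also undoes the debit that already ran
        return False


def balances(conn):
    """Return {id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, 1, 2, 100)       # both updates commit
failed = transfer(conn, 1, 99, 100)  # debit runs, credit fails, debit is undone
```

After the failed call the debit is not visible: the balances are the same as after the first, successful transfer.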
Transaction States
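Active: executing operations. Partially committed (preparing): last operation executed, consistency checked before commit. Committed: changes persisted. Failed: normal execution can no longer proceed. Aborted: rolled back (reverted). State machine: only valid transitions allowed (e.g. Active to Failed to Aborted).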
COMMIT vs. ROLLBACK
COMMIT: finalize changes, make durable. ROLLBACK: undo changes, restore prior state. Choice: application logic decides (success/failure). Both: ensure atomicity.
Savepoints
Intermediate points: SAVEPOINT name. Partial rollback: ROLLBACK TO name. Advanced: fine-grained control. Useful: complex transactions with branches.
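A short sketch of a partial rollback, using SQLite through Python's `sqlite3` module. The `audit` table, the step names, and the savepoint name `optional_step` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
conn.execute("CREATE TABLE audit (step TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO audit VALUES ('reserve stock')")
conn.execute("SAVEPOINT optional_step")
conn.execute("INSERT INTO audit VALUES ('apply coupon')")
# Suppose the coupon turns out to be invalid:
# undo only the work done since the savepoint, keep the rest.
conn.execute("ROLLBACK TO optional_step")
conn.execute("INSERT INTO audit VALUES ('charge card')")
conn.execute("COMMIT")

steps = [row[0] for row in conn.execute("SELECT step FROM audit ORDER BY rowid")]
# steps == ['reserve stock', 'charge card']
```

The outer transaction still commits or rolls back as a whole; the savepoint only narrows what a partial rollback discards.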
Atomicity
Definition
All-or-nothing guarantee. Transaction: entire unit succeeds completely or fails completely. No halfway state: partial completion impossible. Either commit all operations or abort all.
Example Violation
Transfer without atomicity:
Debit account 1: success
System crash
Credit account 2: never executes
Result: money lost (inconsistent state)
Implementation: Write-Ahead Logging
Log records: written to stable storage before the corresponding data change (write-ahead). On crash: replay from log. Recovery: redo committed transactions, undo incomplete ones. Mechanism: atomicity (and durability) through the log.
Example with WAL
Write-Ahead Log:
1. Log "debit account 1"
2. Execute debit
3. Log "credit account 2"
4. Execute credit
5. Log "commit"
6. Acknowledge success
Crash during step 4: recovery reads the log, finds no commit record, and rolls back the debit
Undo vs. Redo
Undo: restore previous state (rollback). Redo: reapply operations (recovery from crash). WAL: records both (efficient recovery).
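The undo/redo idea can be sketched with a toy in-memory log. This is a simplified model under stated assumptions, not how any particular database lays out its log: the record shapes `("update", txid, key, old, new)` and `("commit", txid)` and the `recover` function are hypothetical, and transactions are assumed not to overwrite each other's uncommitted data.

```python
def recover(log, disk):
    """Rebuild a consistent state from a possibly partial `disk` image and the log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(disk)

    # Redo pass: reapply every logged update in log order,
    # so changes that never reached disk are not lost.
    for rec in log:
        if rec[0] == "update":
            _, _txid, key, _old, new = rec
            state[key] = new

    # Undo pass: walk the log backwards and restore the old value
    # for every update whose transaction has no commit record.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txid, key, old, _new = rec
            state[key] = old
    return state


# Crash after the debit reached disk but before the credit and the commit record.
log_no_commit = [
    ("update", "T1", "acct1", 500, 400),
    ("update", "T1", "acct2", 200, 300),
]
disk_after_crash = {"acct1": 400, "acct2": 200}
rolled_back = recover(log_no_commit, disk_after_crash)

# Same crash, but the commit record made it into the log.
committed_state = recover(log_no_commit + [("commit", "T1")], disk_after_crash)
```

Without the commit record the debit is undone; with it, the missing credit is redone.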
Cost of Atomicity
Logging overhead: write log + execute operation. Synchronous: slow (wait for log to reach disk). Asynchronous: faster (buffer writes), risk of losing recent commits on crash. Balance: performance vs. safety.
Consistency
Definition
Database moves: valid state to valid state. Invariants preserved: no corruption, constraints satisfied. Application responsibility: define the rules (declared as foreign keys and check constraints, or enforced in application code). Database: enforces declared constraints; atomicity and durability keep crashes from breaking invariants.
Valid States
Constraints: integrity rules. Examples: account balance >= 0, employee must have valid department, total debits = total credits. Consistency: only valid states exist.
Transaction Consistency
Initial state: valid
Intermediate: operations may break constraints
Final state: valid (or rolled back)
Example: if the rule is checked only at commit (deferred constraint or application check), a balance may go temporarily negative mid-transaction, but must be non-negative at commit
Constraint Types
Primary key: uniqueness. Foreign key: referential integrity. Check: column value conditions. Unique: distinct values. Default: standard values. System: checks per statement by default; constraints declared deferrable are checked at commit.
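A sketch of immediate versus deferred constraint checking, using SQLite via Python's `sqlite3`. The `departments` and `employees` tables and the `try_sql` helper are illustrative assumptions; other databases differ in which constraint types may be deferred.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL
            REFERENCES departments(id) DEFERRABLE INITIALLY DEFERRED,
        salary INTEGER NOT NULL CHECK (salary >= 0)
    )
    """
)


def try_sql(statements):
    """Run statements in one transaction; return None on success, else the error text."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
        return None
    except sqlite3.IntegrityError as exc:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        return str(exc)


# CHECK is evaluated when the statement runs: the whole transaction is rolled back.
check_error = try_sql(
    ["INSERT INTO departments VALUES (1)", "INSERT INTO employees VALUES (1, 1, -5)"]
)

# Deferred FK: the child row may precede its parent, as long as both exist at COMMIT.
deferred_ok = try_sql(
    ["INSERT INTO employees VALUES (2, 7, 100)", "INSERT INTO departments VALUES (7)"]
)

# Deferred FK still unsatisfied at COMMIT: the commit itself fails.
deferred_error = try_sql(["INSERT INTO employees VALUES (3, 99, 100)"])

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

Only the second transaction leaves any rows behind; the other two end in a rollback, so the database never holds an invalid state.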
Application Invariants
Business rules: beyond schema constraints. Example: monthly budget enforcement. Database: cannot express (application logic). Developer: responsible for consistency.
Consistency vs. Availability
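Strong consistency: every read sees the latest committed write (may reduce availability). Eventual: replicas converge over time (temporary inconsistency allowed). CAP theorem: under a network partition, choose consistency or availability. Note: CAP consistency concerns replica agreement; ACID consistency concerns invariants. ACID databases: typically favor strong consistency.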
Isolation
Definition
Concurrent transactions independent: result as if run serially. One transaction's writes not visible to others until committed. Full isolation prevents: dirty reads, non-repeatable reads, phantom reads.
Isolation Anomalies
Dirty read: read uncommitted data (which may later be rolled back). Non-repeatable read: same row re-read, value changed mid-transaction. Phantom: same query re-run, new rows appear. Levels: different guarantees. Trade-off: isolation vs. concurrency.
Isolation Levels
Read Uncommitted: no isolation (dirty reads possible). Read Committed: no dirty reads. Repeatable Read: no non-repeatable reads (phantoms still possible under the SQL standard). Serializable: complete isolation. Higher: more protection, less concurrency.
Example Dirty Read
Transaction A: UPDATE accounts SET balance = 100 WHERE id = 1 (not yet committed)
Transaction B: SELECT balance: reads 100 (uncommitted)
Transaction A: ROLLBACK
Transaction B: holds dirty data (a balance that was never committed)
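For contrast, a sketch of a system that does not allow the dirty read: two SQLite connections to the same file, opened through Python's `sqlite3`. The file name, the starting balance of 50, and the `read_balance` helper are assumptions for the example.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")
writer = sqlite3.connect(path, isolation_level=None)  # plays transaction A
reader = sqlite3.connect(path, isolation_level=None)  # plays transaction B
writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 50)")


def read_balance(conn):
    """Read account 1's balance through the given connection."""
    return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]


writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 100 WHERE id = 1")
seen_by_writer = read_balance(writer)  # 100: a transaction sees its own writes
seen_by_reader = read_balance(reader)  # 50: the uncommitted change is invisible
writer.execute("ROLLBACK")
after_rollback = read_balance(reader)  # 50: nothing to clean up on the reader side
```

Because B never saw the value 100, A's rollback leaves B with nothing inconsistent.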
Locking Mechanism
Shared lock: multiple readers. Exclusive lock: sole writer. Conflicts: prevent simultaneous access. Implementation: ensures isolation. Cost: blocking (reduced concurrency).
MVCC Alternative
Multi-Version Concurrency Control: snapshots. Reader: gets snapshot version (no lock). Writer: creates new version. Benefits: high concurrency, no blocking reads. Modern systems: prefer MVCC.
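The snapshot idea can be illustrated with a toy version store. This is a deliberately small sketch, single-threaded and without garbage collection of old versions; the `MVCCStore` class and its method names are hypothetical and do not mirror any real engine.

```python
class MVCCStore:
    """Keeps every committed version of a key, tagged with a commit timestamp."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit counter

    def begin(self):
        """Start a reader: its snapshot is 'everything committed so far'."""
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def commit_write(self, key, value):
        """Writers never overwrite: they append a new version."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock


store = MVCCStore()
store.commit_write("balance", 50)
old_snapshot = store.begin()          # reader starts here
store.commit_write("balance", 100)    # writer commits later, reader is not blocked
old_view = store.read("balance", old_snapshot)
new_view = store.read("balance", store.begin())
```

The earlier reader keeps seeing 50 while a reader that starts after the second commit sees 100; the extra stored version is the cost.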
Durability
Definition
Committed data persists: survives hardware failures, crashes, power loss. Once COMMIT acknowledged: data durable. Permanent: system recovers from failure with data intact.
Storage Hierarchy
RAM: volatile (lose on crash). Disk: durable (persistent). SSD: durable (persistent). Network storage: replicated (high durability). Durability: requires non-volatile storage.
Fsync Guarantee
Write to disk buffer: may be lost (buffer lost on crash)
Fsync to persistent storage: acknowledged durable
COMMIT requires: fsync to log (expensive but necessary)
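The difference between a buffered write and a durable one can be sketched with the standard `os` module. The `durable_append` helper, the log file name, and the record format are illustrative assumptions; a real log would also handle partial writes and, for a newly created file, sync the containing directory.

```python
import os
import tempfile


def durable_append(path, record: bytes):
    """Append one log record; return only after the OS reports it flushed to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record + b"\n")  # lands in the OS buffer, may still be lost
        os.fsync(fd)                  # forces the buffer out to the storage device
    finally:
        os.close(fd)


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
durable_append(log_path, b"COMMIT tx=42")
# Only at this point would a database acknowledge the commit to the client.
```

Dropping the `os.fsync` call makes the function faster and the acknowledgement unsafe, which is exactly the trade-off described above.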
Durability Costs
Synchronous: fsync on every commit (high latency, on the order of ~10ms per commit on spinning disks). Asynchronous: batched fsync (faster, recent commits may be lost on crash). Balance: performance vs. safety.
Replication for Durability
Copy data to multiple servers. Failure: one server lost, others survive. Quorum: majority confirmation before commit. Higher durability: multiple geographically distributed copies.
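The quorum rule is simple arithmetic: a write counts as committed once a strict majority of replicas has acknowledged it. A minimal sketch, with the hypothetical helper `quorum_commit`:

```python
def quorum_commit(acks, replicas):
    """Return True when a strict majority of `replicas` acknowledged the write."""
    needed = replicas // 2 + 1  # 2 of 3, 3 of 4, 3 of 5, ...
    return acks >= needed


three_node_ok = quorum_commit(acks=2, replicas=3)      # True: one replica may be down
three_node_fail = quorum_commit(acks=1, replicas=3)    # False: no majority
four_node_tie = quorum_commit(acks=2, replicas=4)      # False: a tie is not a majority
```

Because any two majorities overlap in at least one replica, a later majority read is guaranteed to reach a server that holds the committed write.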
Recovery After Failure
System crash: uncommitted transactions lost (expected). Committed: recovered from log/replicas. Recovery: automatic, system restarts. Consistency: maintained (ACID).
ACID Combined
Money Transfer Example
Initial: Account A = $100, Account B = $50
Transfer $20 from A to B:
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 20 WHERE id = 'A';
UPDATE accounts SET balance = balance + 20 WHERE id = 'B';
COMMIT;
Atomicity: both operations succeed or both fail
Consistency: total money preserved ($150), no account negative
Isolation: concurrent transfers don't interfere
Durability: after commit, transfer survives failures
Result: Account A = $80, Account B = $70 (always consistent)
Failure Scenarios
Debit succeeds, system crashes before credit: ROLLBACK (atomicity). Application constraint violated: ROLLBACK (consistency). Concurrent access: locking (isolation). Power loss after commit: recovery (durability).
Confidence Guarantee
ACID ensures: data integrity, no surprises, predictable behavior. Trust: applications can rely on guarantees. Building blocks: complex operations built from atomic transactions.
Trade-offs Summary
| Property | Benefit | Cost |
|---|---|---|
| Atomicity | Prevent partial updates | Logging overhead |
| Consistency | Data integrity guaranteed | Constraint checking |
| Isolation | Concurrent safety | Locking, reduced concurrency |
| Durability | Data permanence | Fsync latency |
Implementation Mechanisms
Write-Ahead Logging
Log entry written: before operation executes. Undo log: old values (rollback). Redo log: new values (recovery). On crash: scan log, redo committed, undo uncommitted.
Locking Protocol
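Acquire locks: before accessing data. Release locks: at commit or rollback (strict two-phase locking); earlier release weakens isolation. Deadlock prevention: consistent lock ordering; otherwise detection and abort. Cost: potential blocking (decreased concurrency).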
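Lock ordering can be shown with in-process locks. This sketch uses Python's `threading` module as a stand-in for a database lock manager; the `Account` class, the `transfer` function, and `run_demo` are hypothetical, and the two accounts in a transfer are assumed to be distinct.

```python
import threading


class Account:
    def __init__(self, account_id, balance):
        self.id = account_id
        self.balance = balance
        self.lock = threading.Lock()  # exclusive lock guarding this account


def transfer(src, dst, amount):
    """Move money while holding both locks, always acquired in id order."""
    # Ordering by id means two opposite transfers (A->B and B->A) ask for the
    # same lock first, so neither can hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda account: account.id)
    with first.lock, second.lock:
        if src.balance < amount:
            return False
        src.balance -= amount
        dst.balance += amount
        return True


def run_demo(rounds=200):
    """Run opposite transfers concurrently; return the final balances."""
    a, b = Account(1, 1000), Account(2, 1000)
    jobs = [(a, b, 1), (b, a, 1)] * rounds
    threads = [threading.Thread(target=transfer, args=job) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return a.balance, b.balance
```

Calling `run_demo()` finishes without deadlock and returns the starting balances, since the opposite transfers cancel out.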
Timestamp Ordering
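Transaction timestamp: assigned at start, defines the serial order. Reads/writes: checked against timestamps of the data item. Conflicts: detected, offending transaction aborted and restarted. Related, optimistic concurrency control: assume no conflicts, validate at commit.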
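The optimistic variant (validate at commit) can be sketched with a version counter per data item. This is a single-threaded illustration under stated assumptions, not full timestamp ordering; the `VersionedCell` class is hypothetical.

```python
class VersionedCell:
    """A value plus a version number that increases on every successful commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        """Return the value together with the version it was read at."""
        return self.value, self.version

    def commit(self, new_value, read_version):
        """Install new_value only if nobody committed since `read_version`."""
        if self.version != read_version:
            return False  # conflict detected: the caller should re-read and retry
        self.value = new_value
        self.version += 1
        return True


cell = VersionedCell(100)
value_a, version_a = cell.read()  # transaction A reads 100
value_b, version_b = cell.read()  # transaction B reads 100 concurrently
a_committed = cell.commit(value_a - 10, version_a)  # A validates and commits 90
b_committed = cell.commit(value_b - 20, version_b)  # B is stale and must abort
```

B's write is rejected rather than silently overwriting A's, so the final value is 90 and no update is lost.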
MVCC (Multi-Version Concurrency Control)
Each transaction: sees consistent snapshot. Multiple versions: reader gets version at start time. Writer: creates new version. No blocking: high concurrency. Cost: version storage.
Two-Phase Commit
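Distributed: multiple databases. Phase 1: prepare (coordinator asks each participant: can commit?). Phase 2: commit if all vote yes, otherwise abort all. Ensures: atomic commit across systems. Cost: complex, slow (extra round trips; participants block if the coordinator fails).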
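The two phases can be sketched in-process. This toy omits everything that makes the protocol hard in practice (network failures, timeouts, durable logging of votes); the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """One database taking part in the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes (and promise to commit if asked) or vote no."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant voted yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:                   # phase 2a: unanimous yes
            p.commit()
        return "committed"
    for p in participants:                       # phase 2b: any no aborts all
        p.abort()
    return "aborted"


happy = [Participant("orders"), Participant("payments")]
happy_result = two_phase_commit(happy)

mixed = [Participant("orders"), Participant("payments", can_commit=False)]
mixed_result = two_phase_commit(mixed)
```

In the second run the `orders` participant had already voted yes, yet it still ends up aborted: one negative vote decides the outcome for everyone.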
Violation Scenarios
Without Atomicity
Crash mid-transaction: partial effects remain. Example: debit succeeds, crash before credit. Data corruption: money lost (unbalanced).
Without Consistency
Constraints violated: invalid states possible. Example: negative balance allowed. Corrupt data: breaks application logic. Recovery: manual fix (expensive).
Without Isolation
Dirty read: read uncommitted data. Non-repeatable read: value changes mid-transaction. Phantom: new rows appear. Race conditions: unpredictable behavior. Data races: corruption.
Without Durability
Power loss: committed data lost. Recovery: incomplete (missing updates). Data loss: permanent (unrecoverable). Trust: destroyed.
Cascading Failures
Missing ACID: failures compound. No atomicity: partial writes become visible to other transactions. No isolation: interleavings produce invalid states (consistency fails). No durability: committed work can vanish, so no other guarantee can be relied on.
Performance Trade-offs
Synchronous Safety
Fsync after commit: safe (durable). Latency: ~10ms per commit (typical for a spinning disk). Throughput: ~100 commits/sec when commits are serialized (1 / 10ms). Cost: significant.
Asynchronous Safety
Buffer writes, fsync batched: faster (1000+ commits/sec). Risk: crash loss (last few commits). Trade: safety for performance.
Isolation Levels
Serializable: complete safety, lowest throughput. Read Committed: moderate safety, higher throughput. Higher level: higher cost (locking, validation). Choice: balance requirements.
Locking vs. MVCC
Locking: simple, blocking (reduces concurrency). MVCC: complex, non-blocking (high concurrency). Memory: MVCC needs version storage. Choice: workload-dependent.
Replication
Local durability: single server (fast). Replicated: multiple servers (slower, higher durability). Quorum: balance (majority confirmation).
Alternatives and Relaxations
Eventual Consistency
Relax consistency: temporary inconsistency allowed. Benefits: high availability, scalability. Cost: complex application logic. Use: non-critical data (caches, analytics).
BASE (Basically Available, Soft state, Eventual consistency)
Contrast: ACID. Flexible: high performance. Suitable: distributed systems where scalability is critical. Cost: application complexity; consistency is eventual (not immediate).
Weaker Isolation Levels
Read Uncommitted: no isolation. Read Committed: some isolation. Repeatable Read: more isolation. Serializable: full isolation. Lower level: faster but risk anomalies.
Sagas (Distributed Transactions)
Multi-step: sequence of local transactions coordinated across systems. Compensating: each step has undo logic, run on failure. Guarantee: eventual consistency (not atomic, not isolated; intermediate states are visible). Complexity: managing rollback.
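The compensation pattern can be sketched as a list of (action, compensation) pairs. This is a minimal in-process illustration; the `run_saga` function and the booking steps are hypothetical, and a real saga would also persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """Run each action in order; on failure, run completed steps' compensations in reverse."""
    compensations = []
    for action, compensate in steps:
        try:
            action()
            compensations.append(compensate)
        except Exception:
            for undo in reversed(compensations):
                undo()
            return False
    return True


events = []


def fail_payment():
    raise RuntimeError("card declined")  # simulated failure in the third step


saga_ok = run_saga(
    [
        (lambda: events.append("book flight"), lambda: events.append("cancel flight")),
        (lambda: events.append("book hotel"), lambda: events.append("cancel hotel")),
        (fail_payment, lambda: events.append("refund payment")),
    ]
)
# events: book flight, book hotel, cancel hotel, cancel flight
```

Unlike a rollback, the bookings were really made and visible for a moment; the compensations are new operations that reverse their effect.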
When to Relax ACID
Critical data: keep ACID (banking, medical). Non-critical: relax (caches, recommendations). Scaling: eventual consistency acceptable. Evaluate: requirements first.
Practical Considerations
SQL Standards
ACID: assumed by SQL transactions (the standard defines COMMIT, ROLLBACK, and isolation levels). Most SQL databases: full ACID (PostgreSQL, Oracle, SQL Server). Some: engine-dependent (MySQL: InnoDB is transactional, MyISAM is not). Verify: system choice.
Connection Management
Transaction scope: connection-based. Multiple connections: separate transactions (concurrent). Connection pooling: reuse (reset transaction state).
Application Logic
Transaction boundaries: application defines (BEGIN, COMMIT). Logic: error handling (on what conditions to commit/rollback). Design: careful (complex transactions risky).
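One common way to keep boundaries and error handling in a single place is a small wrapper that commits on success and rolls back on any exception. A sketch using Python's `contextlib` and `sqlite3`; the `transaction` helper and the `events` table are assumptions for the example.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit if the block finishes normally, roll back if it raises."""
    conn.execute("BEGIN")
    try:
        yield conn
    except BaseException:
        conn.execute("ROLLBACK")
        raise  # the caller still sees the original error
    else:
        conn.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE events (name TEXT)")

with transaction(conn):
    conn.execute("INSERT INTO events VALUES ('kept')")

try:
    with transaction(conn):
        conn.execute("INSERT INTO events VALUES ('discarded')")
        raise RuntimeError("business rule failed")  # application decides to abort
except RuntimeError:
    pass

names = [row[0] for row in conn.execute("SELECT name FROM events")]
# names == ['kept']
```

Keeping the block inside `with transaction(conn):` short also keeps the transaction short, which limits how long locks and log space are held.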
Monitoring
Slow transactions: block resources (locks, log). Monitor: long transactions, rollback frequency. Optimize: shorter transactions, faster execution.
Testing
Failure scenarios: simulate crashes, network failures. Verify: recovery works. Atomicity: test partial failures. Consistency: validate invariants. Isolation: concurrent access testing.
References
- Gray, J. "The Transaction Concept: Virtues and Limitations." Proceedings of VLDB, 1981.
- Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
- Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.
- Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
- Kleppmann, M. "Designing Data-Intensive Applications." O'Reilly Media, 2017.