Introduction

ACID: four properties ensuring reliable database transactions. Atomicity: all-or-nothing. Consistency: valid state preservation. Isolation: concurrent independence. Durability: permanent persistence. Foundation: trusted data systems.

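Origin: Jim Gray described the core transaction properties (1981); Härder and Reuter coined the acronym ACID (1983). Role: essential for correctness under concurrency and failure. Trade-off: performance vs. guarantee. Modern systems: balance (tunable consistency). Understanding: prerequisite for system design.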

Banking example: transfer money (debit one account, credit another). ACID ensures: both succeed or both fail (not halfway). Data integrity: ACID guarantees.

"ACID properties protect data integrity through transactions. Guarantees critical for trust: all-or-nothing execution, consistency preservation, isolation from interference, permanent durability. Foundation of reliable databases." -- Transaction semantics

Transaction Concept

Definition

Transaction: sequence of operations (reads, writes) treated as single logical unit. Begins: START TRANSACTION (standard SQL; many dialects also accept BEGIN or BEGIN TRANSACTION). Ends: COMMIT (success) or ROLLBACK (failure). All operations within execute atomically (all or none).

Example Transaction

BEGIN TRANSACTION;
 UPDATE accounts SET balance = balance - 100 WHERE id = 1;
 UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

Result: both succeed or both fail (transfer atomically)
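
The transfer above can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch, not a production pattern: the transfer() helper, the in-memory database, and the starting balances (500 and 200) are assumptions made for illustration.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; either both updates apply or neither does."""
    conn.execute("BEGIN")
    try:
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        if debit.rowcount != 1 or credit.rowcount != 1:
            raise LookupError("unknown account")
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")   # undo the debit if the credit could not be applied
        return False

# isolation_level=None: the module opens no implicit transactions,
# so BEGIN / COMMIT / ROLLBACK are fully under our control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 200)])

ok = transfer(conn, 1, 2, 100)       # both updates applied
failed = transfer(conn, 1, 99, 100)  # account 99 does not exist: debit is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

The second call debits account 1 first and only then discovers that the credit has no target; the ROLLBACK restores the debited amount, so the balances reflect only the first transfer.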

Transaction States

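Active: executing operations. Preparing: checking consistency before commit (often called partially committed). Committed: changes persisted. Failed: error detected, execution cannot continue. Aborted: rolled back (reverted). State machine: only valid transitions allowed (e.g. a committed transaction can never become aborted).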

COMMIT vs. ROLLBACK

COMMIT: finalize changes, make durable. ROLLBACK: undo changes, restore prior state. Choice: application logic decides (success/failure). Both: ensure atomicity.

Savepoints

Intermediate points: SAVEPOINT name. Partial rollback: ROLLBACK TO name. Advanced: fine-grained control. Useful: complex transactions with branches.
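
A partial rollback can be sketched with sqlite3, which supports SAVEPOINT, ROLLBACK TO, and RELEASE. The orders table and the savepoint name optional_item are hypothetical, chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders VALUES (1, 'book')")
conn.execute("SAVEPOINT optional_item")
conn.execute("INSERT INTO orders VALUES (2, 'gift wrap')")
conn.execute("ROLLBACK TO optional_item")  # undo only the work done after the savepoint
conn.execute("RELEASE optional_item")      # discard the savepoint itself
conn.execute("COMMIT")                     # the first insert is still committed

items = [row[0] for row in conn.execute("SELECT item FROM orders ORDER BY id")]
print(items)  # ['book']
```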

Atomicity

Definition

All-or-nothing guarantee. Transaction: entire unit succeeds completely or fails completely. No halfway state: partial completion impossible. Either commit all operations or abort all.

Example Violation

Transfer without atomicity:
Debit account 1: success
System crash
Credit account 2: never executes

Result: money lost (inconsistent state)

Implementation: Write-Ahead Logging

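Log records: written to durable storage before the corresponding data change (write-ahead). On crash: replay from log. Recovery: redo committed transactions, undo incomplete ones. Mechanism: the log is the source of truth, so atomicity and durability both rest on it.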

Example with WAL

Write-Ahead Log:
1. Log "debit account 1"
2. Execute debit
3. Log "credit account 2"
4. Execute credit
5. Log "commit"
6. Acknowledge success

Crash during step 4: recover from log, rollback (log doesn't contain commit)

Undo vs. Redo

Undo: restore previous state (rollback). Redo: reapply operations (recovery from crash). WAL: records both (efficient recovery).
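
The recovery idea can be sketched as a toy function. This is a simplified model under stated assumptions, not how any particular engine implements it: log records are plain tuples, the record layout and the recover() name are invented here, and the starting balances (500 and 200) are illustrative.

```python
def recover(disk, log):
    """Return a consistent state from the on-disk values and the write-ahead log.

    Log records (hypothetical layout):
      ("update", txid, key, old_value, new_value)
      ("commit", txid)
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(disk)
    # Redo pass: reapply new values of committed transactions, oldest first.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            state[rec[2]] = rec[4]
    # Undo pass: restore old values of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            state[rec[2]] = rec[3]
    return state

# Crash during step 4: both updates are logged, only the debit reached disk,
# and no commit record exists.
log = [
    ("update", "T1", "acct1", 500, 400),
    ("update", "T1", "acct2", 200, 300),
]
disk = {"acct1": 400, "acct2": 200}

after_crash = recover(disk, log)                        # debit undone
after_commit = recover(disk, log + [("commit", "T1")])  # credit redone
print(after_crash, after_commit)
```

With no commit record the undo pass restores the old balance; with a commit record the redo pass applies the credit that never reached disk.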

Cost of Atomicity

Logging overhead: write log + execute operation. Synchronous: slow (wait for log). Asynchronous: faster (buffer writes), higher crash risk. Balance: performance vs. safety.

Consistency

Definition

Database moves: valid state to valid state. Invariants preserved: no corruption, constraints satisfied. Application responsibility: define the rules (declared constraints such as foreign keys and check constraints, plus business logic). System: enforces declared constraints, rejects or rolls back violating changes.

Valid States

Constraints: integrity rules. Examples: account balance >= 0, employee must have valid department, total debits = total credits. Consistency: only valid states exist.

Transaction Consistency

Initial state: valid
Intermediate: operations may break constraints
Final state: valid (or rolled back)

Example: with deferred constraint checking, a temporarily negative balance is allowed mid-transaction, but it must be non-negative at commit

Constraint Types

Primary key: uniqueness. Foreign key: referential integrity. Check: column value conditions. Unique: distinct values. Not null: value required. Default: fallback value (not a check). System: enforces at end of each statement by default; deferred constraints at commit.
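
A short sketch of declared constraints being enforced, using sqlite3. The departments and employees tables and the try_insert() helper are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES departments(id),
        salary  INTEGER NOT NULL CHECK (salary >= 0)
    );
    INSERT INTO departments VALUES (1, 'Engineering');
""")

def try_insert(row):
    """Attempt one insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert((1, 1, 5000)),   # valid row
    try_insert((2, 99, 5000)),  # foreign key: department 99 does not exist
    try_insert((3, 1, -10)),    # check: negative salary
    try_insert((1, 1, 100)),    # primary key: id 1 already used
]
print(results)
```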

Application Invariants

Business rules: beyond schema constraints. Example: monthly budget enforcement. Database: cannot express (application logic). Developer: responsible for consistency.

Consistency vs. Availability

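Strong consistency: always valid (may reduce availability). Eventual: eventually valid (temporary inconsistency allowed). CAP theorem: during a network partition, choose consistency or availability. Caution: CAP consistency (every read sees the latest write) is a different notion from ACID consistency (invariants preserved). ACID systems: typically favor strong consistency.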

Isolation

Definition

Concurrent transactions independent: appear serialized. One transaction's writes not visible to others until committed. Prevents: dirty reads, non-repeatable reads, phantom reads.

Isolation Anomalies

Dirty read: read uncommitted data (rolled back). Non-repeatable: value changes mid-transaction. Phantom: new rows appear. Levels: different guarantees. Trade-off: isolation vs. concurrency.

Isolation Levels

Read Uncommitted: no isolation (dirty reads possible). Read Committed: no dirty reads. Repeatable Read: no dirty or non-repeatable reads (phantoms still possible). Serializable: complete isolation. Higher: more protection, less concurrency.

Example Dirty Read

Transaction A: UPDATE balance = 100
Transaction B: READ balance = 100 (uncommitted)
Transaction A: ROLLBACK
Transaction B: has dirty data (a balance that was never committed)
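
For contrast, the sketch below replays the same sequence against SQLite, where a second connection does not see the uncommitted update. It assumes SQLite's default rollback-journal mode and a temporary database file; the writer and reader connection names are illustrative.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")
writer = sqlite3.connect(path, isolation_level=None)  # Transaction A
reader = sqlite3.connect(path, isolation_level=None)  # Transaction B

writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 50)")

def read_balance(conn):
    return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchall()[0][0]

writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 100 WHERE id = 1")  # not yet committed
seen_during = read_balance(reader)   # reader still sees the committed value
writer.execute("ROLLBACK")
seen_after_rollback = read_balance(reader)

writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 75 WHERE id = 1")
writer.execute("COMMIT")
seen_after_commit = read_balance(reader)  # committed change is now visible

print(seen_during, seen_after_rollback, seen_after_commit)
writer.close()
reader.close()
```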

Locking Mechanism

Shared lock: multiple readers. Exclusive lock: sole writer. Conflicts: prevent simultaneous access. Implementation: ensures isolation. Cost: blocking (reduced concurrency).
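
The compatibility rule (shared with shared is fine, anything with exclusive conflicts) can be sketched as a toy lock table. The LockTable class is hypothetical and deliberately simplified: it refuses a conflicting request instead of blocking or queueing, and it ignores deadlocks.

```python
class LockTable:
    """Toy lock table: grants or refuses locks, never blocks or queues."""

    def __init__(self):
        self.holders = {}  # item -> (mode, set of transaction ids)

    def acquire(self, tx, item, mode):
        """mode is 'S' (shared) or 'X' (exclusive). Returns True if granted."""
        held = self.holders.get(item)
        if held is None:
            self.holders[item] = (mode, {tx})
            return True
        held_mode, owners = held
        if owners == {tx}:
            # The sole holder may re-acquire, or upgrade from S to X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.holders[item] = (new_mode, owners)
            return True
        if mode == "S" and held_mode == "S":
            owners.add(tx)  # any number of readers may share the item
            return True
        return False        # every other combination conflicts

    def release_all(self, tx):
        """Drop every lock held by `tx` (for example at commit or abort)."""
        for item in list(self.holders):
            _, owners = self.holders[item]
            owners.discard(tx)
            if not owners:
                del self.holders[item]

locks = LockTable()
r1 = locks.acquire("T1", "row1", "S")          # first reader
r2 = locks.acquire("T2", "row1", "S")          # second reader shares the lock
w_blocked = locks.acquire("T3", "row1", "X")   # writer conflicts with readers
locks.release_all("T1")
locks.release_all("T2")
w_granted = locks.acquire("T3", "row1", "X")   # writer proceeds once readers are gone
r_blocked = locks.acquire("T1", "row1", "S")   # reader conflicts with writer
print(r1, r2, w_blocked, w_granted, r_blocked)
```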

MVCC Alternative

Multi-Version Concurrency Control: snapshots. Reader: gets snapshot version (no lock). Writer: creates new version. Benefits: high concurrency, no blocking reads. Adoption: many modern systems use MVCC (PostgreSQL, Oracle, MySQL InnoDB).

Durability

Definition

Committed data persists: survives hardware failures, crashes, power loss. Once COMMIT acknowledged: data durable. Permanent: system recovers from failure with data intact.

Storage Hierarchy

RAM: volatile (lose on crash). Disk: durable (persistent). SSD: durable (persistent). Network storage: replicated (high durability). Durability: requires non-volatile storage.

Fsync Guarantee

Write to disk buffer: may be lost (buffer lost on crash)
Fsync to persistent storage: acknowledged durable
COMMIT requires: fsync to log (expensive but necessary)
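
The write-then-fsync sequence looks roughly like this at the operating system level. The durable_append() helper and the commit.log file name are hypothetical. How strong the fsync guarantee is depends on the OS, the file system, and the storage device, so treat this as a sketch of the call sequence rather than a durability proof.

```python
import os
import tempfile

def durable_append(path, record):
    """Append `record` (bytes) and return only after the OS was asked to flush it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)  # lands in the OS buffer; may still be lost on power failure
        os.fsync(fd)          # request a flush of the file to the storage device
    finally:
        os.close(fd)

log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
durable_append(log_path, b"T1 commit\n")
durable_append(log_path, b"T2 commit\n")

with open(log_path, "rb") as f:
    lines = f.read().splitlines()
print(lines)
```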

Durability Costs

Synchronous: fsync on every commit (high latency: roughly 10 ms per commit on a spinning disk, typically less on SSD). Asynchronous: batch fsync (faster, recent commits may be lost on crash). Balance: performance vs. safety.

Replication for Durability

Copy data to multiple servers. Failure: one server lost, others survive. Quorum: majority confirmation before commit. Higher durability: multiple geographically distributed copies.
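
The majority rule behind quorum commits is small enough to state as code. The quorum_commit() function and the replica names are assumptions for illustration; real systems also deal with timeouts, retries, and stale replicas.

```python
def quorum_commit(replica_acks):
    """Return True when a strict majority of replicas acknowledged the write.

    replica_acks maps a replica name to True (acknowledged) or False.
    """
    needed = len(replica_acks) // 2 + 1
    acked = sum(1 for ack in replica_acks.values() if ack)
    return acked >= needed

two_of_three = quorum_commit({"eu": True, "us": True, "asia": False})
one_of_three = quorum_commit({"eu": True, "us": False, "asia": False})
two_of_four = quorum_commit({"a": True, "b": True, "c": False, "d": False})
print(two_of_three, one_of_three, two_of_four)
```

Note that an even split (two of four) is not a majority, which is one reason replica counts are usually odd.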

Recovery After Failure

System crash: uncommitted transactions lost (expected). Committed: recovered from log/replicas. Recovery: automatic, system restarts. Consistency: maintained (ACID).

ACID Combined

Money Transfer Example

Initial: Account A = $100, Account B = $50

Transfer $20 from A to B:

BEGIN TRANSACTION;
 UPDATE accounts SET balance = balance - 20 WHERE id = 'A';
 UPDATE accounts SET balance = balance + 20 WHERE id = 'B';
COMMIT;

Atomicity: both operations succeed or both fail
Consistency: total money preserved ($150), no account negative
Isolation: concurrent transfers don't interfere
Durability: after commit, transfer survives failures

Result: Account A = $80, Account B = $70 (always consistent)

Failure Scenarios

Debit succeeds, system crashes before credit: ROLLBACK (atomicity). Application constraint violated: ROLLBACK (consistency). Concurrent access: locking (isolation). Power loss after commit: recovery (durability).

Confidence Guarantee

ACID ensures: data integrity, no surprises, predictable behavior. Trust: applications can rely on guarantees. Building blocks: complex operations built from atomic transactions.

Trade-offs Summary

Property     Benefit                    Cost
Atomicity    Prevent partial updates    Logging overhead
Consistency  Data integrity guaranteed  Constraint checking
Isolation    Concurrent safety          Locking, reduced concurrency
Durability   Data permanence            Fsync latency

Implementation Mechanisms

Write-Ahead Logging

Log entry written: before operation executes. Undo log: old values (rollback). Redo log: new values (recovery). On crash: scan log, redo committed, undo uncommitted.

Locking Protocol

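Acquire locks: before accessing data. Release locks: at commit or abort (strict two-phase locking); releasing earlier can expose uncommitted changes. Deadlocks: prevented by consistent lock ordering, or detected and resolved by aborting one transaction. Cost: potential blocking (decreased concurrency).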

Timestamp Ordering

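Transaction timestamp: assigned at start, defines the serial order. Reads/writes: checked against each item's last read and write timestamps. Conflicts: detected, offending transaction aborted and restarted. Related: optimistic concurrency control (assume no conflicts, validate at commit).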
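
The timestamp checks can be sketched with the basic timestamp-ordering rules from the textbook literature. The Item class, the Abort exception, and the read()/write() helpers are hypothetical names; restarting aborted transactions and refinements such as the Thomas write rule are left out.

```python
class Abort(Exception):
    """Raised when an operation arrives too late for its timestamp."""

class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp that has read this item
        self.write_ts = 0  # largest timestamp that has written this item

def read(item, ts):
    if ts < item.write_ts:  # a younger transaction already overwrote the item
        raise Abort(f"read at ts={ts} is older than write_ts={item.write_ts}")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    if ts < item.read_ts or ts < item.write_ts:  # a younger transaction got there first
        raise Abort(f"write at ts={ts} conflicts with a younger transaction")
    item.value = value
    item.write_ts = ts

x = Item(10)
first_read = read(x, ts=2)        # allowed, read_ts becomes 2
try:
    write(x, ts=1, value=99)      # older writer arrives after a younger reader
    late_write = "applied"
except Abort:
    late_write = "aborted"
write(x, ts=3, value=30)          # allowed, write_ts becomes 3
try:
    read(x, ts=2)                 # older reader arrives after a younger writer
    late_read = "applied"
except Abort:
    late_read = "aborted"
print(first_read, late_write, late_read, x.value)
```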

MVCC (Multi-Version Concurrency Control)

Each transaction: sees consistent snapshot. Multiple versions: reader gets version at start time. Writer: creates new version. No blocking: high concurrency. Cost: version storage.
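
A toy versioned store shows why readers never block under MVCC. The MVCCStore class is a hypothetical sketch: a snapshot is just the commit timestamp at start time, and write-write conflict detection and cleanup of old versions are omitted.

```python
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # last commit timestamp handed out

    def begin(self):
        """Start a reader: its snapshot is the current commit timestamp."""
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def commit_write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

store = MVCCStore()
store.commit_write("balance", 100)
snapshot = store.begin()                       # reader starts here
store.commit_write("balance", 80)              # concurrent writer commits later
old_view = store.read("balance", snapshot)     # reader still sees its snapshot
new_view = store.read("balance", store.begin())
print(old_view, new_view)
```

The cost named above is visible in the data structure: both versions stay in memory until something removes the old one.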

Two-Phase Commit

Distributed: multiple databases. Phase 1: prepare (can commit?). Phase 2: commit if all vote yes, otherwise abort all. Ensures: atomic commit across systems. Cost: complex, slow (extra round trips; participants can block if the coordinator fails).
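
The two phases can be sketched in a few lines. The Participant class and two_phase_commit() function are hypothetical, run in a single process, and ignore what makes the real protocol hard: message loss, timeouts, and coordinator crashes.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1 vote: promise to commit if asked, or refuse."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

happy = [Participant("orders"), Participant("payments")]
happy_result = two_phase_commit(happy)

unhappy = [Participant("orders"), Participant("payments", can_commit=False)]
unhappy_result = two_phase_commit(unhappy)
print(happy_result, unhappy_result, [p.state for p in unhappy])
```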

Violation Scenarios

Without Atomicity

Crash mid-transaction: partial effects remain. Example: debit succeeds, crash before credit. Data corruption: money lost (unbalanced).

Without Consistency

Constraints violated: invalid states possible. Example: negative balance allowed. Corrupt data: breaks application logic. Recovery: manual fix (expensive).

Without Isolation

Dirty read: read uncommitted data. Non-repeatable read: value changes mid-transaction. Phantom: new rows appear. Race conditions: unpredictable behavior. Data races: corruption.

Without Durability

Power loss: committed data lost. Recovery: incomplete (missing updates). Data loss: permanent (unrecoverable). Trust: destroyed.

Cascading Failures

Missing ACID: failures compound. Without atomicity: partial writes remain and other transactions read them. Without isolation: interleaved updates produce invalid states (consistency lost). Without durability: committed work vanishes, so the other guarantees mean little.

Performance Trade-offs

Synchronous Safety

Fsync after commit: safe (durable). Latency: ~10ms per commit (spinning disk). Throughput: ~100 commits/sec when commits are flushed one at a time (1 s / 10 ms). Cost: significant.

Asynchronous Safety

Buffer writes, fsync batched: faster (1000+ commits/sec). Risk: crash loss (last few commits). Trade: safety for performance.
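
The throughput figures follow from simple arithmetic. The sketch below assumes a single log writer that waits for each fsync to finish; the function name and the batch size of ten are illustrative, and the 10 ms latency is the figure used in this section.

```python
def commits_per_second(fsync_latency_ms, commits_per_fsync=1):
    """Upper bound for one log writer that waits for each fsync to finish."""
    return commits_per_fsync * 1000 / fsync_latency_ms

sync_rate = commits_per_second(10)          # one fsync per commit
batched_rate = commits_per_second(10, 10)   # ten commits share one fsync
print(sync_rate, batched_rate)  # 100.0 1000.0
```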

Isolation Levels

Serializable: complete safety, lowest throughput. Read Committed: moderate safety, higher throughput. Higher level: higher cost (locking, validation). Choice: balance requirements.

Locking vs. MVCC

Locking: simple, blocking (reduces concurrency). MVCC: complex, non-blocking (high concurrency). Memory: MVCC needs version storage. Choice: workload-dependent.

Replication

Local durability: single server (fast). Replicated: multiple servers (slower, higher durability). Quorum: balance (majority confirmation).

Alternatives and Relaxations

Eventual Consistency

Relax consistency: temporary inconsistency allowed. Benefits: high availability, scalability. Cost: complex application logic. Use: non-critical data (caches, analytics).

BASE (Basically Available, Soft state, Eventual consistency)

Contrast: ACID. Flexible: high performance. Suitable: distributed systems, scalability critical. Cost: application complexity, consistency eventual (not immediate).

Weaker Isolation Levels

Read Uncommitted: no isolation. Read Committed: some isolation. Repeatable Read: more isolation. Serializable: full isolation. Lower level: faster but risk anomalies.

Sagas (Distributed Transactions)

Multi-step: sequence of local transactions coordinated across systems. Compensating: undo logic if a later step fails. Guarantee: eventual consistency (non-atomic; intermediate states visible). Complexity: managing rollback.
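
The compensation idea can be sketched as a small runner. The run_saga() function and the order-processing steps are hypothetical; a real saga would also persist its progress so that compensation survives a crash of the runner itself.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the completed steps
    in reverse order and return False. Returns True if every action ran.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
    return True

events = []

def charge_card():
    raise RuntimeError("card declined")

ok = run_saga([
    (lambda: events.append("reserve stock"), lambda: events.append("release stock")),
    (lambda: events.append("book shipping"), lambda: events.append("cancel shipping")),
    (charge_card, lambda: events.append("refund")),
])
print(ok, events)
```

The failed step's own compensation is not run, because the step never completed; only the two earlier steps are undone, newest first.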

When to Relax ACID

Critical data: keep ACID (banking, medical). Non-critical: relax (caches, recommendations). Scaling: eventual consistency acceptable. Evaluate: requirements first.

Practical Considerations

SQL Standards

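ACID: not named in the SQL standard, which defines transactions and isolation levels. Most SQL databases: full ACID (PostgreSQL, Oracle, SQL Server). Some: depends on configuration (MySQL: InnoDB engine is transactional, MyISAM is not). Verify: system choice.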

Connection Management

Transaction scope: connection-based. Multiple connections: separate transactions (concurrent). Connection pooling: reuse (reset transaction state).

Application Logic

Transaction boundaries: application defines (BEGIN, COMMIT). Logic: error handling (on what conditions to commit/rollback). Design: careful (complex transactions risky).

Monitoring

Slow transactions: block resources (locks, log). Monitor: long transactions, rollback frequency. Optimize: shorter transactions, faster execution.

Testing

Failure scenarios: simulate crashes, network failures. Verify: recovery works. Atomicity: test partial failures. Consistency: validate invariants. Isolation: concurrent access testing.

References

  • Gray, J. "The Transaction Concept: Virtues and Limitations." Proceedings of VLDB, 1981.
  • Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
  • Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.
  • Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
  • Kleppmann, M. "Designing Data-Intensive Applications." O'Reilly Media, 2017.