Decomposition | What's Your IQ

Introduction

Decomposition: split relation into multiple relations. Goal: eliminate anomalies (achieve normal form). Requirements: lossless-join (preserve data), dependency-preserving (preserve constraints).

Trade-off: often choose one (cannot always achieve both). 3NF: usually achieves both. BCNF: may lose dependencies. Strategy: depends on priorities.

"Decomposition separates concerns: each table one entity. Careful: preserve data (lossless) and constraints (dependency). Balance: completeness vs. constraints." -- Normalization

Decomposition Concept

Idea

Table R with problem FDs: split into R1, R2, ... Columns partitioned: each R-i contains subset. Goal: eliminate violating FDs.

Example

StudentCourse (StudentID, CourseID, Professor)
Key: {StudentID, CourseID}
Problem: Professor -> CourseID (non-key determines)

Decompose:
 StudentCourse2 (StudentID, CourseID)
 ProfessorCourse (Professor, CourseID)

Implications

Data: split across tables. Queries: require joins. Trade: simplicity for correctness.

Necessity

Anomalies indicate: improper design. Decomposition: fix (methodical). Required: for normal forms.

Lossless-Join Decomposition

Definition

Rejoin decomposed tables: recover original. No data loss. Essential: correctness (preserve information).

Test

For R decomposed into R1, R2: lossless if common columns (R1 ∩ R2) form key of at least one. Condition: necessary and sufficient (for two-way).

Example Lossless

StudentCourse (StudentID, CourseID, Grade)
Decompose:
 Student (StudentID, CourseID) -- Key: {StudentID, CourseID}
 Grade (StudentID, CourseID, Grade) -- Key: {StudentID, CourseID}

Common: {StudentID, CourseID} (key of both)
Lossless: can rejoin perfectly

Example Lossy

Student (StudentID, Name, DepartmentID)
Decompose:
 S1 (StudentID, Name)
 S2 (Name, DepartmentID)

Common: Name (not key of either)
Lossy: multiple students same name -> spurious rows on join

Verification

Chase algorithm: formal method. Or: test common columns (key of at least one). Practical: use systematic approach.

Lossy Decomposition

Definition

Rejoin: introduces spurious rows. Data corrupted: incorrect. Unacceptable: never use.

Risk

Unnoticed: difficult to detect. Silent corruption: worse than failure. Always verify: lossless.

Avoidance

Algorithms guarantee: lossless (if followed). Manual: verify common columns.

Dependency-Preserving Decomposition

Definition

All FDs: enforceable within decomposed tables (not across). No multi-table checks. Efficient: enforcement local.

Check

Closure of FDs in decomposed: equals closure of original? If yes: preserving. If no: dependency lost.

Trade-off

Preserving: desirable (enforcement). Non-preserving: may be necessary (for BCNF). Choose: based on priorities.

Loss Implication

Application enforces: lost constraints (code needed). Risk: bugs (constraints not checked). Maintenance: harder.

Decomposition Algorithms

General Approach

1. Identify problem FD (violates normal form). 2. Decompose: split (FD in one, not spanning). 3. Repeat: for each decomposed table. 4. Verify: lossless and dependency-preserving (if possible).

Strategy

Bottom-up: start with 1NF, incrementally normalize. Or top-down: given target form, decompose. Either: reaches goal.

3NF Decomposition Algorithm

Steps

1. Find minimal cover of FDs. 2. For each FD X -> Y: create table {X, Y}. 3. If no table contains key: add table with key. 4. Remove redundant tables (if subset of another). 5. Result: 3NF, dependency-preserving, usually lossless.

Property

Guaranteed: 3NF, dependency-preserving. Lossless: usually (not always). Practical: good algorithm.

BCNF Decomposition Algorithm

Steps

1. Find FD X -> Y where X not superkey. 2. Decompose: {X, Y} and {X, R-Y}. 3. Repeat: recursively. 4. Stop: all tables BCNF.

Property

Guaranteed: BCNF. Lossless: always. Dependency-preserving: not always (may lose FDs). Trade-off: acceptable.

When Dependency Loss

Rare: specific FD patterns. If occurs: accept (or choose 3NF instead).

Verification Methods

Lossless Verification

Check: common columns (R1 ∩ R2) form key of at least one. For complex (3+ tables): chase algorithm (manual tedious, use tool).

Dependency-Preserving Verification

Closure test: compute FD closure in decomposed vs. original. Match: preserving. Otherwise: lost.

Normal Form Verification

Check each table: does it meet target form? All table check: overall normalization achieved.

Testing

Instance-level: create data, test for anomalies. Structural: verify FD compliance. Comprehensive: both approaches.

Practical Examples

2NF Example

StudentCourse (StudentID, CourseID, Name, Grade)
Problem: StudentID -> Name (partial dependency)

Decompose:
 Student (StudentID, Name)
 Enrollment (StudentID, CourseID, Grade)

Result: 2NF (no partial dependencies)

3NF Example

Employee (EmployeeID, Name, DepartmentID, DepartmentName)
Problem: DepartmentID -> DepartmentName (transitive)

Decompose:
 Employee (EmployeeID, Name, DepartmentID)
 Department (DepartmentID, DepartmentName)

Result: 3NF (no transitive dependencies)

Complex Decomposition

StudentCourse (StudentID, CourseID, Professor, Time)
Problems: Multiple FDs

Algorithm:
1. Professor -> Time (not from key)
2. Decompose into:
 StudentCourse (StudentID, CourseID, Professor)
 ProfessorSchedule (Professor, Time)

Repeat: verify both in normal form (BCNF achieved)

References

Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.