Introduction
Functional dependency (FD): semantic constraint. If A -> B: B's value determined by A's. Foundation: database design, normalization. Identifying FDs: prerequisite for normal forms.
Example: StudentID -> Name (name determined by student ID). Notation: A -> B (A functionally determines B). Keys: special case (superkey -> all columns).
"Functional dependencies express data semantics: which columns determine others. Understanding dependencies: foundation of proper schema design and avoiding anomalies." -- Normalization theory
Definition and Notation
Formal Definition
A -> B (A determines B): if two rows have same A value, they have same B value. Constraint: enforces relationship. Example: employee_id -> name (every employee has unique name).
Notation
A -> B: A functionally determines B. A, B: sets of attributes. {A1, A2} -> {B1, B2}: composite (multiple). Read: "A determines B".
Instance vs. Schema
Instance: specific data (may satisfy FD). Schema: defines FDs (should always hold). Design: identify schema-level FDs (not accidental).
Single vs. Composite
Single: StudentID -> Name
Composite: {StudentID, CourseID} -> Grade
Determining if FD Holds
Check: every pair of rows with same A value has same B value. Counterexample: if one pair differs -> FD doesn't hold. Verification: essential for proper FD identification.
Practical Examples
Employee Table
Attributes: EmployeeID, Name, Salary, DepartmentID, DepartmentName
FDs:
EmployeeID -> Name (name determined by employee)
EmployeeID -> Salary
EmployeeID -> DepartmentID
DepartmentID -> DepartmentName (department name determined by ID)
Non-FD:
Name -/-> Salary (multiple employees with same name, different salary)
Salary -/-> DepartmentID (salary not uniquely determining department)
Student Courses
Attributes: StudentID, CourseID, Semester, Grade
FDs:
{StudentID, CourseID, Semester} -> Grade
(grade determined by student, course, semester combination)
Non-FD:
StudentID -/-> CourseID (student enrolls in many courses)
CourseID -/-> StudentID (course has many students)
Closure and Inference
Attribute Closure
Given attributes A, closure A+ = all attributes determined by A. Algorithm: repeatedly apply FDs until no new attributes added. Useful: determine consequences.
Example Closure
Given FDs:
StudentID -> Name, DepartmentID
DepartmentID -> DepartmentName
{StudentID}+ = {StudentID, Name, DepartmentID, DepartmentName}
FD Closure
Set of all FDs derivable from given FDs. If A -> B in closure, it logically follows. Used: verify implications, design schemas.
Key Identification
K is key if K+ = all attributes and no proper subset K'. Algorithm: compute closures for all attribute combinations.
Armstrong's Axioms
Inference Rules
Axioms: sound (only valid FDs inferred) and complete (all valid FDs derivable). Three basic rules.
Reflexivity
If Y subset X: X -> Y. Trivial FD. Example: {A, B} -> A (always holds).
Augmentation
If X -> Y: XZ -> YZ. Add attributes to both sides. Example: if A -> B, then AC -> BC.
Transitivity
If X -> Y and Y -> Z: X -> Z. Chain relationships. Example: if A -> B and B -> C, then A -> C.
Additional Rules
Union: X -> Y and X -> Z implies X -> YZ. Decomposition: X -> YZ implies X -> Y and X -> Z. Pseudotransitivity: X -> Y and YZ -> W implies XZ -> W.
Trivial Dependencies
Definition
Trivial FD: RHS subset of LHS. Always holds (by reflexivity). Example: {A, B} -> A.
Non-Trivial
At least one RHS attribute not in LHS. Interesting: expresses non-obvious constraint. Focus: design analysis (non-trivial).
Completely Non-Trivial
No LHS attribute in RHS. Strongest form. Example: A -> B (neither is subset).
Use in Design
Trivial: skip (always holds). Non-trivial: analyze (reveals structure). Focus: identify all non-trivial FDs.
Partial Dependencies
Definition
Non-key attribute determined by proper subset of candidate key. Example: {StudentID, CourseID} key, but StudentID -> Name (proper subset determines).
Problem
Anomaly: insertion, update, deletion issues. Data redundancy: name repeated for each course. Solution: decompose (2NF eliminates).
Example
StudentCourse table:
{StudentID, CourseID} -> {Name, Grade}
Partial:
StudentID -> Name (StudentID alone determines name)
CourseID -> Title (CourseID alone determines title)
Transitive Dependencies
Definition
Non-key attribute determined by another non-key attribute. A -> B -> C (transitive chain). Problem: update anomaly, redundancy.
Example
Employee table:
EmployeeID -> DepartmentID
DepartmentID -> DepartmentName
Transitive: EmployeeID -> DepartmentName (through DepartmentID)
Elimination
3NF removes (decompose table). Split: separate tables for each entity. Eliminate chains (transitive).
Canonical Covers
Definition
Minimal FD set: no redundancies, no extraneous attributes. Useful: basis for design decisions. Reduced complexity.
Properties
No redundant FDs (each is essential). No extraneous attributes in FD. Unique (up to reordering).
Computation
Algorithm: eliminate redundancies iteratively. Complex: manual computation tedious. Tools: exist for automation.
Practical Use
Design: understand minimal FDs (reduce complexity). Decomposition: basis for normal forms. Tools: usually computed automatically.
Detecting Anomalies
Insertion Anomaly
Cannot insert without knowing determining values. Example: cannot add department without employee. Root: missing FD decomposition.
Update Anomaly
Change one fact: updates multiple places. Example: change department name, update all employees. Risk: inconsistency (partial updates).
Deletion Anomaly
Delete row: lose unrelated data. Example: delete employee, lose department info. Root: mixed concerns (entity + relationship).
Cause
FDs indicate which attributes belong together. Partial/transitive dependencies: indicate problematic design. Solution: decompose (normal forms).
Role in Normalization
Foundation
Normal forms defined: using FDs. 1NF: atomic values. 2NF: no partial dependencies. 3NF: no transitive dependencies. BCNF: strongest (all non-trivial FDs from keys).
Design Process
1. Identify all FDs (from requirements). 2. Check current schema (which FDs violated?). 3. Decompose (remove violating FDs). 4. Verify (repeat until normal form achieved).
Decomposition Safety
Lossless-join: data preserved. Dependency-preserving: FDs maintained. Goal: achieve both (if possible). Trade-off: may not always possible (rare).
Practical Application
Understand FDs: foundation. Normalization: methodical (FD-driven). Result: robust schemas, minimal anomalies.
References
- Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
- Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.
- Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.