Functional Dependencies - Database Systems

Introduction

Functional dependency (FD): semantic constraint. If A -> B: B's value determined by A's. Foundation: database design, normalization. Identifying FDs: prerequisite for normal forms.

Example: StudentID -> Name (name determined by student ID). Notation: A -> B (A functionally determines B). Keys: special case (superkey -> all columns).

"Functional dependencies express data semantics: which columns determine others. Understanding dependencies: foundation of proper schema design and avoiding anomalies." -- Normalization theory

Definition and Notation

Formal Definition

A -> B (A determines B): if two rows have same A value, they have same B value. Constraint: enforces relationship. Example: employee_id -> name (every employee has unique name).

Notation

A -> B: A functionally determines B. A, B: sets of attributes. {A1, A2} -> {B1, B2}: composite (multiple). Read: "A determines B".

Instance vs. Schema

Instance: specific data (may satisfy FD). Schema: defines FDs (should always hold). Design: identify schema-level FDs (not accidental).

Single vs. Composite

Single: StudentID -> Name
Composite: {StudentID, CourseID} -> Grade

Determining if FD Holds

Check: every pair of rows with same A value has same B value. Counterexample: if one pair differs -> FD doesn't hold. Verification: essential for proper FD identification.

Practical Examples

Employee Table

Attributes: EmployeeID, Name, Salary, DepartmentID, DepartmentName

FDs:
EmployeeID -> Name (name determined by employee)
EmployeeID -> Salary
EmployeeID -> DepartmentID
DepartmentID -> DepartmentName (department name determined by ID)

Non-FD:
Name -/-> Salary (multiple employees with same name, different salary)
Salary -/-> DepartmentID (salary not uniquely determining department)

Student Courses

Attributes: StudentID, CourseID, Semester, Grade

FDs:
{StudentID, CourseID, Semester} -> Grade
(grade determined by student, course, semester combination)

Non-FD:
StudentID -/-> CourseID (student enrolls in many courses)
CourseID -/-> StudentID (course has many students)

Closure and Inference

Attribute Closure

Given attributes A, closure A+ = all attributes determined by A. Algorithm: repeatedly apply FDs until no new attributes added. Useful: determine consequences.

Example Closure

Given FDs:
StudentID -> Name, DepartmentID
DepartmentID -> DepartmentName

{StudentID}+ = {StudentID, Name, DepartmentID, DepartmentName}

FD Closure

Set of all FDs derivable from given FDs. If A -> B in closure, it logically follows. Used: verify implications, design schemas.

Key Identification

K is key if K+ = all attributes and no proper subset K'. Algorithm: compute closures for all attribute combinations.

Armstrong's Axioms

Inference Rules

Axioms: sound (only valid FDs inferred) and complete (all valid FDs derivable). Three basic rules.

Reflexivity

If Y subset X: X -> Y. Trivial FD. Example: {A, B} -> A (always holds).

Augmentation

If X -> Y: XZ -> YZ. Add attributes to both sides. Example: if A -> B, then AC -> BC.

Transitivity

If X -> Y and Y -> Z: X -> Z. Chain relationships. Example: if A -> B and B -> C, then A -> C.

Additional Rules

Union: X -> Y and X -> Z implies X -> YZ. Decomposition: X -> YZ implies X -> Y and X -> Z. Pseudotransitivity: X -> Y and YZ -> W implies XZ -> W.

Trivial Dependencies

Definition

Trivial FD: RHS subset of LHS. Always holds (by reflexivity). Example: {A, B} -> A.

Non-Trivial

At least one RHS attribute not in LHS. Interesting: expresses non-obvious constraint. Focus: design analysis (non-trivial).

Completely Non-Trivial

No LHS attribute in RHS. Strongest form. Example: A -> B (neither is subset).

Use in Design

Trivial: skip (always holds). Non-trivial: analyze (reveals structure). Focus: identify all non-trivial FDs.

Partial Dependencies

Definition

Non-key attribute determined by proper subset of candidate key. Example: {StudentID, CourseID} key, but StudentID -> Name (proper subset determines).

Problem

Anomaly: insertion, update, deletion issues. Data redundancy: name repeated for each course. Solution: decompose (2NF eliminates).

Example

StudentCourse table:
{StudentID, CourseID} -> {Name, Grade}

Partial:
StudentID -> Name (StudentID alone determines name)
CourseID -> Title (CourseID alone determines title)

Transitive Dependencies

Definition

Non-key attribute determined by another non-key attribute. A -> B -> C (transitive chain). Problem: update anomaly, redundancy.

Example

Employee table:
EmployeeID -> DepartmentID
DepartmentID -> DepartmentName

Transitive: EmployeeID -> DepartmentName (through DepartmentID)

Elimination

3NF removes (decompose table). Split: separate tables for each entity. Eliminate chains (transitive).

Canonical Covers

Definition

Minimal FD set: no redundancies, no extraneous attributes. Useful: basis for design decisions. Reduced complexity.

Properties

No redundant FDs (each is essential). No extraneous attributes in FD. Unique (up to reordering).

Computation

Algorithm: eliminate redundancies iteratively. Complex: manual computation tedious. Tools: exist for automation.

Practical Use

Design: understand minimal FDs (reduce complexity). Decomposition: basis for normal forms. Tools: usually computed automatically.

Detecting Anomalies

Insertion Anomaly

Cannot insert without knowing determining values. Example: cannot add department without employee. Root: missing FD decomposition.

Update Anomaly

Change one fact: updates multiple places. Example: change department name, update all employees. Risk: inconsistency (partial updates).

Deletion Anomaly

Delete row: lose unrelated data. Example: delete employee, lose department info. Root: mixed concerns (entity + relationship).

Cause

FDs indicate which attributes belong together. Partial/transitive dependencies: indicate problematic design. Solution: decompose (normal forms).

Role in Normalization

Foundation

Normal forms defined: using FDs. 1NF: atomic values. 2NF: no partial dependencies. 3NF: no transitive dependencies. BCNF: strongest (all non-trivial FDs from keys).

Design Process

1. Identify all FDs (from requirements). 2. Check current schema (which FDs violated?). 3. Decompose (remove violating FDs). 4. Verify (repeat until normal form achieved).

Decomposition Safety

Lossless-join: data preserved. Dependency-preserving: FDs maintained. Goal: achieve both (if possible). Trade-off: may not always possible (rare).

Practical Application

Understand FDs: foundation. Normalization: methodical (FD-driven). Result: robust schemas, minimal anomalies.

References

Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
Garcia-Molina, H., Ullman, J. D., and Widom, J. "Database Systems: The Complete Book." Pearson, 2nd edition, 2008.
Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.