Introduction
Foreign key: column (or columns) in table referencing primary key in another table. Establishes relationship between tables. Maintains referential integrity: every foreign key value either matches existing primary key or is NULL. Foundation for relational database integrity.
Core function: link tables representing relationships. Employee.DeptID foreign key references Department.DeptID. Enforces constraint: every employee must work for existing department (or no department).
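Benefits: data consistency, prevents orphan records, enables queries across tables, enforces business rules at database level. Most relational databases enforce declared foreign key constraints; some need enforcement switched on (SQLite: PRAGMA foreign_keys = ON; MySQL: InnoDB engine, not MyISAM).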
"Foreign keys are relational model's mechanism for enforcing data integrity. By constraining references, prevent orphaned data, inconsistencies, maintain correctness of relationships throughout database lifetime." -- Relational database theory
Foreign Key Definition
Formal Definition
Foreign key: attribute (or attributes) in relation R1, referencing primary key of relation R2. Domains must match. Example: Employee.Dept_ID (int) matches Department.Dept_ID (int).
Components
Foreign key attribute: columns serving as reference. Referenced table: target table (parent). Referenced column: usually primary key. Referencing table: table with foreign key (child).
Values
Foreign key value: must either match existing value in referenced column or be NULL (optional relationship). Cannot reference non-existent primary keys. Orphaned values invalid.
Notation
Foreign key denoted FK. Example: Employee(EmpID pk, Name, DeptID fk -> Department).
Example
Department table:
DeptID (pk) | DeptName
1 | Sales
2 | Engineering
Employee table:
EmpID | Name | DeptID (fk)
101 | Alice | 1 (valid, references Sales)
102 | Bob | 2 (valid, references Engineering)
103 | Charlie | NULL (valid, no department)
104 | David | 5 (invalid! Department 5 doesn't exist)
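The example rows above can be replayed as a minimal sketch, assuming SQLite driven through Python's sqlite3 module (a choice made here for a self-contained demo, not something the text prescribes). SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
""")

rows = [(101, "Alice", 1), (102, "Bob", 2), (103, "Charlie", None), (104, "David", 5)]
rejected = []
for row in rows:
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[1])  # DeptID 5 has no matching Department row
conn.commit()

accepted = [name for (name,) in conn.execute("SELECT Name FROM Employee ORDER BY EmpID")]
print("accepted:", accepted, "rejected:", rejected)
```

Charlie's NULL passes the check because NULL means "no department"; only David's reference to a missing department is refused.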
Referential Integrity Constraint
Constraint Definition
Referential integrity: every non-null foreign key value must match existing primary key value. Constraint enforced by database. Violation prevented: database rejects invalid inserts/updates.
Enforcement Mechanisms
Database engine checks on INSERT: new foreign key value valid? On UPDATE: modified foreign key valid? On DELETE (parent): child records affected? Strategies differ (cascade, restrict, set null).
Trust Model
Database enforces constraints, so applications need not re-implement the checks. With constraints enabled, orphaned references cannot be created by accident. Referential consistency guaranteed. Developers rely on database, not manual validation.
Business Rule Protection
Foreign key enforces business rule: Employee must work for existing Department. Prevents logical errors. Database guards against inconsistencies.
Temporal Consistency
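Consistency maintained: referential integrity holds in every committed state. Immediate constraints checked at end of each statement; deferred constraints checked at commit. Transactions ensure atomicity: a transaction that fails the check and rolls back leaves no partial changes.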
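A sketch of transaction-level checking, assuming SQLite via Python's sqlite3 and a foreign key declared DEFERRABLE INITIALLY DEFERRED (an option added for this illustration; the schemas in this document use the default immediate checking). Inside the transaction the reference may be temporarily dangling; the commit succeeds only if it has been resolved by then.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID) DEFERRABLE INITIALLY DEFERRED
);
""")

# Transaction 1: child first, parent second. Consistent by commit time, so it succeeds.
conn.execute("INSERT INTO Employee VALUES (101, 'Alice', 1)")
conn.execute("INSERT INTO Department VALUES (1, 'Sales')")
conn.commit()

# Transaction 2: child whose parent never arrives. The commit is refused.
conn.execute("INSERT INTO Employee VALUES (102, 'Bob', 9)")
try:
    conn.commit()
    commit_failed = False
except sqlite3.IntegrityError:
    commit_failed = True
    conn.rollback()  # discard the whole transaction, leaving no partial changes

names = [n for (n,) in conn.execute("SELECT Name FROM Employee")]
print(commit_failed, names)
```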
Primary Keys vs. Foreign Keys
Primary Key Characteristics
Uniquely identifies row in table. Not null (every row must have a value). At most one per table. Natural key: derived from the data. Surrogate key: system-generated.
Foreign Key Characteristics
References primary key (or other unique key) of another table, or of the same table (self-reference). Can be null (optional relationship). Same value can appear in many rows. May be simple (one column) or composite (multiple columns).
Relationship
Foreign key's domain must match referenced primary key's domain. Type, length, meaning consistent. Enables joining: employees with departments via foreign key matching.
Comparison Table
| Aspect | Primary Key | Foreign Key |
|---|---|---|
| Purpose | Unique row identifier | Reference another table |
| Null | Not allowed | Allowed |
| Duplicates | Not allowed | Allowed |
| Count per table | At most one | Zero or more |
| Scope | Within table | Across tables |
Declaring Foreign Keys
SQL Syntax
CREATE TABLE Employee (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(100),
    DeptID INT,
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);
Or inline:
CREATE TABLE Employee (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(100),
    DeptID INT REFERENCES Department(DeptID)
);
Named Foreign Key
Named constraint enables management. Example:
CONSTRAINT fk_emp_dept
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
Composite Foreign Key
Multiple columns form reference. Example:
FOREIGN KEY (Country, City) REFERENCES Location(Country, City)
Alterations
Add after table creation: ALTER TABLE... ADD CONSTRAINT. Remove: ALTER TABLE... DROP CONSTRAINT. Modify: drop and recreate.
Column Requirements
Foreign key column type must match referenced primary key. Lengths, precision must align. NULL allowed by default unless NOT NULL specified.
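The nullability rule can be sketched as follows, assuming SQLite via Python's sqlite3. The Contractor table is hypothetical, introduced only to contrast a mandatory (NOT NULL) foreign key with the optional one on Employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)            -- optional relationship
);
CREATE TABLE Contractor (                                   -- hypothetical table
    ContractorID INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER NOT NULL REFERENCES Department(DeptID)   -- mandatory relationship
);
INSERT INTO Department VALUES (1, 'Sales');
""")

# Nullable foreign key: a row with no department is accepted.
conn.execute("INSERT INTO Employee VALUES (1, 'Alice', NULL)")

# NOT NULL foreign key: the same row shape is rejected.
try:
    conn.execute("INSERT INTO Contractor VALUES (1, 'Bob', NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(null_rejected, employee_count)
```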
Relationship Types Using Foreign Keys
One-to-Many (1:N)
Most common. Foreign key on "many" side. Department-Employee: Employee.DeptID references Department.DeptID. One department, many employees.
One-to-One (1:1)
Foreign key on either side (often child). Employee-Pension: Employee.PensionID references Pension.PensionID. UNIQUE constraint on the foreign key column ensures one-to-one: no two employees can reference the same pension.
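A sketch of the one-to-one pattern, assuming SQLite via Python's sqlite3. The PlanName column and the sample rows are made up for the illustration; the point is that UNIQUE on the foreign key column stops a second employee from claiming the same pension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Pension (PensionID INTEGER PRIMARY KEY, PlanName TEXT);
CREATE TABLE Employee (
    EmpID     INTEGER PRIMARY KEY,
    Name      TEXT,
    PensionID INTEGER UNIQUE REFERENCES Pension(PensionID)
);
INSERT INTO Pension VALUES (10, 'Plan A');
INSERT INTO Employee VALUES (1, 'Alice', 10);
""")

# A second employee pointing at pension 10 would make the relationship one-to-many.
try:
    conn.execute("INSERT INTO Employee VALUES (2, 'Bob', 10)")
    second_rejected = False
except sqlite3.IntegrityError:  # raised by the UNIQUE constraint, not the foreign key
    second_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(second_rejected, employee_count)
```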
Many-to-Many (M:N)
Junction table with two foreign keys. Student-Course: Enrollment(StudentID fk, CourseID fk). Both columns foreign keys, together primary key.
Self-Referencing
Foreign key references same table. Employee supervises Employee. Employee.ManagerID references Employee.EmpID. Recursive relationship.
Example Junction Table
CREATE TABLE Enrollment (
    StudentID INT REFERENCES Student(StudentID),
    CourseID INT REFERENCES Course(CourseID),
    EnrollmentDate DATE,
    Grade CHAR(1),
    PRIMARY KEY (StudentID, CourseID)
);
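To show how the junction table is queried, here is a small sketch assuming SQLite via Python's sqlite3. The Student and Course column names (Name, Title) and all sample rows are assumptions, since the text only defines Enrollment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Enrollment (
    StudentID INTEGER REFERENCES Student(StudentID),
    CourseID  INTEGER REFERENCES Course(CourseID),
    EnrollmentDate DATE,
    Grade CHAR(1),
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Student VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO Course  VALUES (10, 'Databases'), (20, 'Networks');
INSERT INTO Enrollment VALUES
    (1, 10, '2024-01-15', 'A'),
    (1, 20, '2024-01-15', 'B'),
    (2, 10, '2024-01-16', NULL);
""")

# One student, many courses: follow Enrollment.CourseID to Course.
courses_for_ann = [t for (t,) in conn.execute("""
    SELECT c.Title
    FROM Enrollment e JOIN Course c ON c.CourseID = e.CourseID
    WHERE e.StudentID = 1
    ORDER BY c.Title""")]

# One course, many students: follow Enrollment.StudentID to Student.
students_in_databases = [n for (n,) in conn.execute("""
    SELECT s.Name
    FROM Enrollment e JOIN Student s ON s.StudentID = e.StudentID
    WHERE e.CourseID = 10
    ORDER BY s.Name""")]

print(courses_for_ann, students_in_databases)
```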
Referential Integrity Violations
Violation Types
Insert violation: insert child with non-existent parent. Example: insert Employee with DeptID=99 (doesn't exist). Rejected by database.
Update violation: change foreign key to non-existent parent. Example: update Employee set DeptID=99. Rejected.
Delete violation: delete parent with existing children. Example: delete Department (employees still assigned). Strategy-dependent.
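The three violation types can be triggered in a few lines. This is a sketch assuming SQLite via Python's sqlite3 and a foreign key with no ON DELETE clause, so the parent delete is refused; the attempt helper is a name invented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
);
INSERT INTO Department VALUES (1, 'Sales');
INSERT INTO Employee VALUES (101, 'Alice', 1);
""")

def attempt(sql):
    """Run one statement; return True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

insert_ok = attempt("INSERT INTO Employee VALUES (102, 'Bob', 99)")      # no Department 99
update_ok = attempt("UPDATE Employee SET DeptID = 99 WHERE EmpID = 101")  # no Department 99
delete_ok = attempt("DELETE FROM Department WHERE DeptID = 1")            # Alice still assigned

print(insert_ok, update_ok, delete_ok)
```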
Error Messages
Database reports constraint violation. Example: "Foreign key constraint fails". Application handles error: display message to user, retry, or escalate.
Prevention
Validate before insert/update: check foreign key exists. Alternatively: let database reject. Most applications use database enforcement.
Detection
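Query orphaned records: SELECT * FROM Employee e WHERE e.DeptID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Department d WHERE d.DeptID = e.DeptID). NOT EXISTS is safer than NOT IN, which returns no rows if the subquery yields any NULL. Find data violations for cleanup.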
Cascade Operations
ON DELETE Options
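NO ACTION (default in standard SQL): reject deletion if children still exist when the constraint is checked (end of statement, or commit if deferred). RESTRICT: reject immediately; in most cases indistinguishable from NO ACTION. CASCADE: delete children automatically. SET NULL: set foreign key to NULL. SET DEFAULT: set foreign key to column default.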
ON UPDATE Options
NO ACTION (default) / RESTRICT: prevent parent key update if children reference it. CASCADE: update foreign keys automatically. SET NULL: set to NULL. SET DEFAULT: set to column default.
CASCADE Example
FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
    ON DELETE CASCADE
    ON UPDATE CASCADE
Delete department -> employees deleted automatically.
Update DeptID -> employee DeptID updated automatically.
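A runnable sketch of both cascades, assuming SQLite via Python's sqlite3 with illustrative rows. It first renumbers one department, then deletes the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER,
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO Employee VALUES (101, 'Alice', 1), (102, 'Bob', 2);
""")

# ON UPDATE CASCADE: renumbering the parent rewrites Bob's DeptID as well.
conn.execute("UPDATE Department SET DeptID = 20 WHERE DeptID = 2")

# ON DELETE CASCADE: removing Sales silently removes Alice too.
conn.execute("DELETE FROM Department WHERE DeptID = 1")
conn.commit()

remaining = conn.execute("SELECT EmpID, Name, DeptID FROM Employee").fetchall()
print(remaining)
```

Note that Alice's row disappears without any statement naming the Employee table.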
Cascade Risks
Automatic deletion dangerous: unintended data loss. Best practice: careful use. Prefer RESTRICT for critical data, CASCADE for non-critical.
Set NULL Strategy
ON DELETE SET NULL: parent deleted, child foreign key becomes NULL. Keeps child record, now with no parent; not an orphan, since NULL is a valid foreign key value. Requires nullable foreign key column. Example: delete department, employees have no department (NULL).
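The same scenario with SET NULL, again as a sketch assuming SQLite via Python's sqlite3 and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID) ON DELETE SET NULL
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO Employee VALUES (101, 'Alice', 1), (102, 'Bob', 2);
""")

# Deleting Sales keeps Alice's row but clears her department reference.
conn.execute("DELETE FROM Department WHERE DeptID = 1")
conn.commit()

rows = conn.execute("SELECT Name, DeptID FROM Employee ORDER BY EmpID").fetchall()
print(rows)
```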
Choosing Strategy
RESTRICT / NO ACTION: most conservative, prevents accidents. CASCADE: automatic propagation, use for dependent data that has no meaning without its parent. SET NULL: keep child rows after parent removal, suitable for optional relationships. Business rules determine choice.
Self-Referencing Foreign Keys
Definition
Foreign key references same table. Employee.ManagerID -> Employee.EmpID. Creates hierarchy: managers supervise employees.
Practical Examples
Employee-Manager hierarchy. Category-Subcategory (main category, subcategories). Comment-Reply (comment has replies).
SQL Declaration
CREATE TABLE Employee (
    EmpID INT PRIMARY KEY,
    Name VARCHAR(100),
    ManagerID INT REFERENCES Employee(EmpID)
);
Data Example
EmpID | Name | ManagerID
1 | John | NULL (CEO, no manager)
2 | Alice | 1 (reports to John)
3 | Bob | 1 (reports to John)
4 | Charlie | 2 (reports to Alice)
Queries
Find employees supervised by specific manager: SELECT * FROM Employee WHERE ManagerID = 2. Find employee's chain of command: recursive query (WITH RECURSIVE in most dialects) joining Employee to itself until ManagerID is NULL.
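Both queries can be tried against the data example above. This is a sketch assuming SQLite via Python's sqlite3 and its WITH RECURSIVE support; the CTE name chain and the depth column are names chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmpID     INTEGER PRIMARY KEY,
    Name      TEXT,
    ManagerID INTEGER REFERENCES Employee(EmpID)
);
INSERT INTO Employee VALUES
    (1, 'John', NULL), (2, 'Alice', 1), (3, 'Bob', 1), (4, 'Charlie', 2);
""")

# Direct reports of manager 2.
direct_reports = [n for (n,) in conn.execute(
    "SELECT Name FROM Employee WHERE ManagerID = ?", (2,))]

# Chain of command for employee 4: start at the employee, then repeatedly
# join to the row whose EmpID equals the current row's ManagerID.
chain_of_command = [n for (n,) in conn.execute("""
    WITH RECURSIVE chain(EmpID, Name, ManagerID, depth) AS (
        SELECT EmpID, Name, ManagerID, 0 FROM Employee WHERE EmpID = ?
        UNION ALL
        SELECT m.EmpID, m.Name, m.ManagerID, chain.depth + 1
        FROM Employee m JOIN chain ON m.EmpID = chain.ManagerID
    )
    SELECT Name FROM chain ORDER BY depth
""", (4,))]

print(direct_reports, chain_of_command)
```

The recursion stops at John because his ManagerID is NULL and matches no row.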
Cycles
Potential issue: Manager A -> B -> C -> A (cycle). Data model allows, but logically invalid. Validation: prevent cycles via application logic or stored procedures.
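One possible application-level check, sketched with SQLite via Python's sqlite3. The function name would_create_cycle is hypothetical; a trigger or stored procedure could apply the same walk inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmpID     INTEGER PRIMARY KEY,
    Name      TEXT,
    ManagerID INTEGER REFERENCES Employee(EmpID)
);
INSERT INTO Employee VALUES
    (1, 'John', NULL), (2, 'Alice', 1), (3, 'Bob', 1), (4, 'Charlie', 2);
""")

def would_create_cycle(conn, emp_id, new_manager_id):
    """Walk upward from the proposed manager; reaching emp_id means a cycle."""
    current = new_manager_id
    seen = set()  # guards against cycles that already exist in the data
    while current is not None and current not in seen:
        if current == emp_id:
            return True
        seen.add(current)
        row = conn.execute(
            "SELECT ManagerID FROM Employee WHERE EmpID = ?", (current,)
        ).fetchone()
        current = row[0] if row else None
    return False

john_under_charlie = would_create_cycle(conn, 1, 4)  # 4 -> 2 -> 1: cycle
bob_under_alice = would_create_cycle(conn, 3, 2)     # 2 -> 1 -> NULL: fine
alice_under_self = would_create_cycle(conn, 2, 2)    # self-management: cycle
print(john_under_charlie, bob_under_alice, alice_under_self)
```

The foreign key itself accepts all three assignments, which is why a check like this has to run before the UPDATE.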
Composite Foreign Keys
Definition
Multiple columns form foreign key. Example: (Country, City) -> Location(Country, City). Combined values must match.
SQL Syntax
CREATE TABLE Office (
    OfficeID INT PRIMARY KEY,
    Name VARCHAR(100),
    Country VARCHAR(50),
    City VARCHAR(50),
    FOREIGN KEY (Country, City) REFERENCES Location(Country, City)
);
Use Cases
Multi-part identifiers: country + city uniquely identify location. Account + transaction ID. Time-based hierarchies: year-month-day.
Alignment Requirement
Column count, types, order must match referenced primary key. If Location primary key (Country, City), foreign key must be (Country, City) in same order.
Example
Location table:
(Country, City) - primary key
('USA', 'NYC')
('USA', 'LA')
('UK', 'London')
Office references (Country, City):
('USA', 'NYC') - valid
('USA', 'LA') - valid
('USA', 'Chicago') - invalid (no ('USA', 'Chicago') row in Location)
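The example can be run as a sketch, assuming SQLite via Python's sqlite3. The office names and the extra ('UK', 'NYC') row are made up; that row shows that both values existing separately is not enough, since the pair must match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Location (
    Country VARCHAR(50),
    City    VARCHAR(50),
    PRIMARY KEY (Country, City)
);
CREATE TABLE Office (
    OfficeID INTEGER PRIMARY KEY,
    Name     VARCHAR(100),
    Country  VARCHAR(50),
    City     VARCHAR(50),
    FOREIGN KEY (Country, City) REFERENCES Location(Country, City)
);
INSERT INTO Location VALUES ('USA', 'NYC'), ('USA', 'LA'), ('UK', 'London');
""")

offices = [
    (1, "HQ", "USA", "NYC"),
    (2, "West", "USA", "LA"),
    (3, "Midwest", "USA", "Chicago"),  # pair not in Location
    (4, "Mixup", "UK", "NYC"),         # 'UK' and 'NYC' both exist, but not together
]
rejected = []
for office in offices:
    try:
        conn.execute("INSERT INTO Office VALUES (?, ?, ?, ?)", office)
    except sqlite3.IntegrityError:
        rejected.append((office[2], office[3]))
conn.commit()

accepted_count = conn.execute("SELECT COUNT(*) FROM Office").fetchone()[0]
print(accepted_count, rejected)
```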
Orphan Records and Data Quality
Orphan Definition
Child record with non-existent parent. Employee with non-existent DeptID. Result of: disabled constraints, data import errors, application bugs.
Causes
Constraint disabled temporarily. Old data lacks foreign key. Parent deleted without cascade. Batch load bypasses checks. Application error.
Detection
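Query orphans: SELECT * FROM Employee e WHERE e.DeptID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Department d WHERE d.DeptID = e.DeptID). Avoid NOT IN here: it returns no rows if the subquery yields any NULL. Regular audits: find and report orphaned data.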
Resolution
Delete orphans (if data unneeded). Update to valid parent (if reclassifiable). Set to NULL (if optional). Investigate cause: prevent recurrence.
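A sketch of detection followed by one of the resolutions above (set to NULL), assuming SQLite via Python's sqlite3. Enforcement is deliberately left off here to simulate how the orphan got in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # simulate data loaded without enforcement
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO Employee VALUES
    (101, 'Alice', 1), (103, 'Charlie', NULL), (104, 'David', 5);
""")

# Orphan = non-NULL reference with no matching parent row.
ORPHAN_SQL = """
    SELECT e.EmpID FROM Employee e
    WHERE e.DeptID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM Department d WHERE d.DeptID = e.DeptID)
"""
orphans_before = [emp_id for (emp_id,) in conn.execute(ORPHAN_SQL)]

# Resolution chosen for this sketch: the relationship is optional, so set to NULL.
conn.execute("""
    UPDATE Employee SET DeptID = NULL
    WHERE DeptID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM Department d WHERE d.DeptID = Employee.DeptID)
""")
conn.commit()

orphans_after = [emp_id for (emp_id,) in conn.execute(ORPHAN_SQL)]
print(orphans_before, orphans_after)
```

Charlie is not reported: a NULL DeptID is a valid "no department" value, not an orphan.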
Prevention
Enable constraints always. Don't disable for convenience. Validate imports: check all foreign keys exist. Use transactions: atomicity ensures consistency.
Impact
Orphans complicate queries: rows vanish from inner joins, or surface with unexpected NULLs in outer joins. Referential integrity broken. Data anomalies. Worst case: garbage data, unreliable conclusions.
Applications and Best Practices
Best Practices
1. Always define foreign keys. 2. Enable constraints at creation, don't disable. 3. Choose cascade strategy carefully. 4. Validate imports/migrations. 5. Audit data regularly. 6. Document relationships.
Application Layer
Applications trust database constraints. No need to duplicate validation in every code path. Database enforces rules: applications simpler, behavior consistent.
Data Migration
Constraints sometimes disabled temporarily during bulk loads (performance). If so: re-enable and validate immediately after, fix violations before the data is used. Careful process: verify consistency.
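The disable, load, validate, re-enable sequence might look like this in SQLite via Python's sqlite3. Other systems expose different switches, so treat this as a sketch of the workflow only; the batch rows are hypothetical import data. PRAGMA foreign_key_check reports one row per violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
""")

# 1. Disable enforcement for the load (the pragma is ignored inside a transaction).
conn.execute("PRAGMA foreign_keys = OFF")
batch = [(101, "Alice", 1), (102, "Bob", 2), (104, "David", 5)]  # hypothetical import
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", batch)
conn.commit()

# 2. Validate: each result row is (child table, rowid, parent table, fk index).
violations = conn.execute("PRAGMA foreign_key_check").fetchall()

# 3. Fix violations (here: drop the bad rows), then re-enable enforcement.
for _table, rowid, _parent, _fk_index in violations:
    conn.execute("DELETE FROM Employee WHERE rowid = ?", (rowid,))
conn.commit()
conn.execute("PRAGMA foreign_keys = ON")

remaining = conn.execute("PRAGMA foreign_key_check").fetchall()
enforcement_on = conn.execute("PRAGMA foreign_keys").fetchone()[0]
print(len(violations), remaining, enforcement_on)
```

Deleting is only one of the possible fixes; reassigning to a valid parent or setting NULL follows the same pattern.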
Schema Design
Foreign keys integral to relational design. Plan relationships upfront. Model captures business rules. Constraints prevent bad data.
Performance Considerations
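Foreign keys add overhead: constraint checking on child insert/update and on parent delete/update. Usually negligible when the referencing column is indexed; some DBMSs (e.g. PostgreSQL) do not create that index automatically. Enable for data integrity (worth cost).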