Introduction

Foreign key: column (or set of columns) in one table referencing the primary key (or another unique key) of a table, usually a different one. Establishes relationship between tables. Maintains referential integrity: every foreign key value either matches an existing referenced key value or is NULL. Foundation for relational database integrity.

Core function: link tables representing relationships. Employee.DeptID foreign key references Department.DeptID. Enforces constraint: every employee must work for existing department (or no department).

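Benefits: data consistency, prevents orphan records, supports reliable joins across tables, enforces business rules at database level. Most relational databases enforce declared foreign key constraints, but not always by default: SQLite requires PRAGMA foreign_keys = ON per connection, and MySQL's MyISAM engine parses foreign key clauses without enforcing them.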

"Foreign keys are relational model's mechanism for enforcing data integrity. By constraining references, prevent orphaned data, inconsistencies, maintain correctness of relationships throughout database lifetime." -- Relational database theory

Foreign Key Definition

Formal Definition

Foreign key: attribute (or attributes) in relation R1, referencing primary key of relation R2. Domains must match. Example: Employee.Dept_ID (int) matches Department.Dept_ID (int).

Components

Foreign key attribute: columns serving as reference. Referenced table: target table (parent). Referenced column: usually primary key. Referencing table: table with foreign key (child).

Values

Foreign key value: must either match existing value in referenced column or be NULL (optional relationship). Cannot reference non-existent primary keys. Orphaned values invalid.

Notation

Foreign key denoted FK. Example: Employee(EmpID pk, Name, DeptID fk -> Department).

Example

Department table:
DeptID (pk) | DeptName
1 | Sales
2 | Engineering

Employee table:
EmpID | Name | DeptID (fk)
101 | Alice | 1 (valid, references Sales)
102 | Bob | 2 (valid, references Engineering)
103 | Charlie | NULL (valid, no department)
104 | David | 5 (invalid! Department 5 doesn't exist)
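
The example above can be reproduced as a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The choice of SQLite is an assumption made for illustration; other systems behave similarly but report errors differently.

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only after this PRAGMA.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
""")

rows = [
    (101, "Alice", 1),       # references Sales
    (102, "Bob", 2),         # references Engineering
    (103, "Charlie", None),  # NULL: no department
    (104, "David", 5),       # Department 5 does not exist
]

accepted, rejected = [], []
for row in rows:
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
        accepted.append(row[1])
    except sqlite3.IntegrityError:
        # Raised when the insert would break referential integrity.
        rejected.append(row[1])

print("accepted:", accepted)
print("rejected:", rejected)
```

Only David's row is rejected: NULL is accepted because the relationship is optional.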

Referential Integrity Constraint

Constraint Definition

Referential integrity: every non-null foreign key value must match existing primary key value. Constraint enforced by database. Violation prevented: database rejects invalid inserts/updates.

Enforcement Mechanisms

Database engine checks on INSERT (child): new foreign key value valid? On UPDATE (child): modified foreign key valid? On DELETE or key UPDATE (parent): do child records still reference it? Response depends on declared action (restrict, cascade, set null).

Trust Model

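Database enforces the constraint centrally, so each application need not re-implement it. With constraint enabled, orphaned references cannot be created by accident. Data guaranteed consistent for that rule. Developers rely on database as final check rather than on manual validation alone.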

Business Rule Protection

Foreign key enforces business rule: Employee must work for existing Department. Prevents logical errors. Database guards against inconsistencies.

Temporal Consistency

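Consistency maintained: with immediate constraints (the usual default), referential integrity holds after every statement. With deferrable constraints (where supported), checking can be postponed to commit: integrity may be violated temporarily inside a transaction, never in committed data. Atomicity: a transaction that would leave a violation is not committed.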
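
A sketch of deferred checking, assuming SQLite (which supports DEFERRABLE INITIALLY DEFERRED) and Python's sqlite3 module with explicit transactions. It illustrates that the constraint must hold at commit time, not necessarily after each statement.

```python
import sqlite3

# isolation_level=None: no implicit transactions, BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
           DEFERRABLE INITIALLY DEFERRED
);
""")

# Child first, parent second: allowed, because the check waits for COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO Employee VALUES (101, 'Alice', 1)")
conn.execute("INSERT INTO Department VALUES (1, 'Sales')")
conn.execute("COMMIT")
committed = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

# Child whose parent never arrives: the COMMIT itself is rejected.
conn.execute("BEGIN")
conn.execute("INSERT INTO Employee VALUES (102, 'Bob', 9)")
try:
    conn.execute("COMMIT")
    commit_failed = False
except sqlite3.IntegrityError:
    commit_failed = True
    conn.execute("ROLLBACK")  # transaction is still open after a failed commit

final_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(committed, commit_failed, final_count)
```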

Primary Keys vs. Foreign Keys

Primary Key Characteristics

Uniquely identifies row in table. Not null (every row must have a value). One per table. Natural key: derived from the data. Surrogate key: system-generated.

Foreign Key Characteristics

References primary key (or unique key) in another table, or in the same table. Can be null (optional relationship). Values can repeat across rows, and a table can have several foreign keys. May be simple (one column) or composite (multiple columns).

Relationship

Foreign key's domain must match referenced primary key's domain. Type, length, meaning consistent. Enables joining: employees with departments via foreign key matching.

Comparison Table

Aspect | Primary Key | Foreign Key
Purpose | Unique row identifier | Reference another table
Null | Not allowed | Allowed
Duplicates | Not allowed | Allowed
Count per table | Exactly one | Zero or more
Scope | Within table | Across tables

Declaring Foreign Keys

SQL Syntax

CREATE TABLE Employee (
 EmpID INT PRIMARY KEY,
 Name VARCHAR(100),
 DeptID INT,
 FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);

Or inline:
CREATE TABLE Employee (
 EmpID INT PRIMARY KEY,
 Name VARCHAR(100),
 DeptID INT REFERENCES Department(DeptID)
);

Named Foreign Key

Named constraint enables management. Example:

CONSTRAINT fk_emp_dept
FOREIGN KEY (DeptID) REFERENCES Department(DeptID)

Composite Foreign Key

Multiple columns form reference. Example:

FOREIGN KEY (Country, City) REFERENCES Location(Country, City)

Alterations

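Add after table creation: ALTER TABLE... ADD CONSTRAINT. Remove: ALTER TABLE... DROP CONSTRAINT. Modify: drop and recreate. Syntax varies by system: MySQL also uses ALTER TABLE... DROP FOREIGN KEY; SQLite cannot add or drop a foreign key with ALTER TABLE, so the table must be rebuilt.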

Column Requirements

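Foreign key column type must match (or be compatible with) referenced key column; exact rules vary by system. Lengths, precision should align. Referenced columns must carry a PRIMARY KEY or UNIQUE constraint. NULL allowed by default unless NOT NULL specified; NOT NULL makes the relationship mandatory.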
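
A sketch of a mandatory relationship, assuming SQLite via Python's sqlite3 module. The helper try_insert is hypothetical, introduced only for this illustration. It combines a named constraint with NOT NULL, so a missing department and a non-existent department are both rejected, each by a different constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for enforcement in SQLite
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER NOT NULL,  -- mandatory: every employee needs a department
    CONSTRAINT fk_emp_dept
        FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);
INSERT INTO Department VALUES (1, 'Sales');
""")

def try_insert(emp_id, name, dept_id):
    """Hypothetical helper: return None on success, else the error text."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
                     (emp_id, name, dept_id))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

ok = try_insert(101, "Alice", 1)          # valid parent
null_error = try_insert(102, "Bob", None) # blocked by NOT NULL
fk_error = try_insert(103, "Carol", 99)   # blocked by the foreign key

print(ok, null_error, fk_error, sep="\n")
```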

Relationship Types Using Foreign Keys

One-to-Many (1:N)

Most common. Foreign key on "many" side. Department-Employee: Employee.DeptID references Department.DeptID. One department, many employees.

One-to-One (1:1)

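Foreign key on either side (often the dependent side). Employee-Pension: Employee.PensionID references Pension.PensionID. UNIQUE constraint on foreign key column ensures one-to-one: without it, several employees could reference the same pension, making the relationship one-to-many.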
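
A sketch of the one-to-one case, assuming SQLite via Python's sqlite3 module. The PlanName column and sample rows are invented for illustration. The UNIQUE constraint on the foreign key column is what rejects a second employee pointing at the same pension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Pension (PensionID INTEGER PRIMARY KEY, PlanName TEXT);
CREATE TABLE Employee (
    EmpID     INTEGER PRIMARY KEY,
    Name      TEXT,
    -- UNIQUE + REFERENCES: each pension linked to at most one employee
    PensionID INTEGER UNIQUE REFERENCES Pension(PensionID)
);
INSERT INTO Pension VALUES (1, 'Plan A');
INSERT INTO Employee VALUES (101, 'Alice', 1);
""")

try:
    # Pension 1 exists, so the foreign key is satisfied,
    # but it is already linked to Alice.
    conn.execute("INSERT INTO Employee VALUES (102, 'Bob', 1)")
    second_link_error = None
except sqlite3.IntegrityError as exc:
    second_link_error = str(exc)

print(second_link_error)
```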

Many-to-Many (M:N)

Junction table with two foreign keys. Student-Course: Enrollment(StudentID fk, CourseID fk). Both columns foreign keys, together primary key.

Self-Referencing

Foreign key references same table. Employee supervises Employee. Employee.ManagerID references Employee.EmpID. Recursive relationship.

Example Junction Table

CREATE TABLE Enrollment (
 StudentID INT REFERENCES Student(StudentID),
 CourseID INT REFERENCES Course(CourseID),
 EnrollmentDate DATE,
 Grade CHAR(1),
 PRIMARY KEY (StudentID, CourseID)
);
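
A sketch of the junction table in use, assuming SQLite via Python's sqlite3 module. The Student and Course definitions and all sample rows are assumptions made for illustration. It shows both directions of the many-to-many relationship and the composite primary key blocking a duplicate enrollment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Enrollment (
    StudentID INT REFERENCES Student(StudentID),
    CourseID  INT REFERENCES Course(CourseID),
    EnrollmentDate DATE,
    Grade CHAR(1),
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO Course  VALUES (10, 'Databases'), (20, 'Networks');
INSERT INTO Enrollment VALUES
    (1, 10, '2024-01-15', 'A'),
    (1, 20, '2024-01-15', NULL),
    (2, 10, '2024-01-16', 'B');
""")

# One student, many courses.
courses_for_ana = [r[0] for r in conn.execute("""
    SELECT c.Title FROM Enrollment AS e
    JOIN Course AS c ON c.CourseID = e.CourseID
    WHERE e.StudentID = 1 ORDER BY c.Title""")]

# One course, many students.
students_in_databases = [r[0] for r in conn.execute("""
    SELECT s.Name FROM Enrollment AS e
    JOIN Student AS s ON s.StudentID = e.StudentID
    WHERE e.CourseID = 10 ORDER BY s.Name""")]

# The composite primary key rejects enrolling the same pair twice.
try:
    conn.execute("INSERT INTO Enrollment VALUES (1, 10, '2024-02-01', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(courses_for_ana, students_in_databases, duplicate_rejected)
```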

Referential Integrity Violations

Violation Types

Insert violation: insert child with non-existent parent. Example: insert Employee with DeptID=99 (doesn't exist). Rejected by database.

Update violation: change foreign key to non-existent parent. Example: update Employee set DeptID=99. Rejected.

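Delete violation: delete parent with existing children. Example: delete Department with employees still assigned. Outcome depends on ON DELETE action: rejected under NO ACTION or RESTRICT, propagated to children under CASCADE or SET NULL.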

Error Messages

Database reports constraint violation. Example: "Foreign key constraint fails". Application handles error: display message to user, retry, or escalate.
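
A sketch of the three violation types and how an application might catch them, assuming SQLite via Python's sqlite3 module. The helper attempt is hypothetical. Error text differs between systems; the message checked here is SQLite's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)  -- no action clause declared
);
INSERT INTO Department VALUES (1, 'Sales');
INSERT INTO Employee VALUES (101, 'Alice', 1);
""")

def attempt(sql):
    """Hypothetical helper: run one statement, return error text if rejected."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    # Insert violation: child points at a parent that does not exist.
    "insert": attempt("INSERT INTO Employee VALUES (102, 'Bob', 99)"),
    # Update violation: existing child re-pointed at a missing parent.
    "update": attempt("UPDATE Employee SET DeptID = 99 WHERE EmpID = 101"),
    # Delete violation: parent removed while a child still references it.
    "delete": attempt("DELETE FROM Department WHERE DeptID = 1"),
}

for kind, message in results.items():
    print(kind, "->", message)
```

All three statements are rejected and the data is left unchanged.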

Prevention

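Validate before insert/update: check foreign key exists. Pre-check alone is not sufficient: parent may be deleted between check and insert. Alternatively: let database reject and handle the error. Most applications use database enforcement as final guard.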

Detection

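Query orphaned records: SELECT * FROM Employee WHERE DeptID NOT IN (SELECT DeptID FROM Department). Find data violations for cleanup. Caution: NOT IN returns no rows if the subquery yields a NULL. Safe here because DeptID is a primary key; otherwise prefer NOT EXISTS.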
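
A sketch of orphan detection, assuming SQLite via Python's sqlite3 module. Enforcement is switched off deliberately so that an orphan can exist; the query uses NOT EXISTS, an equivalent formulation of the NOT IN query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement deliberately OFF, so an orphan row can be created for the demo.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO Employee VALUES
    (101, 'Alice', 1),
    (103, 'Charlie', NULL),  -- no department: valid, not an orphan
    (104, 'David', 5);       -- Department 5 missing: orphan
""")

# Non-NULL foreign key values with no matching parent row.
orphans = conn.execute("""
    SELECT e.EmpID, e.Name, e.DeptID
    FROM Employee AS e
    WHERE e.DeptID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM Department AS d
                      WHERE d.DeptID = e.DeptID)
""").fetchall()

print(orphans)
```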

Cascade Operations

ON DELETE Options

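NO ACTION (default in SQL standard and most systems): reject deletion if children still exist when constraint is checked. RESTRICT: same outcome, but checked immediately and cannot be deferred. CASCADE: delete children automatically. SET NULL: set foreign key to NULL (column must be nullable). SET DEFAULT: set foreign key to column default, which must itself match a parent row; not supported everywhere (for example, MySQL InnoDB).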

ON UPDATE Options

NO ACTION (default) or RESTRICT: prevent parent key update if children reference it. CASCADE: update foreign keys automatically. SET NULL: set to NULL. SET DEFAULT: use default.

CASCADE Example

FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
 ON DELETE CASCADE
 ON UPDATE CASCADE

Delete department -> employees deleted automatically.
Update DeptID -> employee DeptID updated automatically.
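
A sketch of both cascades, assuming SQLite via Python's sqlite3 module and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER,
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO Employee VALUES (101, 'Alice', 1), (102, 'Bob', 2);
""")

# Parent key changes: the child's DeptID follows automatically.
conn.execute("UPDATE Department SET DeptID = 10 WHERE DeptID = 1")

# Parent row removed: its child rows are deleted with it.
conn.execute("DELETE FROM Department WHERE DeptID = 2")

employees = conn.execute(
    "SELECT EmpID, Name, DeptID FROM Employee ORDER BY EmpID").fetchall()
print(employees)
```

Alice now references department 10; Bob's row is gone, without any statement having named the Employee table.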

Cascade Risks

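Automatic deletion dangerous: one parent delete can silently remove many child rows, and cascades can chain through several tables. Best practice: use deliberately. Prefer RESTRICT for critical data, CASCADE for data with no meaning apart from its parent.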

Set NULL Strategy

ON DELETE SET NULL: parent deleted, child foreign key becomes NULL. Keeps child record without a parent; this is a valid state, not an orphan, but requires a nullable column. Example: delete department, employees have no department (NULL).
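
A sketch of SET NULL, assuming SQLite via Python's sqlite3 module and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER,  -- must be nullable for SET NULL to work
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
        ON DELETE SET NULL
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
INSERT INTO Employee VALUES (101, 'Alice', 1), (102, 'Bob', 2);
""")

# Removing the parent keeps the child row but clears its reference.
conn.execute("DELETE FROM Department WHERE DeptID = 1")

employees = conn.execute(
    "SELECT EmpID, Name, DeptID FROM Employee ORDER BY EmpID").fetchall()
print(employees)
```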

Choosing Strategy

RESTRICT: most conservative, prevent accidents. CASCADE: automatic propagation, use for dependent data. SET NULL: keeps child rows and their history, drops the link to the deleted parent. Business rules determine choice.

Self-Referencing Foreign Keys

Definition

Foreign key references same table. Employee.ManagerID -> Employee.EmpID. Creates hierarchy: managers supervise employees.

Practical Examples

Employee-Manager hierarchy. Category-Subcategory (main category, subcategories). Comment-Reply (comment has replies).

SQL Declaration

CREATE TABLE Employee (
 EmpID INT PRIMARY KEY,
 Name VARCHAR(100),
 ManagerID INT REFERENCES Employee(EmpID)
);

Data Example

EmpID | Name | ManagerID
1 | John | NULL (CEO, no manager)
2 | Alice | 1 (reports to John)
3 | Bob | 1 (reports to John)
4 | Charlie | 2 (reports to Alice)

Queries

Find employees supervised by specific manager: SELECT * FROM Employee WHERE ManagerID = 2. Find employee's chain of command: join Employee to itself repeatedly, typically with a recursive query (WITH RECURSIVE).
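
A sketch of both queries on the sample data, assuming SQLite (which supports WITH RECURSIVE) via Python's sqlite3 module. The CTE name chain and the depth column are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmpID     INTEGER PRIMARY KEY,
    Name      TEXT,
    ManagerID INTEGER REFERENCES Employee(EmpID)
);
INSERT INTO Employee VALUES
    (1, 'John', NULL), (2, 'Alice', 1), (3, 'Bob', 1), (4, 'Charlie', 2);
""")

# Direct reports of manager 2.
direct_reports = [r[0] for r in conn.execute(
    "SELECT Name FROM Employee WHERE ManagerID = 2")]

# Chain of command for employee 4: start at the employee, then repeatedly
# follow ManagerID upward until it is NULL.
chain_of_command = [r[0] for r in conn.execute("""
    WITH RECURSIVE chain(EmpID, Name, ManagerID, depth) AS (
        SELECT EmpID, Name, ManagerID, 0
        FROM Employee WHERE EmpID = ?
        UNION ALL
        SELECT m.EmpID, m.Name, m.ManagerID, c.depth + 1
        FROM Employee AS m
        JOIN chain AS c ON m.EmpID = c.ManagerID
    )
    SELECT Name FROM chain ORDER BY depth
""", (4,))]

print(direct_reports)
print(chain_of_command)
```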

Cycles

Potential issue: Manager A -> B -> C -> A (cycle). Foreign key alone does not prevent it: every reference points at an existing row, yet the hierarchy is logically invalid. Validation: prevent cycles via application logic, triggers, or stored procedures.
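
A sketch of application-level cycle prevention, assuming SQLite via Python's sqlite3 module. The function would_create_cycle is hypothetical, one possible check among several: before assigning a manager, walk upward from the proposed manager and refuse if the walk reaches the employee being updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmpID     INTEGER PRIMARY KEY,
    Name      TEXT,
    ManagerID INTEGER REFERENCES Employee(EmpID)
);
INSERT INTO Employee VALUES
    (1, 'John', NULL), (2, 'Alice', 1), (3, 'Bob', 1), (4, 'Charlie', 2);
""")

def would_create_cycle(conn, emp_id, new_manager_id):
    """True if making new_manager_id the manager of emp_id closes a loop."""
    seen = set()
    current = new_manager_id
    while current is not None:
        if current == emp_id:
            return True   # emp_id is already above the proposed manager
        if current in seen:
            return True   # pre-existing loop in the data: treat as unsafe
        seen.add(current)
        row = conn.execute(
            "SELECT ManagerID FROM Employee WHERE EmpID = ?", (current,)
        ).fetchone()
        current = row[0] if row else None
    return False

print(would_create_cycle(conn, 1, 4))  # John under Charlie
print(would_create_cycle(conn, 3, 2))  # Bob under Alice
print(would_create_cycle(conn, 2, 2))  # Alice under herself
```

The check and the subsequent UPDATE should run in the same transaction, otherwise a concurrent change could still introduce a cycle.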

Composite Foreign Keys

Definition

Multiple columns form foreign key. Example: (Country, City) -> Location(Country, City). Combined values must match.

SQL Syntax

CREATE TABLE Office (
 OfficeID INT PRIMARY KEY,
 Name VARCHAR(100),
 Country VARCHAR(50),
 City VARCHAR(50),
 FOREIGN KEY (Country, City) REFERENCES Location(Country, City)
);

Use Cases

Multi-part identifiers: country + city uniquely identify location. Account + transaction ID. Time-based hierarchies: year-month-day.

Alignment Requirement

Column count, types, order must correspond, position by position, to the columns listed in the REFERENCES clause. If Location primary key (Country, City), foreign key must be (Country, City) in same order.

Example

Location table:
(Country, City) - primary key
('USA', 'NYC')
('USA', 'LA')
('UK', 'London')

Office references (Country, City):
('USA', 'NYC') - valid
('USA', 'LA') - valid
('USA', 'Chicago') - invalid (city doesn't exist in that country)
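
A sketch of the composite example, assuming SQLite via Python's sqlite3 module. The office names and the extra ('UK', 'NYC') row are invented for illustration: that row shows that the pair is checked as a whole, since 'UK' and 'NYC' each appear in Location but never together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Location (
    Country VARCHAR(50),
    City    VARCHAR(50),
    PRIMARY KEY (Country, City)
);
CREATE TABLE Office (
    OfficeID INT PRIMARY KEY,
    Name     VARCHAR(100),
    Country  VARCHAR(50),
    City     VARCHAR(50),
    FOREIGN KEY (Country, City) REFERENCES Location(Country, City)
);
INSERT INTO Location VALUES ('USA', 'NYC'), ('USA', 'LA'), ('UK', 'London');
""")

offices = [
    (1, "HQ", "USA", "NYC"),
    (2, "West", "USA", "LA"),
    (3, "Midwest", "USA", "Chicago"),  # no such pair in Location
    (4, "Mixed", "UK", "NYC"),         # both values exist, but not as a pair
]

accepted, rejected = [], []
for office in offices:
    pair = (office[2], office[3])
    try:
        conn.execute("INSERT INTO Office VALUES (?, ?, ?, ?)", office)
        accepted.append(pair)
    except sqlite3.IntegrityError:
        rejected.append(pair)

print("accepted:", accepted)
print("rejected:", rejected)
```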

Orphan Records and Data Quality

Orphan Definition

Child record with non-existent parent. Employee with non-existent DeptID. Result of: disabled constraints, data import errors, application bugs.

Causes

Constraint disabled temporarily. Old data lacks foreign key. Parent deleted without cascade. Batch load bypasses checks. Application error.

Detection

Query orphans: SELECT * FROM Employee WHERE DeptID NOT IN (SELECT DeptID FROM Department). Regular audits: find and report orphaned data.

Resolution

Delete orphans (if data unneeded). Update to valid parent (if reclassifiable). Set to NULL (if optional). Investigate cause: prevent recurrence.

Prevention

Enable constraints always. Don't disable for convenience. Validate imports: check all foreign keys exist. Use transactions: atomicity ensures consistency.

Impact

Orphans complicate queries: rows silently drop out of inner joins, or appear with unexpected NULL values in outer joins. Referential integrity broken. Data anomalies. Worst case: garbage data, unreliable conclusions.

Applications and Best Practices

Best Practices

1. Always define foreign keys. 2. Enable constraints at creation, don't disable. 3. Choose cascade strategy carefully. 4. Validate imports/migrations. 5. Audit data regularly. 6. Document relationships.

Application Layer

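Applications can rely on database constraints as the authoritative check. No need to duplicate every rule in each application, though early validation still gives friendlier error messages. Database enforces rules: applications simpler, behavior consistent.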

Data Migration

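Constraints sometimes disabled temporarily during bulk loads (performance). Re-enable and validate after: in some systems (e.g., MySQL's foreign_key_checks, SQLite's PRAGMA foreign_keys) re-enabling does not re-check existing rows, so run an explicit orphan check and fix violations. Careful process: verify consistency before resuming normal writes.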
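
A sketch of the disable, load, re-enable, validate sequence, assuming SQLite via Python's sqlite3 module. PRAGMA foreign_key_check is SQLite-specific; other systems need an explicit orphan query or their own validation command. Setting orphans to NULL is only one possible repair.

```python
import sqlite3

# Autocommit mode: PRAGMA foreign_keys has no effect inside a transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'Engineering');
""")

# 1. Disable enforcement and bulk load (one row has a bad reference).
conn.execute("PRAGMA foreign_keys = OFF")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (101, "Alice", 1), (102, "Bob", 2), (104, "David", 5)])

# 2. Re-enable. This does NOT re-check rows already loaded.
conn.execute("PRAGMA foreign_keys = ON")

# 3. Validate explicitly. Each row: (child table, rowid, parent table, fk id).
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
print("violations:", violations)

# 4. Repair: here, clear references that have no matching parent.
conn.execute("""
    UPDATE Employee SET DeptID = NULL
    WHERE DeptID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM Department AS d
                      WHERE d.DeptID = Employee.DeptID)
""")

remaining = conn.execute("PRAGMA foreign_key_check").fetchall()
print("remaining:", remaining)
```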

Schema Design

Foreign keys integral to relational design. Plan relationships upfront. Model captures business rules. Constraints prevent bad data.

Performance Considerations

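Foreign keys add overhead: parent lookup on each child insert/update, child lookup on each parent delete or key update. Usually negligible when the referencing column is indexed; some systems (e.g., PostgreSQL) do not create that index automatically, so parent deletes may scan the child table. Enable for data integrity (worth cost).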
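
A sketch of the indexing point, assuming SQLite via Python's sqlite3 module. The index name idx_employee_deptid and the helper plan are illustrative. The lookup shown is the kind of child-table search needed when a parent row is deleted; the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    Name   TEXT,
    DeptID INTEGER REFERENCES Department(DeptID)
);
""")

def plan(sql):
    """Return SQLite's query plan as one string (detail is the last column)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Finding the children of one department.
lookup = "SELECT EmpID FROM Employee WHERE DeptID = 1"

plan_before = plan(lookup)  # no index on DeptID yet
conn.execute("CREATE INDEX idx_employee_deptid ON Employee(DeptID)")
plan_after = plan(lookup)   # planner can now search the index

print("before:", plan_before)
print("after: ", plan_after)
```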
