Introduction
Key: attribute(s) uniquely identifying a row. Essential: relational model. Primary key: the designated unique identifier. Candidate key: any minimal unique identifier; the primary key is chosen from these, the rest are alternate keys. Foundation: referential integrity, relationships.
"Keys are the glue of relational databases: uniquely identify rows, enable joins, enforce consistency. Careful choice: foundation of good design." -- Relational model
Primary Key
Definition
Chosen candidate key: designated unique identifier. At most one per table. Not NULL: always. Indexed: automatically (a unique index in most DBMSs). Used: usual target of foreign key references (a UNIQUE key can also be referenced).
Properties
Uniqueness: guaranteed. NOT NULL: required. Immutability: should not change (stable). Minimal: no unnecessary columns.
Example
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    salary DECIMAL(10, 2)
);
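A minimal sketch of the uniqueness and NOT NULL properties being enforced, assuming SQLite through Python's standard sqlite3 module (chosen only because it runs anywhere; the employee rows are made up). One SQLite quirk matters: unlike the SQL standard, SQLite accepts NULL in a non-INTEGER PRIMARY KEY column unless NOT NULL is declared, so the sketch declares it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape as the employees table above; NOT NULL is explicit because
# SQLite would otherwise allow NULL in an INT PRIMARY KEY column.
conn.execute("""
    CREATE TABLE employees (
        employee_id INT NOT NULL PRIMARY KEY,
        name        VARCHAR(100),
        salary      DECIMAL(10, 2)
    )
""")
conn.execute("INSERT INTO employees VALUES (1, 'Ada', 5000.00)")


def try_insert(row):
    """Insert one row; return None on success or the constraint error text."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate_error = try_insert((1, "Grace", 6000.00))  # key 1 already used
null_error = try_insert((None, "Linus", 4000.00))    # key missing
accepted = try_insert((2, "Grace", 6000.00))         # fresh key
row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

Only the row with a fresh, non-NULL key is stored; the other two attempts come back as constraint errors.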
Usage
Identify rows: in queries, updates, deletes. Foreign keys: reference rows by primary key. Relationships: joins match foreign key columns to the referenced primary key.
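The three uses can be seen together in a small sketch, again assuming SQLite via Python's sqlite3. The departments rows and the dept_id column on employees are hypothetical additions for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE departments (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        dept_id     INTEGER REFERENCES departments(dept_id)  -- hypothetical FK column
    );
    INSERT INTO departments VALUES (10, 'Research'), (20, 'Sales');
    INSERT INTO employees VALUES (1, 'Ada', 10), (2, 'Grace', 20), (3, 'Linus', 10);
""")

# Filtering on the primary key touches exactly one row each time.
conn.execute("UPDATE employees SET name = 'Ada L.' WHERE employee_id = 1")
conn.execute("DELETE FROM employees WHERE employee_id = 3")

# The join matches the foreign key column to the departments primary key.
rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees AS e
    JOIN departments AS d ON d.dept_id = e.dept_id
    ORDER BY e.employee_id
""").fetchall()
```

After the update and delete, the join returns one row per remaining employee, paired with its department name.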
Clustered Index
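Physical ordering: engine-dependent. MySQL InnoDB always stores rows in primary key order; SQL Server makes the primary key the clustered index by default; PostgreSQL does not keep heap tables in key order. Performance: when rows are clustered on the PK, PK range scans are fast. Design: consider clustering carefully (e.g., random UUID keys scatter inserts across a clustered index).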
Candidate Keys
Definition
Minimal superkey: uniquely identifies, no redundancy. Multiple per table possible. One becomes primary. Others: alternate unique identifiers.
Example
Employee table candidates:
{employee_id} - numeric identifier (simple)
{ssn} - social security number (natural)
{first_name, last_name, hire_date} - composite (complex; a candidate only if no two employees can ever share all three values)
Choose: employee_id as primary (simple, stable)
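Candidate keys come from business rules, but a sample of data can rule column sets out. The sketch below (plain Python; the rows are invented) enumerates the minimal column sets that are unique in one instance, smallest first, and skips any superset of a key already found. A set it reports is only consistent with the sample, not proven to be a key.

```python
from itertools import combinations

COLUMNS = ("employee_id", "ssn", "first_name", "last_name", "hire_date")
# Hypothetical rows chosen so that no one or two name/date columns are unique.
ROWS = [dict(zip(COLUMNS, values)) for values in [
    (1, "111-11-1111", "Ada", "Lovelace", "2020-01-06"),
    (2, "222-22-2222", "Ada", "Lovelace", "2021-03-15"),
    (3, "333-33-3333", "Ada", "Byron", "2020-01-06"),
    (4, "444-44-4444", "Grace", "Lovelace", "2020-01-06"),
]]


def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = {tuple(row[a] for a in attrs) for row in rows}
    return len(projected) == len(rows)


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in this particular instance."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A superset of a key already found is a superkey, not a candidate.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found


keys = candidate_keys(ROWS, COLUMNS)
```

On these rows the result is employee_id, ssn, and the three-column name/date combination, matching the candidates listed above.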
Non-Primary Candidates
Can define: UNIQUE constraint. Or: UNIQUE index. Enforced: database ensures uniqueness. Used: for lookups.
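A sketch showing both routes rejecting a duplicate, assuming SQLite via Python's sqlite3. The badge_no column and the index name ux_employees_badge are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        ssn         TEXT NOT NULL UNIQUE,  -- alternate key as a UNIQUE constraint
        badge_no    TEXT NOT NULL          -- hypothetical second alternate key
    );
    -- Same guarantee, declared as a separate unique index.
    CREATE UNIQUE INDEX ux_employees_badge ON employees (badge_no);
    INSERT INTO employees VALUES (1, '111-11-1111', 'B-001');
""")


def rejected(row):
    """True if the database refuses the row with a constraint error."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


ssn_clash = rejected((2, "111-11-1111", "B-002"))    # duplicate ssn
badge_clash = rejected((3, "333-33-3333", "B-001"))  # duplicate badge_no
fresh = rejected((4, "444-44-4444", "B-004"))        # both values new
```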
Selection Criteria
Simplicity: fewer columns. Stability: immutable. Efficiency: compact type, cheap to index and compare, used frequently in lookups. Business logic: a natural candidate is preferred when it is stable and compact.
Superkeys
Definition
Set of attributes uniquely identifying a row. May be redundant: can contain more columns than necessary. Example: {employee_id, ssn} (each column alone is unique, so the pair carries a redundant column).
Relationship to Keys
Superkey: general concept. Candidate key: minimal superkey (no redundancy). Primary key: chosen candidate.
Redundancy
Superkey may include unnecessary columns. Design: remove them to reach a candidate key. Efficiency: fewer key columns mean smaller indexes and simpler joins.
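Removing redundant columns can be sketched in plain Python over a sample instance (invented rows; like any data-based check, it only reflects what the sample allows). Adding columns never breaks uniqueness, so a single pass that drops each column whenever the rest stay unique ends at a minimal set. The pass walks columns from last to first, so earlier-listed columns are kept when there is a choice.

```python
COLUMNS = ("employee_id", "ssn", "name")
# Hypothetical rows: employee_id and ssn are each unique, name is not.
ROWS = [dict(zip(COLUMNS, values)) for values in [
    (1, "111-11-1111", "Ada"),
    (2, "222-22-2222", "Ada"),
    (3, "333-33-3333", "Grace"),
]]


def is_superkey(rows, attrs):
    """True if the attributes distinguish every row of the instance."""
    return len({tuple(row[a] for a in attrs) for row in rows}) == len(rows)


def reduce_to_candidate(rows, superkey):
    """Drop redundant columns from a superkey, keeping earlier ones if possible."""
    key = list(superkey)
    for attr in reversed(superkey):
        trial = [a for a in key if a != attr]
        if trial and is_superkey(rows, trial):
            key = trial
    return tuple(key)


pair = ("employee_id", "ssn")
reduced_pair = reduce_to_candidate(ROWS, pair)
reduced_all = reduce_to_candidate(ROWS, COLUMNS)
```

Both the redundant pair and the full column set reduce to employee_id, the candidate key chosen as primary above.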
Natural Keys
Definition
Business meaning: real-world identifier. Example: ISBN for books, SSN for people, VIN for vehicles. Stability: usually stable, but can change (rarely).
Advantages
Meaningful: understandable (not arbitrary ID). Existing: often already unique. Business logic: aligned with domain.
Disadvantages
Stability: may change (SSN, email). Size: possibly large. Multi-column: composite keys more complex.
Risk
Change required: expensive (cascades via foreign keys). Solution: combine with surrogate (use both).
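When a natural key does change, the cost shows up in every referencing row. A sketch, assuming SQLite via Python's sqlite3 and a hypothetical reviews table that references books by ISBN. ON UPDATE CASCADE makes the database rewrite the child rows; with the default (NO ACTION) the update would be rejected instead. The ISBN values are placeholders, not real books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
    CREATE TABLE books (
        isbn  TEXT PRIMARY KEY NOT NULL,  -- natural key
        title TEXT
    );
    -- Hypothetical child table keyed to books through the natural key.
    CREATE TABLE reviews (
        review_id INTEGER PRIMARY KEY,
        isbn      TEXT NOT NULL REFERENCES books(isbn) ON UPDATE CASCADE,
        stars     INTEGER
    );
    INSERT INTO books VALUES ('978-0-00-000000-1', 'Example Title');
    INSERT INTO reviews (isbn, stars) VALUES
        ('978-0-00-000000-1', 5), ('978-0-00-000000-1', 4);
""")

# Correcting the natural key also rewrites every referencing review row.
conn.execute(
    "UPDATE books SET isbn = '978-0-00-000000-2' WHERE isbn = '978-0-00-000000-1'"
)
child_isbns = {row[0] for row in conn.execute("SELECT isbn FROM reviews")}
```

With one parent and two children this is cheap; with millions of referencing rows the same update becomes the expensive cascade described above.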
Surrogate Keys
Definition
Artificial identifier: created for uniqueness. Example: auto-increment ID, UUID. No business meaning: purely technical.
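A UUID surrogate can be generated by the application before the insert. A sketch assuming SQLite via Python's sqlite3 and the standard uuid module. The orders table is hypothetical, and the UUID is stored as text for simplicity (some DBMSs, such as PostgreSQL, have a native uuid type).

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id TEXT PRIMARY KEY NOT NULL,  -- UUID surrogate, no business meaning
        customer TEXT NOT NULL
    )
""")


def new_order(customer):
    """Insert an order under a freshly generated random (version 4) UUID."""
    order_id = str(uuid.uuid4())  # generated by the application, not the database
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer))
    return order_id


first = new_order("Ada")
second = new_order("Ada")  # identical business data, different surrogate
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```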
Advantages
Simplicity: single column. Stability: never changes. Efficiency: small (fast foreign keys). Flexibility: add real key constraints separately.
Disadvantages
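Meaningless: arbitrary numbers. Overhead: extra column (small). Hiding relationships: real identity obscured. Duplicates: without a UNIQUE constraint on the natural key, the same real-world entity can be inserted twice.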
Practical
Common: most modern systems use. Combined: surrogate as primary, natural as UNIQUE constraint. Balance: simplicity + business logic.
Auto-Increment
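-- AUTO_INCREMENT is MySQL syntax; PostgreSQL uses GENERATED ALWAYS AS IDENTITY, SQL Server uses IDENTITY(1,1)
CREATE TABLE employees (
    emp_id INT PRIMARY KEY AUTO_INCREMENT,  -- surrogate key
    ssn VARCHAR(20) UNIQUE,                 -- natural key (unique but not primary)
    name VARCHAR(100)
);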
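The same surrogate-plus-natural pattern in SQLite (via Python's sqlite3), used here as a stand-in for MySQL. SQLite spells auto-increment as INTEGER PRIMARY KEY, which assigns the next rowid when the column is omitted. The sketch also shows why the natural key keeps its UNIQUE constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite equivalent of the MySQL table: INTEGER PRIMARY KEY auto-assigns ids.
conn.execute("""
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY,
        ssn    TEXT UNIQUE,  -- natural key kept as UNIQUE
        name   TEXT
    )
""")
insert = "INSERT INTO employees (ssn, name) VALUES (?, ?)"
ada_id = conn.execute(insert, ("111-11-1111", "Ada")).lastrowid
grace_id = conn.execute(insert, ("222-22-2222", "Grace")).lastrowid

# Without UNIQUE on ssn, this would silently create a second "Ada".
try:
    conn.execute(insert, ("111-11-1111", "Ada"))
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True
```

The surrogate ids 1 and 2 come from the database; the duplicate person is stopped only by the natural key.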
Composite Keys
Definition
Multiple columns: together uniquely identify. Example: {student_id, course_id, semester} uniquely identifies enrollment.
Example
CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    semester VARCHAR(20),
    grade CHAR(1),
    PRIMARY KEY (student_id, course_id, semester)
);
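What the three-column key permits and forbids can be checked with a sketch, assuming SQLite via Python's sqlite3 and made-up semester labels. The same student may take the same course in a different semester, but the same (student, course, semester) triple cannot appear twice. NOT NULL is spelled out because SQLite does not imply it for composite primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollments (
        student_id INT NOT NULL,
        course_id  INT NOT NULL,
        semester   VARCHAR(20) NOT NULL,
        grade      CHAR(1),
        PRIMARY KEY (student_id, course_id, semester)
    )
""")


def enroll(student_id, course_id, semester):
    """Return True if the enrollment row was accepted."""
    try:
        conn.execute(
            "INSERT INTO enrollments (student_id, course_id, semester) VALUES (?, ?, ?)",
            (student_id, course_id, semester),
        )
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    enroll(1, 101, "2023-FALL"),    # first enrollment
    enroll(1, 101, "2024-SPRING"),  # retake: semester differs
    enroll(2, 101, "2023-FALL"),    # another student, same course and term
    enroll(1, 101, "2023-FALL"),    # exact triple again: rejected
]
```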
Usage
Junction tables: many-to-many relationships. Domain entities: natural composite keys. Complex domain: multiple attributes needed.
Foreign Keys
Reference: all columns of the composite key. Example: FOREIGN KEY (student_id, course_id, semester) REFERENCES enrollments (student_id, course_id, semester). Complex: every join condition must match all key columns.
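A sketch of a composite foreign key, assuming SQLite via Python's sqlite3. The grade_appeals child table is hypothetical. A child row is accepted only when all three columns match one enrollment, and the join lists every key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE enrollments (
        student_id INT NOT NULL,
        course_id  INT NOT NULL,
        semester   TEXT NOT NULL,
        grade      CHAR(1),
        PRIMARY KEY (student_id, course_id, semester)
    );
    -- Hypothetical child table: appeals point at one enrollment row.
    CREATE TABLE grade_appeals (
        appeal_id  INTEGER PRIMARY KEY,
        student_id INT NOT NULL,
        course_id  INT NOT NULL,
        semester   TEXT NOT NULL,
        FOREIGN KEY (student_id, course_id, semester)
            REFERENCES enrollments (student_id, course_id, semester)
    );
    INSERT INTO enrollments VALUES (1, 101, '2023-FALL', 'B');
""")

appeal = "INSERT INTO grade_appeals (student_id, course_id, semester) VALUES (?, ?, ?)"
conn.execute(appeal, (1, 101, "2023-FALL"))
try:
    # Right student and course, wrong semester: no matching parent row.
    conn.execute(appeal, (1, 101, "2024-SPRING"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The join condition must cover all three key columns.
joined = conn.execute("""
    SELECT e.grade
    FROM grade_appeals AS a
    JOIN enrollments AS e
      ON e.student_id = a.student_id
     AND e.course_id  = a.course_id
     AND e.semester   = a.semester
""").fetchall()
```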
Considerations
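Complexity: more columns, more overhead (wider indexes and foreign keys). Queries: a single-row lookup needs all key columns in WHERE; the key's index typically also serves filters on a leftmost prefix (e.g., student_id alone) but not on a later column alone (e.g., course_id). Normalization: check 2NF (does any non-key column depend on only part of the key?).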
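How a composite key's index is used depends on which key columns a query filters on. A sketch assuming SQLite via Python's sqlite3, which reports its plan through EXPLAIN QUERY PLAN. The exact wording of that output differs between SQLite versions, so the sketch only looks for the SEARCH/SCAN and INDEX keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollments (
        student_id INT NOT NULL,
        course_id  INT NOT NULL,
        semester   TEXT NOT NULL,
        grade      CHAR(1),
        PRIMARY KEY (student_id, course_id, semester)
    )
""")


def plan(sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


full_key = plan(
    "SELECT grade FROM enrollments "
    "WHERE student_id = 1 AND course_id = 101 AND semester = '2023-FALL'"
)
prefix = plan("SELECT grade FROM enrollments WHERE student_id = 1")     # leftmost column
no_prefix = plan("SELECT grade FROM enrollments WHERE course_id = 101")  # skips student_id
```

The full key and the student_id prefix are answered by a SEARCH on the key's index; filtering on course_id alone falls back to a SCAN of the table.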
Uniqueness Enforcement
Primary Key Constraint
Unique: guaranteed (database enforces). Index: created automatically. Performance: fast lookup.
UNIQUE Constraint
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);
UNIQUE INDEX
Alternative: explicit CREATE UNIQUE INDEX. Functionally same: ensures uniqueness. More control: index options (e.g., index name, column order, partial indexes in PostgreSQL and SQLite).
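One example of such an index option is a partial unique index, sketched here assuming SQLite via Python's sqlite3 (PostgreSQL accepts the same WHERE clause; MySQL does not support partial indexes). The active soft-delete flag and the index name are hypothetical. Emails must be unique only among active users, so an archived account does not block reuse of its address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL,
        active  INTEGER NOT NULL DEFAULT 1  -- hypothetical soft-delete flag
    );
    -- Uniqueness applies only to rows matching the WHERE clause.
    CREATE UNIQUE INDEX ux_users_active_email ON users (email) WHERE active = 1;
""")
insert = "INSERT INTO users (email, active) VALUES (?, ?)"
conn.execute(insert, ("ada@example.com", 0))  # archived account
conn.execute(insert, ("ada@example.com", 1))  # allowed: only one active row
try:
    conn.execute(insert, ("ada@example.com", 1))  # second active row
    second_active_allowed = True
except sqlite3.IntegrityError:
    second_active_allowed = False
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```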
NULL Handling
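UNIQUE usually allows multiple NULLs: a NULL is never equal to another NULL, so NULLs do not collide (PostgreSQL, MySQL, SQLite). Varies by DBMS: SQL Server allows only one NULL per UNIQUE constraint. NOT NULL: required for strict uniqueness.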
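A sketch of the difference NOT NULL makes, assuming SQLite via Python's sqlite3 (SQL Server would reject the second NULL phone). The phone column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        phone   TEXT UNIQUE,          -- nullable: NULLs never collide here
        email   TEXT UNIQUE NOT NULL  -- strict: every row needs a distinct value
    )
""")
insert = "INSERT INTO users (phone, email) VALUES (?, ?)"
conn.execute(insert, (None, "a@example.com"))
conn.execute(insert, (None, "b@example.com"))  # second NULL phone accepted
null_phones = conn.execute(
    "SELECT COUNT(*) FROM users WHERE phone IS NULL"
).fetchone()[0]

try:
    conn.execute(insert, ("555-0100", None))  # NULL email refused
    null_email_rejected = False
except sqlite3.IntegrityError:
    null_email_rejected = True
```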
Violation Handling
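Duplicate attempt: error (INSERT/UPDATE fails). Application: must handle it (catch the error, then report the conflict or resolve it, e.g., with an upsert); retrying the same row unchanged fails again.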
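Both handling styles can be sketched with Python's sqlite3 against SQLite (assumed here; the function names register and upsert are hypothetical). The first catches the constraint error and reports it. The second resolves the conflict with SQLite's INSERT ... ON CONFLICT upsert, which needs SQLite 3.24 or later; PostgreSQL has a similar clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT UNIQUE NOT NULL,
        name    TEXT
    )
""")


def register(email, name):
    """Insert a new user; report a duplicate email instead of raising."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))
        return True, "created"
    except sqlite3.IntegrityError:
        # Retrying the identical row would fail again, so surface the conflict.
        return False, f"email already registered: {email}"


def upsert(email, name):
    """Resolve the conflict instead: update the existing row's name."""
    with conn:
        conn.execute(
            "INSERT INTO users (email, name) VALUES (?, ?) "
            "ON CONFLICT(email) DO UPDATE SET name = excluded.name",
            (email, name),
        )


first = register("ada@example.com", "Ada")
second = register("ada@example.com", "Ada L.")
upsert("ada@example.com", "Ada Lovelace")
stored_name = conn.execute(
    "SELECT name FROM users WHERE email = ?", ("ada@example.com",)
).fetchone()[0]
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Which style fits depends on intent: a sign-up form should report the clash, while a sync job usually wants the upsert.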
Key Selection Strategies
Decision Process
1. Identify candidates: what uniquely identifies a row?
2. Choose primary: simplicity, stability, efficiency.
3. Enforce: constraints, indexes.
4. Document: schema comments.
Surrogate vs. Natural
Surrogate: preferred (simple, stable). Natural: add as UNIQUE (preserve business logic). Hybrid: best practice.
Composite vs. Simple
Simple: prefer (faster, simpler joins). Composite: when necessary (junction tables, natural compound identifiers). Evaluate: trade-offs.
Examples by Domain
User: surrogate user_id (primary) + email UNIQUE (natural). Product: surrogate product_id (primary) + SKU UNIQUE (natural). Order: surrogate order_id (primary).
Practical Examples
Simple Primary Key
CREATE TABLE departments (
    dept_id INT PRIMARY KEY AUTO_INCREMENT,
    dept_name VARCHAR(50) UNIQUE NOT NULL
);
Natural Key
CREATE TABLE books (
    isbn VARCHAR(20) PRIMARY KEY,
    title VARCHAR(100),
    author VARCHAR(100)
);
Composite Key
CREATE TABLE enrollments (
    student_id INT NOT NULL,
    course_id INT NOT NULL,
    semester VARCHAR(20) NOT NULL,
    grade CHAR(1),
    PRIMARY KEY (student_id, course_id, semester),
    FOREIGN KEY (student_id) REFERENCES students(id),
    FOREIGN KEY (course_id) REFERENCES courses(id)
);
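Foreign keys only protect the data if the DBMS enforces them. A sketch assuming SQLite via Python's sqlite3, which ignores foreign keys unless PRAGMA foreign_keys = ON is set. The students and courses tables are minimal hypothetical stand-ins for the parent tables the example references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    -- Minimal parent tables assumed by the enrollments example.
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollments (
        student_id INT NOT NULL,
        course_id INT NOT NULL,
        semester VARCHAR(20) NOT NULL,
        grade CHAR(1),
        PRIMARY KEY (student_id, course_id, semester),
        FOREIGN KEY (student_id) REFERENCES students(id),
        FOREIGN KEY (course_id) REFERENCES courses(id)
    );
    INSERT INTO students VALUES (1, 'Ada');
    INSERT INTO courses VALUES (101, 'Databases');
""")


def enroll(student_id, course_id, semester):
    """Return "ok" or the constraint error message."""
    try:
        conn.execute(
            "INSERT INTO enrollments (student_id, course_id, semester) VALUES (?, ?, ?)",
            (student_id, course_id, semester),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = [
    enroll(1, 101, "2023-FALL"),  # valid student and course
    enroll(2, 101, "2023-FALL"),  # student 2 does not exist
    enroll(1, 999, "2023-FALL"),  # course 999 does not exist
]
```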
Hybrid Approach
CREATE TABLE users (
    user_id INT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(100) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(100)
);