Introduction
Key: attribute(s) uniquely identifying row. Essential: relational model. Primary key: designated unique identifier. Candidate key: alternative unique identifier. Foundation: referential integrity, relationships.
"Keys are the glue of relational databases: uniquely identify rows, enable joins, enforce consistency. Careful choice: foundation of good design." -- Relational model
Primary Key
Definition
Chosen candidate key: designated unique identifier. One per table. Not NULL: always. Indexed: automatically. Used: foreign key references.
Properties
Uniqueness: guaranteed. NOT NULL: required. Immutability: should not change (stable). Minimal: no unnecessary columns.
Example
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    salary DECIMAL(10, 2)
);
Usage
Identify rows: in queries, updates, deletes. Foreign keys: reference by primary key. Relationships: primary key joins tables.
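A minimal sketch of primary key usage, assuming SQLite through Python's standard sqlite3 module (chosen only so the example runs without a server; other DBMSs report the violation with their own error types). The row values are made up for illustration.

```python
import sqlite3

def demo_primary_key():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        " employee_id INT PRIMARY KEY,"
        " name VARCHAR(100),"
        " salary DECIMAL(10, 2))"
    )
    conn.execute("INSERT INTO employees VALUES (1, 'Ada', 5000.00)")

    # A second row with the same primary key value is rejected.
    try:
        conn.execute("INSERT INTO employees VALUES (1, 'Bob', 4000.00)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    # The primary key pins down exactly one row for updates and lookups.
    conn.execute("UPDATE employees SET salary = 5500.00 WHERE employee_id = 1")
    row = conn.execute(
        "SELECT name, salary FROM employees WHERE employee_id = 1"
    ).fetchone()
    conn.close()
    return duplicate_rejected, row
```

Calling demo_primary_key() should report the duplicate as rejected and return the single updated row.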
Clustered Index
Physical ordering: by primary key in some DBMSs (e.g., MySQL InnoDB; SQL Server by default). Performance: range scans on PK fast. Design: consider clustering carefully.
Candidate Keys
Definition
Minimal superkey: uniquely identifies, no redundancy. Multiple per table possible. One becomes primary. Others: alternate unique identifiers.
Example
Employee table candidates:
{employee_id} - numeric identifier (simple)
{ssn} - social security number (natural)
{first_name, last_name, hire_date} - composite (complex)
Choose: employee_id as primary (simple, stable)
Non-Primary Candidates
Can define: UNIQUE constraint. Or: UNIQUE index. Enforced: database ensures uniqueness. Used: for lookups.
Selection Criteria
Simplicity: fewer columns. Stability: immutable. Efficiency: indexed, used frequently. Business logic: natural candidates preferred (when applicable).
Superkeys
Definition
Set of attributes uniquely identifying row. May be redundant: contains more columns than necessary. Example: {employee_id, ssn} (both alone are unique, combined redundant).
Relationship to Keys
Superkey: general concept. Candidate key: minimal superkey (no redundancy). Primary key: chosen candidate.
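The superkey / candidate key distinction can be sketched in a few lines of Python. The helper names (is_superkey, candidate_keys) and the sample rows are hypothetical. Note that a key is a property of the schema, not of one data sample: a sample can only show that an attribute set is not a key.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, columns):
    """Minimal superkeys for this data sample, smallest sets first."""
    found = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            # A superset of a key already found is redundant, so not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

rows = [
    {"employee_id": 1, "ssn": "111-11-1111", "name": "Ada"},
    {"employee_id": 2, "ssn": "222-22-2222", "name": "Bob"},
    {"employee_id": 3, "ssn": "333-33-3333", "name": "Ada"},
]
keys = candidate_keys(rows, ["employee_id", "ssn", "name"])
```

Here {employee_id, ssn} is a superkey but is skipped as a candidate, because each of its columns is already unique alone.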
Redundancy
Superkey may contain: unnecessary columns. Design: minimize (choose candidate key). Efficiency: fewer columns, faster comparisons and smaller indexes.
Natural Keys
Definition
Business meaning: real-world identifier. Example: ISBN for books, SSN for people, VIN for vehicles. Stability: usually stable, but may change (rare).
Advantages
Meaningful: understandable (not arbitrary ID). Existing: often already unique. Business logic: aligned with domain.
Disadvantages
Stability: may change (SSN, email). Size: possibly large. Multi-column: composite keys more complex.
Risk
Change required: expensive (cascades via foreign keys). Solution: combine with surrogate (use both).
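To illustrate why a natural key change is expensive, here is a sketch assuming SQLite via Python's sqlite3 module. The loans table and the ISBN values are hypothetical. With ON UPDATE CASCADE, one change to the parent key rewrites every referencing row.

```python
import sqlite3

def demo_cascade():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.execute(
        "CREATE TABLE books (isbn VARCHAR(20) PRIMARY KEY, title VARCHAR(100))"
    )
    conn.execute(
        "CREATE TABLE loans ("
        " loan_id INTEGER PRIMARY KEY,"
        " isbn VARCHAR(20) NOT NULL"
        "  REFERENCES books(isbn) ON UPDATE CASCADE)"
    )
    conn.execute("INSERT INTO books VALUES ('old-isbn', 'Sample Title')")
    conn.execute("INSERT INTO loans (isbn) VALUES ('old-isbn')")
    conn.execute("INSERT INTO loans (isbn) VALUES ('old-isbn')")

    # Changing the natural key touches the parent row and every child row.
    conn.execute("UPDATE books SET isbn = 'new-isbn' WHERE isbn = 'old-isbn'")
    child_values = [r[0] for r in conn.execute("SELECT isbn FROM loans ORDER BY loan_id")]
    conn.close()
    return child_values
```

With a surrogate primary key, the same correction would update one column in one row of books and leave loans untouched.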
Surrogate Keys
Definition
Artificial identifier: created for uniqueness. Example: auto-increment ID, UUID. No business meaning: purely technical.
Advantages
Simplicity: single column. Stability: never changes. Efficiency: small (fast foreign keys). Flexibility: add real key constraints separately.
Disadvantages
Meaningless: arbitrary numbers. Overhead: extra column (small). Hiding relationships: real identity obscured.
Practical
Common: most modern systems use. Combined: surrogate as primary, natural as UNIQUE constraint. Balance: simplicity + business logic.
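A sketch of the combined pattern, assuming SQLite via Python's sqlite3 module. SQLite spells the auto-generated key INTEGER PRIMARY KEY AUTOINCREMENT, which is an adaptation of the MySQL-style AUTO_INCREMENT used elsewhere in these notes. The sample values are made up.

```python
import sqlite3

def demo_hybrid_keys():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        " emp_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # surrogate key
        " ssn VARCHAR(20) UNIQUE,"                    # natural key, still enforced
        " name VARCHAR(100))"
    )
    insert = "INSERT INTO employees (ssn, name) VALUES (?, ?)"
    first_id = conn.execute(insert, ("111-11-1111", "Ada")).lastrowid
    second_id = conn.execute(insert, ("222-22-2222", "Bob")).lastrowid

    # The natural key cannot be duplicated even though it is not the primary key.
    try:
        conn.execute(insert, ("111-11-1111", "Eve"))
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    # Lookups by business identifier resolve to the surrogate key.
    found_id = conn.execute(
        "SELECT emp_id FROM employees WHERE ssn = ?", ("222-22-2222",)
    ).fetchone()[0]
    conn.close()
    return first_id, second_id, duplicate_rejected, found_id
```

Foreign keys in other tables would reference emp_id, so a corrected ssn never has to cascade.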
Auto-Increment
CREATE TABLE employees (
    emp_id INT PRIMARY KEY AUTO_INCREMENT,
    ssn VARCHAR(20) UNIQUE, -- natural key (unique but not primary)
    name VARCHAR(100)
);
Composite Keys
Definition
Multiple columns: together uniquely identify. Example: {student_id, course_id, semester} uniquely identifies enrollment.
Example
CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    semester VARCHAR(20),
    grade CHAR(1),
    PRIMARY KEY (student_id, course_id, semester)
);
Usage
Junction tables: many-to-many relationships. Domain entities: natural composite keys. Complex domain: multiple attributes needed.
Foreign Keys
Reference: all columns of composite. Example: FOREIGN KEY (student_id, course_id, semester). Complex: join conditions must match every key column.
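A sketch of a composite foreign key, assuming SQLite via Python's sqlite3 module. The attendance child table and its class_date column are hypothetical, added only to have something that references enrollments.

```python
import sqlite3

def demo_composite_fk():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.execute(
        "CREATE TABLE enrollments ("
        " student_id INT NOT NULL, course_id INT NOT NULL,"
        " semester VARCHAR(20) NOT NULL, grade CHAR(1),"
        " PRIMARY KEY (student_id, course_id, semester))"
    )
    conn.execute(
        "CREATE TABLE attendance ("
        " student_id INT NOT NULL, course_id INT NOT NULL,"
        " semester VARCHAR(20) NOT NULL, class_date DATE NOT NULL,"
        " FOREIGN KEY (student_id, course_id, semester)"
        "  REFERENCES enrollments (student_id, course_id, semester))"
    )
    conn.execute("INSERT INTO enrollments VALUES (1, 101, '2024-Fall', 'A')")
    conn.execute("INSERT INTO attendance VALUES (1, 101, '2024-Fall', '2024-09-02')")

    # Two of three columns match, but the full key does not exist in the parent.
    try:
        conn.execute("INSERT INTO attendance VALUES (1, 101, '2025-Spring', '2025-02-03')")
        mismatch_rejected = False
    except sqlite3.IntegrityError:
        mismatch_rejected = True

    # The join condition has to name every column of the composite key.
    joined = conn.execute(
        "SELECT e.grade, a.class_date FROM enrollments e"
        " JOIN attendance a ON a.student_id = e.student_id"
        " AND a.course_id = e.course_id AND a.semester = e.semester"
    ).fetchall()
    conn.close()
    return mismatch_rejected, joined
```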
Considerations
Complexity: more columns, more overhead. Queries: need all key columns in WHERE to identify a single row. Normalization: check 2NF (partial dependencies?).
Uniqueness Enforcement
Primary Key Constraint
Unique: guaranteed (database enforces). Index: created automatically. Performance: fast lookup.
UNIQUE Constraint
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);
UNIQUE INDEX
Alternative: explicit index. Functionally same: ensures uniqueness. More control: index options.
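One example of the extra control an explicit index gives, sketched with SQLite via Python's sqlite3 module. The index name idx_users_email and the choice of a case-insensitive collation are illustrative assumptions, and the available index options differ between DBMSs.

```python
import sqlite3

def demo_unique_index():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INT PRIMARY KEY, email VARCHAR(100) NOT NULL)"
    )
    # Explicit unique index; the collation makes uniqueness case-insensitive.
    conn.execute(
        "CREATE UNIQUE INDEX idx_users_email ON users (email COLLATE NOCASE)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'ada@example.com')")
    try:
        conn.execute("INSERT INTO users VALUES (2, 'ADA@example.com')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return rejected
```

A plain UNIQUE column constraint on the same table would treat the two spellings as different values.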
NULL Handling
UNIQUE allows: multiple NULLs in most DBMSs (NULL = NULL evaluates to unknown, so NULLs never count as duplicates). Varies by DBMS: SQL Server permits only one NULL per unique column. NOT NULL: required for strict uniqueness.
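The NULL behavior can be observed directly. This sketch assumes SQLite via Python's sqlite3 module, which treats NULLs as distinct under UNIQUE; the contacts table is hypothetical, and another DBMS may behave differently.

```python
import sqlite3

def demo_unique_nulls():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        " contact_id INTEGER PRIMARY KEY,"
        " phone VARCHAR(20) UNIQUE)"  # nullable on purpose
    )
    # Two rows without a phone number: both accepted.
    conn.execute("INSERT INTO contacts (phone) VALUES (NULL)")
    conn.execute("INSERT INTO contacts (phone) VALUES (NULL)")
    null_rows = conn.execute(
        "SELECT COUNT(*) FROM contacts WHERE phone IS NULL"
    ).fetchone()[0]

    # Two rows with the same non-NULL phone number: second is rejected.
    conn.execute("INSERT INTO contacts (phone) VALUES ('555-0100')")
    try:
        conn.execute("INSERT INTO contacts (phone) VALUES ('555-0100')")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    conn.close()
    return null_rows, duplicate_rejected
```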
Violation Handling
Duplicate attempt: error (INSERT/UPDATE fails). Application: must handle (catch exception, then report the conflict or retry with a different value).
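A sketch of application-side handling, assuming SQLite via Python's sqlite3 module. The register_user helper is hypothetical, and the exception class to catch depends on the driver in use.

```python
import sqlite3

def make_users_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " user_id INTEGER PRIMARY KEY,"
        " email VARCHAR(100) UNIQUE NOT NULL,"
        " username VARCHAR(50) UNIQUE NOT NULL)"
    )
    return conn

def register_user(conn, email, username):
    """Return the new user_id, or None if email or username is already taken."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            cur = conn.execute(
                "INSERT INTO users (email, username) VALUES (?, ?)",
                (email, username),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # Uniqueness violation: let the caller decide how to respond.
        return None

conn = make_users_db()
first = register_user(conn, "ada@example.com", "ada")
second = register_user(conn, "ada@example.com", "ada2")  # duplicate email
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Letting the database reject the duplicate, rather than checking with a SELECT first, avoids a race between two concurrent registrations.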
Key Selection Strategies
Decision Process
1. Identify candidates: what uniquely identifies?
2. Choose primary: simplicity, stability, efficiency.
3. Enforce: constraints, indexes.
4. Document: schema comments.
Surrogate vs. Natural
Surrogate: preferred (simple, stable). Natural: add as UNIQUE (preserve business logic). Hybrid: best practice.
Composite vs. Simple
Simple: prefer (faster, simpler joins). Composite: when necessary (junction tables, natural compound identifiers). Evaluate: trade-offs.
Examples by Domain
User: surrogate ID (primary) + email UNIQUE (natural). Product: SKU (natural) + product_id (surrogate). Order: surrogate (order_id) primary.
Practical Examples
Simple Primary Key
CREATE TABLE departments (
    dept_id INT PRIMARY KEY AUTO_INCREMENT,
    dept_name VARCHAR(50) UNIQUE NOT NULL
);
Natural Key
CREATE TABLE books (
    isbn VARCHAR(20) PRIMARY KEY,
    title VARCHAR(100),
    author VARCHAR(100)
);
Composite Key
CREATE TABLE enrollments (
    student_id INT NOT NULL,
    course_id INT NOT NULL,
    semester VARCHAR(20) NOT NULL,
    grade CHAR(1),
    PRIMARY KEY (student_id, course_id, semester),
    FOREIGN KEY (student_id) REFERENCES students(id),
    FOREIGN KEY (course_id) REFERENCES courses(id)
);
Hybrid Approach
CREATE TABLE users (
    user_id INT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(100) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(100)
);
References
- Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.
- Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.