Keys - Database Systems

Introduction

Key: attribute(s) uniquely identifying a row. Essential: relational model. Candidate key: any minimal unique identifier. Primary key: the candidate designated as the main identifier. Foundation: referential integrity, relationships.

"Keys are the glue of relational databases: uniquely identify rows, enable joins, enforce consistency. Careful choice: foundation of good design." -- Relational model

Primary Key

Definition

Chosen candidate key: designated unique identifier. One per table. Not NULL: always. Indexed: automatically. Used: foreign key references.

Properties

Uniqueness: guaranteed. NOT NULL: required. Immutability: should not change (stable). Minimal: no unnecessary columns.

Example

CREATE TABLE employees (
 employee_id INT PRIMARY KEY,
 name VARCHAR(100),
 salary DECIMAL(10, 2)
);

Usage

Identify rows: in queries, updates, deletes. Foreign keys: reference by primary key. Relationships: primary key joins tables.
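A minimal sketch of these uses, assuming SQLite (through Python's standard sqlite3 module) as a stand-in DBMS. Type names differ slightly from the DDL above, and the function name and sample rows are hypothetical. It shows the primary key rejecting a duplicate and then addressing exactly one row in an UPDATE and a SELECT.

```python
import sqlite3


def demo_primary_key():
    """Return (duplicate_rejected, row_with_id_1) for a small employees table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY,
            name TEXT,
            salary NUMERIC
        )
        """
    )
    conn.execute("INSERT INTO employees VALUES (1, 'Ada', 5000)")
    conn.execute("INSERT INTO employees VALUES (2, 'Grace', 6000)")

    # A second row with employee_id = 1 violates the primary key.
    try:
        conn.execute("INSERT INTO employees VALUES (1, 'Edsger', 7000)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    # The primary key pins down exactly one row for updates and lookups.
    conn.execute("UPDATE employees SET salary = 5500 WHERE employee_id = 1")
    row = conn.execute(
        "SELECT name, salary FROM employees WHERE employee_id = 1"
    ).fetchone()
    conn.close()
    return duplicate_rejected, row
```

Calling demo_primary_key() should report the duplicate as rejected and return Ada's row with the updated salary.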

Clustered Index

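Physical ordering: by primary key in some engines (MySQL InnoDB always; SQL Server by default). PostgreSQL: heap tables, no automatic clustering. Performance: range scans on PK fast where the table is clustered. Design: consider clustering carefully.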

Candidate Keys

Definition

Minimal superkey: uniquely identifies, no redundancy. Multiple per table possible. One becomes primary. Others: alternate unique identifiers.

Example

Employee table candidates:
 {employee_id} - numeric identifier (simple)
 {ssn} - social security number (natural)
 {first_name, last_name, hire_date} - composite (complex; a key only if no two same-named employees can share a hire date)

Choose: employee_id as primary (simple, stable)

Non-Primary Candidates

Can define: UNIQUE constraint. Or: UNIQUE index. Enforced: database ensures uniqueness. Used: for lookups.

Selection Criteria

Simplicity: fewer columns. Stability: immutable. Efficiency: indexed, used frequently. Business logic: natural candidates preferred (when applicable).

Superkeys

Definition

Set of attributes uniquely identifying row. May be redundant: contains more columns than necessary. Example: {employee_id, ssn} (both alone are unique, combined redundant).

Relationship to Keys

Superkey: general concept. Candidate key: minimal superkey (no redundancy). Primary key: chosen candidate.
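The hierarchy can be illustrated against sample data. This is only a sketch: the function names and rows are hypothetical, and a sample can refute a key but never prove one, since a key is a rule about all possible rows. A superkey has no duplicate value combinations; a candidate key is a superkey with no smaller superkey inside it.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, columns):
    """Minimal superkeys of the sample, searched smallest first."""
    found = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            # A set containing an already-found key is redundant, not minimal.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found


employees = [
    {"employee_id": 1, "ssn": "111", "dept": "R&D"},
    {"employee_id": 2, "ssn": "222", "dept": "R&D"},
    {"employee_id": 3, "ssn": "333", "dept": "Ops"},
]
```

Here {employee_id, ssn} passes is_superkey but is not returned by candidate_keys, because each column is already unique on its own.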

Redundancy

Superkey may contain: unnecessary columns. Design: minimize (choose candidate key). Efficiency: fewer columns, smaller indexes, faster comparisons.

Natural Keys

Definition

Business meaning: real-world identifier. Example: ISBN for books, SSN for people, VIN for vehicles. Stability: usually stable, but can change (rare).

Advantages

Meaningful: understandable (not arbitrary ID). Existing: often already unique. Business logic: aligned with domain.

Disadvantages

Stability: may change (SSN, email). Size: possibly large. Multi-column: composite keys more complex.

Risk

Change required: expensive (cascades via foreign keys). Solution: combine with surrogate (use both).
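A rough illustration of that cascade, assuming SQLite via Python's sqlite3 module. The customers and orders tables are hypothetical. With email as the primary key, one changed address forces a rewrite of every referencing row.

```python
import sqlite3


def demo_natural_key_change():
    """Return how many order rows were rewritten by one email change."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customers (email TEXT PRIMARY KEY NOT NULL, name TEXT)"
    )
    conn.execute(
        """
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_email TEXT NOT NULL
                REFERENCES customers (email) ON UPDATE CASCADE
        )
        """
    )
    conn.execute("INSERT INTO customers VALUES ('ada@old.example', 'Ada')")
    conn.executemany(
        "INSERT INTO orders (customer_email) VALUES (?)",
        [("ada@old.example",)] * 3,
    )

    # Changing the natural key touches the parent row and all three orders.
    conn.execute(
        "UPDATE customers SET email = 'ada@new.example' "
        "WHERE email = 'ada@old.example'"
    )
    moved = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_email = 'ada@new.example'"
    ).fetchone()[0]
    conn.close()
    return moved
```

With a surrogate customer_id as the primary key instead, the same change would update a single column in a single row.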

Surrogate Keys

Definition

Artificial identifier: created for uniqueness. Example: auto-increment ID, UUID. No business meaning: purely technical.

Advantages

Simplicity: single column. Stability: never changes. Efficiency: small (fast foreign keys). Flexibility: add real key constraints separately.

Disadvantages

Meaningless: arbitrary numbers. Overhead: extra column (small). Hiding relationships: real identity obscured.

Practical

Common: most modern systems use. Combined: surrogate as primary, natural as UNIQUE constraint. Balance: simplicity + business logic.

Auto-Increment

CREATE TABLE employees (
 emp_id INT PRIMARY KEY AUTO_INCREMENT, -- MySQL syntax; other DBMSs use identity columns or sequences
 ssn VARCHAR(20) UNIQUE, -- natural key (unique but not primary)
 name VARCHAR(100)
);
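A sketch of the same table in SQLite, which spells the keyword AUTOINCREMENT and requires the column type INTEGER. The function name and sample values are hypothetical. The application never supplies emp_id; it reads the generated value back after each insert.

```python
import sqlite3


def demo_surrogate_ids():
    """Insert two employees and return the generated surrogate keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE employees (
            emp_id INTEGER PRIMARY KEY AUTOINCREMENT,
            ssn TEXT UNIQUE,   -- natural key, unique but not primary
            name TEXT
        )
        """
    )
    ids = []
    for ssn, name in [("111-11-1111", "Ada"), ("222-22-2222", "Grace")]:
        cur = conn.execute(
            "INSERT INTO employees (ssn, name) VALUES (?, ?)", (ssn, name)
        )
        # lastrowid exposes the key the database assigned to this row.
        ids.append(cur.lastrowid)
    conn.close()
    return ids
```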

Composite Keys

Definition

Multiple columns: together uniquely identify. Example: {student_id, course_id, semester} uniquely identifies enrollment.

Example

CREATE TABLE enrollments (
 student_id INT,
 course_id INT,
 semester VARCHAR(20),
 grade CHAR(1),
 PRIMARY KEY (student_id, course_id, semester)
);

Usage

Junction tables: many-to-many relationships. Domain entities: natural composite keys. Complex domain: multiple attributes needed.

Foreign Keys

Reference: all columns of composite. Example: FOREIGN KEY (student_id, course_id, semester) REFERENCES enrollments (student_id, course_id, semester). Complex: every join needs a condition on each key column.
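A sketch of a composite foreign key, assuming SQLite via Python's sqlite3 module. The attendance child table is hypothetical and not part of the original schema. The child repeats all three key columns, and the join compares each of them.

```python
import sqlite3


def demo_composite_fk():
    """Return (orphan_rejected, joined_rows) for a composite-key reference."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute(
        """
        CREATE TABLE enrollments (
            student_id INTEGER NOT NULL,
            course_id INTEGER NOT NULL,
            semester TEXT NOT NULL,
            grade TEXT,
            PRIMARY KEY (student_id, course_id, semester)
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE attendance (
            student_id INTEGER NOT NULL,
            course_id INTEGER NOT NULL,
            semester TEXT NOT NULL,
            class_date TEXT NOT NULL,
            PRIMARY KEY (student_id, course_id, semester, class_date),
            FOREIGN KEY (student_id, course_id, semester)
                REFERENCES enrollments (student_id, course_id, semester)
        )
        """
    )
    conn.execute("INSERT INTO enrollments VALUES (1, 101, '2024-Fall', NULL)")
    conn.execute(
        "INSERT INTO attendance VALUES (1, 101, '2024-Fall', '2024-09-02')"
    )

    # No enrollment exists for course 999, so this child row is an orphan.
    try:
        conn.execute(
            "INSERT INTO attendance VALUES (1, 999, '2024-Fall', '2024-09-02')"
        )
        orphan_rejected = False
    except sqlite3.IntegrityError:
        orphan_rejected = True

    # The join condition must compare every column of the composite key.
    joined = conn.execute(
        """
        SELECT e.grade, a.class_date
        FROM attendance AS a
        JOIN enrollments AS e
          ON e.student_id = a.student_id
         AND e.course_id = a.course_id
         AND e.semester = a.semester
        """
    ).fetchall()
    conn.close()
    return orphan_rejected, joined
```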

Considerations

Complexity: more columns, more overhead. Queries: need all key columns in WHERE to identify a single row. Normalization: check 2NF (partial dependencies?).
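The 2NF check can be approximated on sample data. This is a sketch under assumptions: the helper names and the course_title column are hypothetical, and sample rows can only suggest a dependency, not prove it. A non-key attribute that is determined by a proper subset of the composite key is a partial dependency and a hint to split the table.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if equal lhs values always come with the same rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def smallest_partial_dependency(rows, key, attr):
    """Smallest proper subset of key that determines attr, or None."""
    for size in range(1, len(key)):  # sizes below len(key): proper subsets
        for subset in combinations(key, size):
            if determines(rows, subset, attr):
                return subset
    return None


KEY = ("student_id", "course_id", "semester")
enrollments = [
    {"student_id": 1, "course_id": 101, "semester": "F24",
     "grade": "A", "course_title": "Databases"},
    {"student_id": 1, "course_id": 102, "semester": "F24",
     "grade": "B", "course_title": "Networks"},
    {"student_id": 2, "course_id": 101, "semester": "F24",
     "grade": "C", "course_title": "Databases"},
    {"student_id": 2, "course_id": 101, "semester": "S25",
     "grade": "B", "course_title": "Databases"},
]
```

In this sample, course_title depends on course_id alone, so it belongs in a courses table; grade needs the whole key and stays.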

Uniqueness Enforcement

Primary Key Constraint

Unique: guaranteed (database enforces). Index: created automatically. Performance: fast lookup.

UNIQUE Constraint

CREATE TABLE users (
 user_id INT PRIMARY KEY,
 email VARCHAR(100) UNIQUE NOT NULL,
 username VARCHAR(50) UNIQUE NOT NULL
);

UNIQUE INDEX

Alternative: explicit index. Functionally same: ensures uniqueness. More control: index options.
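One example of the extra control, sketched in SQLite via Python's sqlite3 module. The index name users_email_ci is hypothetical. A unique index on an expression enforces case-insensitive uniqueness, which a plain UNIQUE column constraint on email would not do here.

```python
import sqlite3


def demo_unique_index():
    """Return True if an email differing only in case is rejected."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )
    # Uniqueness is enforced on the lower-cased value, not the raw text.
    conn.execute("CREATE UNIQUE INDEX users_email_ci ON users (lower(email))")
    conn.execute("INSERT INTO users (email) VALUES ('Ada@Example.com')")
    try:
        conn.execute("INSERT INTO users (email) VALUES ('ada@example.com')")
        return False
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
```

Expression indexes are DBMS-specific; other systems offer similar options under different syntax.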

NULL Handling

UNIQUE allows: multiple NULLs in most DBMSs (NULL = NULL is UNKNOWN, never TRUE, so NULLs do not count as duplicates). Varies by DBMS: SQL Server permits only one NULL per UNIQUE column. NOT NULL: required for strict uniqueness.
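A quick check of this behavior in SQLite, which is one of the DBMSs that accept repeated NULLs in a UNIQUE column. The phone column and function name are hypothetical. Python's sqlite3 module maps SQL NULL to None.

```python
import sqlite3


def demo_unique_nulls():
    """Return (rows_with_null_phone, result_of_NULL_equals_NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, phone TEXT UNIQUE)"
    )
    # Two NULL phones are accepted even though the column is UNIQUE.
    conn.executemany(
        "INSERT INTO users (phone) VALUES (?)",
        [(None,), (None,), ("555-0100",)],
    )
    null_rows = conn.execute(
        "SELECT COUNT(*) FROM users WHERE phone IS NULL"
    ).fetchone()[0]
    # The comparison itself yields NULL (unknown), not true or false.
    null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
    conn.close()
    return null_rows, null_equals_null
```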

Violation Handling

Duplicate attempt: error (INSERT/UPDATE fails). Application: must handle (catch exception, then report or retry with a different value).
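One possible handling pattern, sketched with Python's sqlite3 module. The create_user helper is hypothetical. The insert runs in a transaction, the constraint violation surfaces as sqlite3.IntegrityError, and the caller decides whether to report it or retry with different values.

```python
import sqlite3


def create_user(conn, email, username):
    """Insert a user; return the new user_id, or None if a value is taken."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT INTO users (email, username) VALUES (?, ?)",
                (email, username),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email TEXT UNIQUE NOT NULL,
        username TEXT UNIQUE NOT NULL
    )
    """
)

first = create_user(conn, "ada@example.com", "ada")       # succeeds
duplicate = create_user(conn, "ada@example.com", "ada2")  # email already taken
retry = create_user(conn, "ada2@example.com", "ada2")     # new values succeed
```

Retrying with the same values would fail again, so a retry only makes sense after the conflicting value has changed.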

Key Selection Strategies

Decision Process

1. Identify candidates: what uniquely identifies?
2. Choose primary: simplicity, stability, efficiency.
3. Enforce: constraints, indexes.
4. Document: schema comments.

Surrogate vs. Natural

Surrogate: preferred (simple, stable). Natural: add as UNIQUE (preserve business logic). Hybrid: best practice.

Composite vs. Simple

Simple: prefer (faster, simpler joins). Composite: when necessary (junction tables, natural compound identifiers). Evaluate: trade-offs.

Examples by Domain

User: surrogate ID (primary) + email UNIQUE (natural). Product: product_id (surrogate, primary) + SKU UNIQUE (natural). Order: surrogate (order_id) primary.

Practical Examples

Simple Primary Key

CREATE TABLE departments (
 dept_id INT PRIMARY KEY AUTO_INCREMENT,
 dept_name VARCHAR(50) UNIQUE NOT NULL
);

Natural Key

CREATE TABLE books (
 isbn VARCHAR(20) PRIMARY KEY,
 title VARCHAR(100),
 author VARCHAR(100)
);

Composite Key

CREATE TABLE enrollments (
 student_id INT NOT NULL,
 course_id INT NOT NULL,
 semester VARCHAR(20) NOT NULL,
 grade CHAR(1),
 PRIMARY KEY (student_id, course_id, semester),
 FOREIGN KEY (student_id) REFERENCES students(id),
 FOREIGN KEY (course_id) REFERENCES courses(id)
);

Hybrid Approach

CREATE TABLE users (
 user_id INT PRIMARY KEY AUTO_INCREMENT,
 email VARCHAR(100) UNIQUE NOT NULL,
 username VARCHAR(50) UNIQUE NOT NULL,
 name VARCHAR(100)
);
