Introduction
Relation: mathematical concept (table in database). Set of tuples (rows). Attributes (columns). Relational algebra: formal operations (SELECT, PROJECT, JOIN, etc.). Foundation: SQL queries (translated to algebra).
"Relations are sets of tuples: formal, elegant, powerful. Relational algebra provides operations: expressive, composable. SQL: practical realization of relational theory." -- Relational model
Relation Definition
Formal Definition
Relation R on domains D1, D2, ..., Dn: subset of Cartesian product D1 × D2 × ... × Dn. Set: no duplicates, unordered. Tuple: element of relation (n-valued).
Simple Definition
Table: rows (tuples), columns (attributes). Data structure: set-based (mathematical). Operations: defined formally (algebra).
Example
Employees relation:
Domain: EmployeeID (integers), Name (strings), Salary (decimals)
Tuples: (1, 'Alice', 50000), (2, 'Bob', 60000), ...
Relation: set of all employee tuples
Attributes and Domains
Attribute
Named: column name. Typed: associated domain (integers, strings, dates). Role: describes meaning (e.g., EmployeeID, Name).
Domain
Set of possible values: integers [1, 1000000], strings [any text], dates [valid dates]. Type: constrains values. Interpretation: combined with attribute name.
Example
Attribute: Salary
Domain: DECIMAL(10, 2) - decimal numbers up to 99999999.99
Meaning: employee's annual salary in currency
NULL Value
Missing/unknown: represented as NULL. Not domain value: special (represents absence). Handling: three-valued logic (true, false, unknown).
Tuples
Definition
Ordered list: values corresponding to attributes. Example: (1, 'Alice', 50000) matches (EmployeeID, Name, Salary). Uniqueness: primarily (key enforces).
Properties
Atomic: single value per attribute (no composite). Complete: value for every attribute (or NULL). Unique: identified by key. Unordered: logically (display order irrelevant).
Insertion/Deletion
Add: new tuple (with key). Delete: by key (entire row). Update: modify tuple values. Constraints: enforced (referential, domain).
Relation Schema
Definition
Structure: attribute names and domains. Notation: R(A1:D1, A2:D2, ..., An:Dn). Example: Employee(ID:INT, Name:VARCHAR, Salary:DECIMAL).
Components
Relation name: identified. Attributes: with domains. Constraints: primary key, foreign keys, checks. Degrees: number of attributes (n-ary).
Metadata
Stored: in database catalog (system tables). Queries: information_schema. Tools: schema visualization, documentation.
Relation Instance
Definition
Actual data: specific tuples conforming to schema. Snapshot: point-in-time state. Changes: inserts, updates, deletes modify.
Temporal Nature
Dynamic: instance varies over time. Each operation: changes instance. Schema: stable (usually). Persistence: instance survives sessions (durability).
Constraints Validation
Instance checked: against schema constraints. Invalid instance: prevented (or detected, corrected). Integrity: maintained throughout operations.
Relational Algebra
Concept
Formal operations: on relations. Inputs: one or more relations. Output: relation (closed). Composition: operations can be combined. Foundation: SQL (queries translated to algebra).
Operation Categories
Unary: one input (SELECT, PROJECT, RENAME). Binary: two inputs (JOIN, UNION, DIFFERENCE). Generalized: multiple inputs.
SQL Correspondence
SELECT clause -> PROJECT (choose columns). WHERE clause -> SELECT (filter rows). JOIN -> JOIN operation. UNION, INTERSECT, EXCEPT -> set operations.
Basic Operations
Selection (σ)
Filter rows: WHERE condition. Input: relation. Output: subset of tuples. Example: σ(Salary > 50000)(Employee).
Projection (π)
Choose columns: SELECT specific attributes. Input: relation. Output: tuples with selected attributes. Example: π(Name, Salary)(Employee).
Join (⋈)
Combine tables: related attributes. Input: two relations. Output: combined tuples. Example: Employee ⋈ Department (on Dept_ID).
Union (∪)
Combine: all tuples from both relations. Schemas: must match. Duplicates: removed (set operation). Example: PartTime ∪ FullTime.
Set Difference (−)
Remove: tuples in second from first. Example: AllEmployees − Terminated = CurrentEmployees.
Cartesian Product (×)
All combinations: every tuple from R1 with every from R2. Size: |R1| * |R2|. Useful: with selection (simulates join).
Properties
Set Semantics
No duplicates: relations are sets. Each tuple: unique (or key enforces). Unordered: sequence irrelevant (logically).
Closure
Operations output: relations. Can compose: chain operations. Expressiveness: multiple equivalent queries.
Completeness
Relational algebra: Turing-complete? No (limitations). SQL: more powerful (aggregate functions, recursion). Practical: covers most needs.
Optimization
Algebra: basis for optimization. Rewrite rules: equivalent expressions, different costs. Optimizer: chooses efficient plan. Implementation: SQL to algebra translation.
References
- Codd, E. F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, vol. 13, no. 6, 1970, pp. 377-387.
- Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.