Introduction

Relation: mathematical concept (table in database). Set of tuples (rows). Attributes (columns). Relational algebra: formal operations (SELECT, PROJECT, JOIN, etc.). Foundation: SQL queries (translated to algebra).

"Relations are sets of tuples: formal, elegant, powerful. Relational algebra provides operations: expressive, composable. SQL: practical realization of relational theory." -- Relational model

Relation Definition

Formal Definition

Relation R on domains D1, D2, ..., Dn: subset of Cartesian product D1 × D2 × ... × Dn. Set: no duplicates, unordered. Tuple: element of relation (n-valued).

Simple Definition

Table: rows (tuples), columns (attributes). Data structure: set-based (mathematical). Operations: defined formally (algebra).

Example

Employees relation:
Domain: EmployeeID (integers), Name (strings), Salary (decimals)
Tuples: (1, 'Alice', 50000), (2, 'Bob', 60000), ...
Relation: set of all employee tuples

Attributes and Domains

Attribute

Named: column name. Typed: associated domain (integers, strings, dates). Role: describes meaning (e.g., EmployeeID, Name).

Domain

Set of possible values: integers [1, 1000000], strings [any text], dates [valid dates]. Type: constrains values. Interpretation: combined with attribute name.

Example

Attribute: Salary
Domain: DECIMAL(10, 2) - decimal numbers up to 99999999.99
Meaning: employee's annual salary in currency

NULL Value

Missing/unknown: represented as NULL. Not domain value: special (represents absence). Handling: three-valued logic (true, false, unknown).

Tuples

Definition

Ordered list: values corresponding to attributes. Example: (1, 'Alice', 50000) matches (EmployeeID, Name, Salary). Uniqueness: primarily (key enforces).

Properties

Atomic: single value per attribute (no composite). Complete: value for every attribute (or NULL). Unique: identified by key. Unordered: logically (display order irrelevant).

Insertion/Deletion

Add: new tuple (with key). Delete: by key (entire row). Update: modify tuple values. Constraints: enforced (referential, domain).

Relation Schema

Definition

Structure: attribute names and domains. Notation: R(A1:D1, A2:D2, ..., An:Dn). Example: Employee(ID:INT, Name:VARCHAR, Salary:DECIMAL).

Components

Relation name: identified. Attributes: with domains. Constraints: primary key, foreign keys, checks. Degrees: number of attributes (n-ary).

Metadata

Stored: in database catalog (system tables). Queries: information_schema. Tools: schema visualization, documentation.

Relation Instance

Definition

Actual data: specific tuples conforming to schema. Snapshot: point-in-time state. Changes: inserts, updates, deletes modify.

Temporal Nature

Dynamic: instance varies over time. Each operation: changes instance. Schema: stable (usually). Persistence: instance survives sessions (durability).

Constraints Validation

Instance checked: against schema constraints. Invalid instance: prevented (or detected, corrected). Integrity: maintained throughout operations.

Relational Algebra

Concept

Formal operations: on relations. Inputs: one or more relations. Output: relation (closed). Composition: operations can be combined. Foundation: SQL (queries translated to algebra).

Operation Categories

Unary: one input (SELECT, PROJECT, RENAME). Binary: two inputs (JOIN, UNION, DIFFERENCE). Generalized: multiple inputs.

SQL Correspondence

SELECT clause -> PROJECT (choose columns). WHERE clause -> SELECT (filter rows). JOIN -> JOIN operation. UNION, INTERSECT, EXCEPT -> set operations.

Basic Operations

Selection (σ)

Filter rows: WHERE condition. Input: relation. Output: subset of tuples. Example: σ(Salary > 50000)(Employee).

Projection (π)

Choose columns: SELECT specific attributes. Input: relation. Output: tuples with selected attributes. Example: π(Name, Salary)(Employee).

Join (⋈)

Combine tables: related attributes. Input: two relations. Output: combined tuples. Example: Employee ⋈ Department (on Dept_ID).

Union (∪)

Combine: all tuples from both relations. Schemas: must match. Duplicates: removed (set operation). Example: PartTime ∪ FullTime.

Set Difference (−)

Remove: tuples in second from first. Example: AllEmployees − Terminated = CurrentEmployees.

Cartesian Product (×)

All combinations: every tuple from R1 with every from R2. Size: |R1| * |R2|. Useful: with selection (simulates join).

Properties

Set Semantics

No duplicates: relations are sets. Each tuple: unique (or key enforces). Unordered: sequence irrelevant (logically).

Closure

Operations output: relations. Can compose: chain operations. Expressiveness: multiple equivalent queries.

Completeness

Relational algebra: Turing-complete? No (limitations). SQL: more powerful (aggregate functions, recursion). Practical: covers most needs.

Optimization

Algebra: basis for optimization. Rewrite rules: equivalent expressions, different costs. Optimizer: chooses efficient plan. Implementation: SQL to algebra translation.

References

  • Codd, E. F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, vol. 13, no. 6, 1970, pp. 377-387.
  • Ramakrishnan, R., and Gehrke, J. "Database Management Systems." McGraw-Hill, 3rd edition, 2003.