Introduction

Entity-Relationship (ER) model: high-level conceptual model for database design. Developed by Peter Chen (1976). Represent real-world entities, attributes, relationships. Intuitive: mirrors how humans conceptualize data. Foundation for relational database design.

Key insight: structure database around entities (things of interest) and relationships (how entities associate). Separate conceptual design from implementation. ER model independent of specific database system.

Advantages: intuitive, clear structure, guides normalization, enables communication between users and designers. Standard tool: before implementing database, design ER model, verify with stakeholders, convert to tables.

"Entity-Relationship model provides high-level abstraction for data requirements. Clear conceptualization prevents design flaws, ensures database reflects real-world semantics, facilitates future modifications." -- Database design methodology

Core Concepts: Entities and Attributes

Entity Definition

Entity: object, thing, or concept of interest. Examples: Employee, Student, Department, Course, Customer. Something distinguishable from other entities. Has properties (attributes).

Entity Types vs. Instances

Entity type: class/category (e.g., Employee). Entity instance: individual object (e.g., John Smith, employee ID 101). Entity set: collection of all instances of type.

Attributes Definition

Attribute: property or characteristic of entity. Employee attributes: Name, ID, Salary, Hire_Date. Describes entity. Every instance has values for attributes.

Domain of Attributes

Domain: set of possible values attribute can take. Employee age domain: integers 18-80. Constraints: type (string, integer, date), range (min-max), format.

Attribute Types

Simple: atomic value (Name, Age). Composite: subdivided (Address = Street + City + ZIP). Single-valued: one value per instance (Birth_Date). Multi-valued: multiple values (Phone_Numbers). Derived: computed from other attributes (Age computed from Birth_Date).

NULL Values

Attribute absence or unknown value represented as NULL. Example: Employee may not have Middle_Initial. Null handling in databases important.

Relationships and Associations

Relationship Definition

Relationship: association/connection between entities. Example: Employee works_for Department. Links entities semantically. Relationship type defines how entities relate.

Relationship Instances

Relationship instance: specific association (e.g., John works_for Sales Department). Collection: relationship set (all work_for associations).

Relationship Types

Binary: associates two entity types (Employee, Department). Ternary: three entity types. N-ary: multiple entities. Binary most common.

Relationship Attributes

Relationships can have attributes. works_for has Start_Date. Describes relationship, not entity. Example: manages relationship (Manager, Project) has Assignment_Date.

Relationship Degree

Degree: number of entity types participating. Unary (recursive): entity relates to itself (Employee supervises Employee). Binary: two types. Higher degrees possible but rare in practice.

Recursive Relationships

Entity type relates to itself. Employee supervises other Employees. Prerequisite course depends on other Prerequisite courses. Represent via roles (supervisor, subordinate).

Cardinality Constraints

Cardinality Definition

Cardinality: numerical relationship between entities. How many instances of one entity associate with instances of another. Expressed as ratios: 1:1, 1:N, M:N.

One-to-One (1:1)

Each entity instance associates with at most one instance of other type. Example: Employee-Parking_Spot (each employee one spot, each spot one employee). Rare in practice.

One-to-Many (1:N)

One entity instance associates with multiple instances of other type. Example: Department-Employee (one department, many employees). Most common cardinality.

Many-to-Many (M:N)

Each entity instance associates with multiple instances of other type, and vice versa. Example: Student-Course (student enrolled multiple courses, course has multiple students). Requires junction table in relational.

Cardinality Notation

Different notations: Chen's notation (1, N), Crow's foot (symbols like |-, |--). Express minimum and maximum cardinality. Minimum: 0 (optional) or 1 (mandatory). Maximum: 1 or N (many).

Examples

Relationship Cardinality Example
1:1 One-to-One Person-Passport
1:N One-to-Many Department-Employee
N:N Many-to-Many Student-Course

Participation Constraints

Total vs. Partial Participation

Total participation: every entity instance must participate in relationship. Example: every Employee must work_for some Department (mandatory). Denoted as double line in diagrams.

Partial participation: some instances may not participate. Example: Employee may manage zero departments (optional). Denoted as single line.

Combining Cardinality and Participation

Cardinality: how many. Participation: whether all. Example: Department-Employee (1:N, total-partial) means each department has many employees (cardinality), every employee works somewhere (total), but department may have zero employees (partial).

Existence Dependency

Entity existence depends on relationship. Example: if Dependent must have associated Employee (father/mother), Dependent has total participation. Without parent, Dependent cannot exist.

Relationship Constraints Summary

Cardinality constraints: structural (how many associates). Participation constraints: existence (whether participation required). Together specify database rules completely.

Keys and Identifiers

Primary Key

Attribute (or attribute set) uniquely identifying entity instance. Employee ID uniquely identifies employee. Each entity type must have primary key. Values unique, not null.

Candidate Keys

Attributes that could serve as primary key. Employee: ID, Email both unique. Choose one as primary, others candidate keys.

Composite Primary Key

Multiple attributes together form primary key. Enrollment (Student_ID, Course_ID) composite key (combination unique, neither alone sufficient).

Foreign Key

Attribute referencing primary key of another entity type. Employee.Dept_ID references Department.Dept_ID. Links tables, enables relationships in relational model.

Surrogate Keys

Artificial identifiers (usually auto-incrementing integers) when natural keys complex or non-existent. Example: Order_ID (system-generated), not based on real-world attribute.

Key Constraints

Superkey: attribute set uniquely identifying entity (may have extra attributes). Candidate key: minimal superkey (no attributes removable). Primary key: chosen candidate.

ER Diagrams and Notation

Diagram Components

Entities: rectangles labeled with type name. Attributes: ovals, connected to entities. Relationships: diamonds, connected to participating entities. Keys: underlined in rectangles.

Cardinality Notation

Chen's notation: 1, N on relationship lines. Crow's foot: |-- (one), |-- (many). IE notation: circles, lines. UML: class diagrams with multiplicity.

Participation Lines

Single line: partial (optional). Double line: total (mandatory). Drawn between entity and relationship. Indicates participation constraint.

Example ER Diagram

Department ----1:N---- Employee
 | |
 (partial) (total)
 DeptID ----------- EmpID
 Name Name
 Location Salary
 Hire_Date

Composite Attributes

Shown as nested ovals. Address breaks into Street, City, ZIP. Multivalued attributes: double ovals. Phone_Numbers: multiple phone numbers.

Derived Attributes

Shown as dashed oval. Age (computed from Birth_Date). Not stored, derived when needed.

Attribute Types and Properties

Simple vs. Composite

Simple: atomic, indivisible (Age, Salary). Composite: consists of components (Address = Street + City + ZIP). Composite useful for organization, query by components.

Single-Valued vs. Multi-Valued

Single: one value per instance (Birth_Date). Multi-valued: multiple values (Phone_Numbers: home, work, mobile). Multi-valued creates design decisions (normalize into separate table?).

Stored vs. Derived

Stored: physically stored in database (Birth_Date). Derived: computed from other attributes (Age = current_year - Birth_Year). Avoid storing derived to prevent inconsistency.

NULL Handling

Attribute absence: NULL value. Employee may lack Middle_Initial. Constraints: NOT NULL (mandatory), allows NULL (optional).

Default Values

Attribute automatically assigned value if not specified. Hire_Date defaults to today, Status defaults to 'active'. Simplifies data entry.

Weak Entities and Identifying Relationships

Weak Entity Definition

Entity without independent existence. Must participate in relationship. Dependent: cannot exist without Employee (parent). Weak entities exist only through identifying relationship.

Partial Key

Weak entity has partial key: attribute distinguishing instances within dependent set. Dependent: partial key is Name. Employee_ID + Name together uniquely identify dependent.

Identifying Relationship

Relationship that identifies weak entity. Has: Employee has Dependents. Critical: establishes existence dependency. Total participation required.

Diagram Representation

Weak entities: double rectangles. Partial key: underlined with dashed line. Identifying relationship: double diamond. Clearly show dependency.

Examples

Building-Room: Building owns Rooms. Room weak entity (no room without building). Project-Task: Project has Tasks. Task weak entity (no task without project context).

Weak vs. Strong Entities

Strong entity: independent existence, has primary key. Weak entity: dependent, has partial key. Most entities strong; weak entities special cases.

Specialization and Generalization

Specialization Concept

Top-down: define general entity (Employee), then specialized subclasses (Manager, Engineer, Technician). Subclass inherits attributes, adds own. Captures classification hierarchy.

Generalization Concept

Bottom-up: observe entities (Manager, Engineer), find commonalities, define superclass (Employee). Reverse of specialization.

IS-A Relationship

Specialization denoted as IS-A. Manager IS-A Employee. Subclass inherits all attributes of superclass, adds specialized attributes/relationships.

Constraints on Specialization

Disjoint: entity belongs to at most one subclass (Employee either Manager or Engineer, not both). Overlapping: entity may belong to multiple (Employee can be Manager and Lead_Developer).

Total: every superclass instance belongs to subclass (every Employee is Manager, Engineer, or Technician). Partial: some instances not in any subclass (Employee might be general staff).

Multiple Inheritance

Subclass has multiple superclasses. Manager inherits from Employee and Leadership. Combines attributes from both parents.

Diagram Notation

Specialization triangle connecting superclass to subclasses. Labeled disjoint/overlapping, total/partial. Hierarchical tree structure.

ER Model Design Process

Requirement Analysis

Understand domain, gather requirements. Who are entities? What attributes matter? How relate? Interview stakeholders, examine existing documents.

Entity Identification

Identify main entities. Examples: for university system: Student, Faculty, Course, Department. List candidates, verify necessity.

Attribute Selection

For each entity, list relevant attributes. Student: ID, Name, Email, Birth_Date. Exclude derived (compute if needed). Decide composite/simple.

Relationship Definition

Determine associations. Student enrolls Course. Faculty teaches Course. Department offers Course. Specify cardinality, participation.

Key Assignment

Define primary key for each entity. Student_ID for Student. Ensure uniqueness, not null.

Diagram Creation

Draw ER diagram representing model. Verify completeness: all requirements captured? Consistency: relationships sensible? Review with stakeholders.

Refinement

Iterate: feedback -> modifications -> new diagram. Database design spiral: repeat until satisfactory.

Mapping ER Model to Relational Schema

Table Creation

Each entity type: one table. Attributes become columns. Primary key designated. Example: Employee entity -> Employee table (EmpID, Name, Salary, DeptID).

Handling Relationships

1:N relationship: foreign key in N-side. Department-Employee: Employee.DeptID references Department. 1:1: foreign key in either side (often total participation side). M:N: junction table.

Junction Table for M:N

Student-Course (M:N): Enrollment table (StudentID, CourseID). Composite primary key. Preserves many-to-many association. Optional: relationship attributes become columns.

Weak Entity Tables

Weak entity table includes foreign key to owner. Dependent table: (DependentName, EmployeeID). Composite primary key (EmployeeID, DependentName).

Specialization Mapping

Option 1: superclass table + subclass tables with foreign keys. Option 2: separate table per subclass (including inherited attributes). Option 3: single table (single inheritance). Trade-offs in storage, queries.

Validation

Resulting relational schema preserves ER model semantics. Relationships represented via foreign keys. Constraints enforced. Schema normalized.

References

  • Chen, P. "The Entity-Relationship Model: Toward a Unified View of Data." ACM Transactions on Database Systems, vol. 1, no. 1, 1976, pp. 9-36.
  • Elmasri, R., and Navathe, S. B. "Fundamentals of Database Systems." Pearson, 7th edition, 2016.
  • Date, C. J. "An Introduction to Database Systems." Pearson, 8th edition, 2004.
  • Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
  • Connolly, T. M., and Begg, C. E. "Database Systems: A Practical Approach to Design, Implementation, and Management." Pearson, 6th edition, 2014.