Introduction
Entity-Relationship (ER) model: high-level conceptual model for database design. Developed by Peter Chen (1976). Represent real-world entities, attributes, relationships. Intuitive: mirrors how humans conceptualize data. Foundation for relational database design.
Key insight: structure database around entities (things of interest) and relationships (how entities associate). Separate conceptual design from implementation. ER model independent of specific database system.
Advantages: intuitive, clear structure, guides normalization, enables communication between users and designers. Standard tool: before implementing database, design ER model, verify with stakeholders, convert to tables.
"Entity-Relationship model provides high-level abstraction for data requirements. Clear conceptualization prevents design flaws, ensures database reflects real-world semantics, facilitates future modifications." -- Database design methodology
Core Concepts: Entities and Attributes
Entity Definition
Entity: object, thing, or concept of interest. Examples: Employee, Student, Department, Course, Customer. Something distinguishable from other entities. Has properties (attributes).
Entity Types vs. Instances
Entity type: class/category (e.g., Employee). Entity instance: individual object (e.g., John Smith, employee ID 101). Entity set: collection of all instances of type.
Attributes Definition
Attribute: property or characteristic of entity. Employee attributes: Name, ID, Salary, Hire_Date. Describes entity. Every instance has values for attributes.
Domain of Attributes
Domain: set of possible values attribute can take. Employee age domain: integers 18-80. Constraints: type (string, integer, date), range (min-max), format.
Attribute Types
Simple: atomic value (Name, Age). Composite: subdivided (Address = Street + City + ZIP). Single-valued: one value per instance (Birth_Date). Multi-valued: multiple values (Phone_Numbers). Derived: computed from other attributes (Age computed from Birth_Date).
NULL Values
Attribute absence or unknown value represented as NULL. Example: Employee may not have Middle_Initial. Null handling in databases important.
Relationships and Associations
Relationship Definition
Relationship: association/connection between entities. Example: Employee works_for Department. Links entities semantically. Relationship type defines how entities relate.
Relationship Instances
Relationship instance: specific association (e.g., John works_for Sales Department). Collection: relationship set (all work_for associations).
Relationship Types
Binary: associates two entity types (Employee, Department). Ternary: three entity types. N-ary: multiple entities. Binary most common.
Relationship Attributes
Relationships can have attributes. works_for has Start_Date. Describes relationship, not entity. Example: manages relationship (Manager, Project) has Assignment_Date.
Relationship Degree
Degree: number of entity types participating. Unary (recursive): entity relates to itself (Employee supervises Employee). Binary: two types. Higher degrees possible but rare in practice.
Recursive Relationships
Entity type relates to itself. Employee supervises other Employees. Prerequisite course depends on other Prerequisite courses. Represent via roles (supervisor, subordinate).
Cardinality Constraints
Cardinality Definition
Cardinality: numerical relationship between entities. How many instances of one entity associate with instances of another. Expressed as ratios: 1:1, 1:N, M:N.
One-to-One (1:1)
Each entity instance associates with at most one instance of other type. Example: Employee-Parking_Spot (each employee one spot, each spot one employee). Rare in practice.
One-to-Many (1:N)
One entity instance associates with multiple instances of other type. Example: Department-Employee (one department, many employees). Most common cardinality.
Many-to-Many (M:N)
Each entity instance associates with multiple instances of other type, and vice versa. Example: Student-Course (student enrolled multiple courses, course has multiple students). Requires junction table in relational.
Cardinality Notation
Different notations: Chen's notation (1, N), Crow's foot (symbols like |-, |--). Express minimum and maximum cardinality. Minimum: 0 (optional) or 1 (mandatory). Maximum: 1 or N (many).
Examples
| Relationship | Cardinality | Example |
|---|---|---|
| 1:1 | One-to-One | Person-Passport |
| 1:N | One-to-Many | Department-Employee |
| N:N | Many-to-Many | Student-Course |
Participation Constraints
Total vs. Partial Participation
Total participation: every entity instance must participate in relationship. Example: every Employee must work_for some Department (mandatory). Denoted as double line in diagrams.
Partial participation: some instances may not participate. Example: Employee may manage zero departments (optional). Denoted as single line.
Combining Cardinality and Participation
Cardinality: how many. Participation: whether all. Example: Department-Employee (1:N, total-partial) means each department has many employees (cardinality), every employee works somewhere (total), but department may have zero employees (partial).
Existence Dependency
Entity existence depends on relationship. Example: if Dependent must have associated Employee (father/mother), Dependent has total participation. Without parent, Dependent cannot exist.
Relationship Constraints Summary
Cardinality constraints: structural (how many associates). Participation constraints: existence (whether participation required). Together specify database rules completely.
Keys and Identifiers
Primary Key
Attribute (or attribute set) uniquely identifying entity instance. Employee ID uniquely identifies employee. Each entity type must have primary key. Values unique, not null.
Candidate Keys
Attributes that could serve as primary key. Employee: ID, Email both unique. Choose one as primary, others candidate keys.
Composite Primary Key
Multiple attributes together form primary key. Enrollment (Student_ID, Course_ID) composite key (combination unique, neither alone sufficient).
Foreign Key
Attribute referencing primary key of another entity type. Employee.Dept_ID references Department.Dept_ID. Links tables, enables relationships in relational model.
Surrogate Keys
Artificial identifiers (usually auto-incrementing integers) when natural keys complex or non-existent. Example: Order_ID (system-generated), not based on real-world attribute.
Key Constraints
Superkey: attribute set uniquely identifying entity (may have extra attributes). Candidate key: minimal superkey (no attributes removable). Primary key: chosen candidate.
ER Diagrams and Notation
Diagram Components
Entities: rectangles labeled with type name. Attributes: ovals, connected to entities. Relationships: diamonds, connected to participating entities. Keys: underlined in rectangles.
Cardinality Notation
Chen's notation: 1, N on relationship lines. Crow's foot: |-- (one), |-- (many). IE notation: circles, lines. UML: class diagrams with multiplicity.
Participation Lines
Single line: partial (optional). Double line: total (mandatory). Drawn between entity and relationship. Indicates participation constraint.
Example ER Diagram
Department ----1:N---- Employee
| |
(partial) (total)
DeptID ----------- EmpID
Name Name
Location Salary
Hire_Date
Composite Attributes
Shown as nested ovals. Address breaks into Street, City, ZIP. Multivalued attributes: double ovals. Phone_Numbers: multiple phone numbers.
Derived Attributes
Shown as dashed oval. Age (computed from Birth_Date). Not stored, derived when needed.
Attribute Types and Properties
Simple vs. Composite
Simple: atomic, indivisible (Age, Salary). Composite: consists of components (Address = Street + City + ZIP). Composite useful for organization, query by components.
Single-Valued vs. Multi-Valued
Single: one value per instance (Birth_Date). Multi-valued: multiple values (Phone_Numbers: home, work, mobile). Multi-valued creates design decisions (normalize into separate table?).
Stored vs. Derived
Stored: physically stored in database (Birth_Date). Derived: computed from other attributes (Age = current_year - Birth_Year). Avoid storing derived to prevent inconsistency.
NULL Handling
Attribute absence: NULL value. Employee may lack Middle_Initial. Constraints: NOT NULL (mandatory), allows NULL (optional).
Default Values
Attribute automatically assigned value if not specified. Hire_Date defaults to today, Status defaults to 'active'. Simplifies data entry.
Weak Entities and Identifying Relationships
Weak Entity Definition
Entity without independent existence. Must participate in relationship. Dependent: cannot exist without Employee (parent). Weak entities exist only through identifying relationship.
Partial Key
Weak entity has partial key: attribute distinguishing instances within dependent set. Dependent: partial key is Name. Employee_ID + Name together uniquely identify dependent.
Identifying Relationship
Relationship that identifies weak entity. Has: Employee has Dependents. Critical: establishes existence dependency. Total participation required.
Diagram Representation
Weak entities: double rectangles. Partial key: underlined with dashed line. Identifying relationship: double diamond. Clearly show dependency.
Examples
Building-Room: Building owns Rooms. Room weak entity (no room without building). Project-Task: Project has Tasks. Task weak entity (no task without project context).
Weak vs. Strong Entities
Strong entity: independent existence, has primary key. Weak entity: dependent, has partial key. Most entities strong; weak entities special cases.
Specialization and Generalization
Specialization Concept
Top-down: define general entity (Employee), then specialized subclasses (Manager, Engineer, Technician). Subclass inherits attributes, adds own. Captures classification hierarchy.
Generalization Concept
Bottom-up: observe entities (Manager, Engineer), find commonalities, define superclass (Employee). Reverse of specialization.
IS-A Relationship
Specialization denoted as IS-A. Manager IS-A Employee. Subclass inherits all attributes of superclass, adds specialized attributes/relationships.
Constraints on Specialization
Disjoint: entity belongs to at most one subclass (Employee either Manager or Engineer, not both). Overlapping: entity may belong to multiple (Employee can be Manager and Lead_Developer).
Total: every superclass instance belongs to subclass (every Employee is Manager, Engineer, or Technician). Partial: some instances not in any subclass (Employee might be general staff).
Multiple Inheritance
Subclass has multiple superclasses. Manager inherits from Employee and Leadership. Combines attributes from both parents.
Diagram Notation
Specialization triangle connecting superclass to subclasses. Labeled disjoint/overlapping, total/partial. Hierarchical tree structure.
ER Model Design Process
Requirement Analysis
Understand domain, gather requirements. Who are entities? What attributes matter? How relate? Interview stakeholders, examine existing documents.
Entity Identification
Identify main entities. Examples: for university system: Student, Faculty, Course, Department. List candidates, verify necessity.
Attribute Selection
For each entity, list relevant attributes. Student: ID, Name, Email, Birth_Date. Exclude derived (compute if needed). Decide composite/simple.
Relationship Definition
Determine associations. Student enrolls Course. Faculty teaches Course. Department offers Course. Specify cardinality, participation.
Key Assignment
Define primary key for each entity. Student_ID for Student. Ensure uniqueness, not null.
Diagram Creation
Draw ER diagram representing model. Verify completeness: all requirements captured? Consistency: relationships sensible? Review with stakeholders.
Refinement
Iterate: feedback -> modifications -> new diagram. Database design spiral: repeat until satisfactory.
Mapping ER Model to Relational Schema
Table Creation
Each entity type: one table. Attributes become columns. Primary key designated. Example: Employee entity -> Employee table (EmpID, Name, Salary, DeptID).
Handling Relationships
1:N relationship: foreign key in N-side. Department-Employee: Employee.DeptID references Department. 1:1: foreign key in either side (often total participation side). M:N: junction table.
Junction Table for M:N
Student-Course (M:N): Enrollment table (StudentID, CourseID). Composite primary key. Preserves many-to-many association. Optional: relationship attributes become columns.
Weak Entity Tables
Weak entity table includes foreign key to owner. Dependent table: (DependentName, EmployeeID). Composite primary key (EmployeeID, DependentName).
Specialization Mapping
Option 1: superclass table + subclass tables with foreign keys. Option 2: separate table per subclass (including inherited attributes). Option 3: single table (single inheritance). Trade-offs in storage, queries.
Validation
Resulting relational schema preserves ER model semantics. Relationships represented via foreign keys. Constraints enforced. Schema normalized.
References
- Chen, P. "The Entity-Relationship Model: Toward a Unified View of Data." ACM Transactions on Database Systems, vol. 1, no. 1, 1976, pp. 9-36.
- Elmasri, R., and Navathe, S. B. "Fundamentals of Database Systems." Pearson, 7th edition, 2016.
- Date, C. J. "An Introduction to Database Systems." Pearson, 8th edition, 2004.
- Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
- Connolly, T. M., and Begg, C. E. "Database Systems: A Practical Approach to Design, Implementation, and Management." Pearson, 6th edition, 2014.