Introduction
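First Normal Form (1NF) is the foundation of database normalization. It requires every attribute value to be atomic (indivisible): no multi-valued attributes, repeating groups, or nested tables. It is a precondition of the relational model. SQL databases store one value per column in each row, but that value can still pack several items (a comma-separated string, an array, a JSON document), so 1NF is ultimately a design discipline rather than something the engine guarantees.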
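Historical context: Codd's 1970 paper introduced the relational model and, with it, normalization: eliminating nonsimple (nested) domains, which became known as 1NF. 1NF is the simplest normal form but the most fundamental one. It ensures that table structure follows relational principles, and it is the base for further normalization (2NF, 3NF, BCNF).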
Core idea: each attribute value is a single, indivisible value, not a set, list, or nested structure. This simplifies querying, storage, and updates, and avoids the anomalies that non-atomic data causes.
"First Normal Form ensures data atomic, eliminating structural complexity from tables. Foundation for relational model guarantees, enables efficient operations, prevents data redundancy." -- Database normalization theory
1NF Definition and Principles
Formal Definition
A relation is in 1NF if every attribute value is atomic (indivisible). Each attribute draws its values from a domain of atomic values (integers, strings, dates; not sets). There are no nested tables, repeating groups, or structured attributes.
Atomicity Concept
Atomic: cannot be decomposed further; a single value, not a collection. Example: "John" is atomic (one string value). "John,Jane" is not atomic (it holds two names that could be separated). "123 Main St, NY" is composite (street + city), but in practice it is often treated as atomic.
Relational Model Requirement
The relational model defines a relation as a set of tuples, each tuple mapping attributes to atomic values. Non-atomic values violate the model because they introduce nested or hierarchical structures, which are not relational.
Practical Implication
1NF is easy to achieve and enforce in practice. A SQL column holds one value per row, so most tables are naturally in 1NF. Violations creep in through packed strings (comma-separated lists) and through the array or JSON column types that many databases offer.
Normalization Goal
1NF: starting point. Higher normal forms (2NF, 3NF) build on 1NF, addressing dependencies. Path: unnormalized -> 1NF -> 2NF -> 3NF -> BCNF.
Atomic Values Requirement
What is Atomic?
Atomic value: a single, indivisible piece of data. Examples: 42 (integer), "Alice" (string), '2024-03-30' (date). Not atomic: {1,2,3} (set), [A,B,C] (list), a nested record.
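A minimal sketch of the distinction in Python. It assumes Python values stand in for column values, and the helper `is_atomic` is hypothetical. Scalars count as atomic: numbers, strings, dates, and None (standing in for NULL). Any other iterable counts as non-atomic: a set, a list, or a dict acting as a nested record. The check is structural only and cannot see that a string packs several values.

```python
from collections.abc import Iterable
from datetime import date


def is_atomic(value) -> bool:
    """Structural check: True for scalar values, False for collections.

    Only the Python type is inspected; a string such as "John,Jane" is
    reported as atomic even though it packs two names.
    """
    if isinstance(value, (str, bytes)):
        return True  # strings are iterable in Python but count as one value
    # Sets, lists, tuples and dicts (nested records) are all iterable.
    return not isinstance(value, Iterable)


for v in [42, "Alice", date(2024, 3, 30), None, {1, 2, 3}, ["A", "B", "C"], {"name": "Alice"}]:
    print(repr(v), is_atomic(v))
```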
Indivisibility Test
Question: can the value be meaningfully subdivided? If yes, it is not atomic. Example: "123 Main St, NY" could be split into street and city. One can argue it is atomic in some contexts, but the street (123 Main St) and the city (NY) are separate concepts and ideally belong in separate columns.
Domain Definition
Attribute domain: the set of allowed values, all of them atomic. Example: the Employee.Age domain is the integers 0-120; the Employee.Email domain is valid email strings. The domain constrains what a column may hold.
NULL as Special Value
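NULL represents an absent or unknown value. It is a single marker, not a list or nested structure, so it does not make a cell multi-valued. Avoid overusing it, though: many NULLs often indicate a design issue. Some theorists, notably C. J. Date, argue that NULLs themselves depart from the relational model.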
Practical Atomicity
In practice, the string "John Smith" is usually considered atomic (a name is a logical unit), although splitting it into First_Name and Last_Name is often better design. Atomicity is context-dependent: it is relative to what the application needs to query.
1NF Violations
Multi-Valued Attributes
A single attribute holds multiple values. Example: Student.PhoneNumbers = "123-4567, 234-5678, 345-6789". This violates 1NF because one cell holds several values.
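To the database, a packed string is just one value, so multi-valued cells usually have to be found by inspecting the data. The rough heuristic below is a sketch that assumes commas separate the packed values; both the delimiter and the helper name `find_packed_cells` are assumptions, and real data may use semicolons, slashes, or no consistent separator at all.

```python
def find_packed_cells(rows, column, delimiter=","):
    """Return (row_index, parts) for each cell that splits into 2+ non-empty parts."""
    hits = []
    for i, row in enumerate(rows):
        cell = row.get(column)
        if not isinstance(cell, str):
            continue  # NULL (None) and non-string values are skipped
        parts = [p.strip() for p in cell.split(delimiter) if p.strip()]
        if len(parts) > 1:
            hits.append((i, parts))
    return hits


students = [
    {"StudentID": 1, "PhoneNumbers": "123-4567, 234-5678, 345-6789"},
    {"StudentID": 2, "PhoneNumbers": "456-7890"},  # hypothetical single-valued row
]
print(find_packed_cells(students, "PhoneNumbers"))
```

Treat the hits as candidates for review rather than proof: a name column holding "Smith, John" would be flagged too.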
Repeating Groups
Multiple columns hold the same concept. Example:
Student table (violates 1NF):
StudentID | Name | Phone1 | Phone2 | Phone3
101 | Alice | 123-4567 | 234-5678 | NULL
102 | Bob | 345-6789 | NULL | NULL
Problem: the number of phones varies, unused slots must be padded with NULL, and searching for a number means checking every PhoneN column.
Nested Tables
Non-relational: an attribute's value is itself a table. Example: Student (ID, Name, Courses: [Course1, Course2, ...]). The nested structure violates the relational model.
Structured Values
A composite value stored as a single value. Example: Address = "123 Main St, New York, NY 10001", which could be decomposed into Street, City, State, and ZIP. Whether this violates 1NF is debatable and depends on context.
Array/List Columns
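Some databases support array columns, whose value is a whole array. PostgreSQL, for example, accepts a column declared as hobbies VARCHAR[]. MySQL has no array type but offers JSON columns that can hold lists. Strictly, such columns violate 1NF (the value is non-atomic). Databases allow them, but they complicate querying and constraints; for instance, a foreign key cannot reference individual array elements.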
Example Violation
Employee (violates 1NF):
EmpID | Name | Skills
101 | Alice | Java, Python, SQL
102 | Bob | C++, Java
103 | Charlie | Python
Problem: searching for "Java" requires substring matching, and counting skills per employee means parsing the string.
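To see why searching is awkward, the sketch below loads the Employee data above into SQLite through Python's built-in `sqlite3` module and adds one hypothetical row (`Dana`, `JavaScript`) for illustration. Against the packed column, string matching is the only available tool, and it misfires or depends on exact formatting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Name TEXT, Skills TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (101, "Alice", "Java, Python, SQL"),
    (102, "Bob", "C++, Java"),
    (103, "Charlie", "Python"),
    (104, "Dana", "JavaScript"),  # hypothetical row added for illustration
])

# Substring search also matches 'JavaScript': a false positive.
naive = [name for (name,) in conn.execute(
    "SELECT Name FROM Employee WHERE Skills LIKE '%Java%' ORDER BY EmpID")]

# Whole-token search: wrap the list in delimiters, then match ', Java,'.
# Only works if every row uses exactly ', ' between skills.
exact = [name for (name,) in conn.execute(
    "SELECT Name FROM Employee "
    "WHERE ', ' || Skills || ',' LIKE '%, Java,%' ORDER BY EmpID")]

# Counting skills = counting commas + 1 (wrong for empty strings).
counts = conn.execute(
    "SELECT Name, LENGTH(Skills) - LENGTH(REPLACE(Skills, ',', '')) + 1 "
    "FROM Employee ORDER BY EmpID").fetchall()

print(naive, exact, counts)
```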
Examples and Unnesting
Unnormalized Table
Course (violates 1NF):
CourseID | Title | Students
CS101 | Intro Python | [Alice, Bob, Charlie]
CS202 | Data Structures | [Alice, David]
CS303 | Algorithms | [Bob, David, Eve]
The multi-valued Students attribute is not atomic.
Conversion to 1NF
Create a junction table that separates students from courses:
Course (1NF):
CourseID | Title
CS101 | Intro Python
CS202 | Data Structures
CS303 | Algorithms
Enrollment (1NF):
CourseID | StudentName
CS101 | Alice
CS101 | Bob
CS101 | Charlie
CS202 | Alice
CS202 | David
CS303 | Bob
CS303 | David
CS303 | Eve
Benefits
Students are separated from courses, and every cell holds a single value. Querying is easy: "Find all students in CS101" is a simple SELECT. Maintenance is easy too: adding or removing a student inserts or deletes one row.
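Below is a minimal sketch of the conversion and the CS101 query, using Python's standard-library `sqlite3` module. It assumes the nested `Students` lists arrive as Python lists. Each (course, student) pair becomes one `Enrollment` row, after which the lookup is a plain `SELECT`. The composite primary key on `Enrollment` is a design choice made here, not something the text prescribes.

```python
import sqlite3

# Unnormalized input: one row per course, students nested in a list.
courses = [
    ("CS101", "Intro Python", ["Alice", "Bob", "Charlie"]),
    ("CS202", "Data Structures", ["Alice", "David"]),
    ("CS303", "Algorithms", ["Bob", "David", "Eve"]),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE Course (CourseID TEXT PRIMARY KEY, Title TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Enrollment (
        CourseID    TEXT NOT NULL REFERENCES Course(CourseID),
        StudentName TEXT NOT NULL,
        PRIMARY KEY (CourseID, StudentName)
    )""")

for course_id, title, students in courses:
    conn.execute("INSERT INTO Course VALUES (?, ?)", (course_id, title))
    # Unnest: one Enrollment row per element of the nested list.
    conn.executemany("INSERT INTO Enrollment VALUES (?, ?)",
                     [(course_id, s) for s in students])

cs101 = [name for (name,) in conn.execute(
    "SELECT StudentName FROM Enrollment WHERE CourseID = ? ORDER BY StudentName",
    ("CS101",))]
print(cs101)
```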
Repeating Group Elimination
Before: a fixed set of columns (Phone1, Phone2, Phone3) for a varying number of values. After: a separate Phone table (StudentID, PhoneNumber), which is flexible and allows any number of phones per student.
Design Principle
One-to-many relationships: use separate tables, not repeating groups. Many-to-many relationships: use junction tables. Values are always atomic.
Handling Multi-Valued Attributes
Recognition
A multi-valued attribute can hold several values for a single instance. Example: a person may have several email addresses; a student takes several courses.
Solution: Separate Table
Create a separate table for the values, with a foreign key back to the original table (for many-to-many relationships this becomes a junction/association table). Example (Person-Email): Person table (PersonID, Name) and Email table (PersonID fk, Email).
Junction Table Design
Person table:
PersonID (pk) | Name
1 | Alice
2 | Bob
Email table (multivalued):
PersonID (fk) | Email
1 | alice@work.com
1 | alice@home.com
2 | bob@company.com
Primary key: either the composite (PersonID, Email) or a separate surrogate EmailID.
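The Person/Email design can be declared as follows. This is a sketch in SQLite via Python's `sqlite3`, using the composite-key option `(PersonID, Email)`; the helper `try_insert` is hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. With both constraints in place, recording the same address twice for one person fails, as does adding an email for a nonexistent person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Email (
        PersonID INTEGER NOT NULL REFERENCES Person(PersonID),
        Email    TEXT    NOT NULL,
        PRIMARY KEY (PersonID, Email)   -- one row per (person, address) pair
    )""")
conn.executemany("INSERT INTO Person VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO Email VALUES (?, ?)", [
    (1, "alice@work.com"), (1, "alice@home.com"), (2, "bob@company.com")])


def try_insert(person_id, email):
    """Attempt an insert; return 'ok' or the constraint error message."""
    try:
        conn.execute("INSERT INTO Email VALUES (?, ?)", (person_id, email))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(try_insert(1, "alice@work.com"))     # duplicate pair: rejected by the primary key
print(try_insert(3, "carol@example.com"))  # no Person 3: rejected by the foreign key
```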
Querying Multi-Valued
To find all emails for a person, JOIN Person and Email on PersonID. To find the person with a specific email, filter with WHERE Email.Email = '...'.
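Both queries are sketched below against the same Person/Email rows in SQLite, via Python's `sqlite3`. The address used in the second lookup comes from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Email (
        PersonID INTEGER NOT NULL REFERENCES Person(PersonID),
        Email    TEXT    NOT NULL,
        PRIMARY KEY (PersonID, Email)
    );
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Email VALUES (1, 'alice@work.com'), (1, 'alice@home.com'),
                             (2, 'bob@company.com');
""")

# All emails for one person: join on the foreign key, filter on the person.
alice_emails = [e for (e,) in conn.execute("""
    SELECT e.Email FROM Person p JOIN Email e ON e.PersonID = p.PersonID
    WHERE p.Name = ? ORDER BY e.Email""", ("Alice",))]

# Who owns a given address: the same join, filtered on the email column.
owners = [n for (n,) in conn.execute("""
    SELECT p.Name FROM Person p JOIN Email e ON e.PersonID = p.PersonID
    WHERE e.Email = ?""", ("bob@company.com",))]

print(alice_emails)
print(owners)
```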
Advantages
Flexible: any number of values. Queryable: standard relational operations. Maintainable: add/remove values easy. Atomic: each row has single value.
Comparison
Non-atomic (multi-valued in single column): inflexible, hard to query. Atomic (separate table): standard relational, easy operations.
Converting to 1NF
Steps
1. Identify non-atomic attributes or repeating groups.
2. Create a new table for the multi-valued data.
3. Give the new table a foreign key referencing the original table's primary key.
4. Move the data: one value per row.
Example: Student Courses
Non-normalized:
StudentID | Name | Courses
1 | Alice | CS101, CS202, CS303
2 | Bob | CS101, CS202
Step 1: Identify multi-valued Courses
Step 2: Create Enrollment table
Step 3: Add a StudentID foreign key to Enrollment, referencing Student
Step 4: Move data
Result:
Student: StudentID, Name
Enrollment: StudentID (fk), CourseID; composite primary key (StudentID, CourseID)
Data Migration
Convert existing data by parsing the packed values and creating one row per value. Example: "CS101, CS202" becomes two rows. Be careful to preserve all data and to validate the conversion.
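Here is a sketch of the parse-and-validate step for the Student example. It assumes values are comma-separated and that surrounding whitespace is insignificant; both are assumptions about the legacy data. The hypothetical `explode` helper rejects empty tokens instead of silently dropping them. The final assertion checks that every distinct token became exactly one row.

```python
def explode(rows, key, column, delimiter=","):
    """Turn rows holding a packed column into one (key, value) pair per value.

    Raises ValueError on empty tokens (e.g. 'CS101,,CS202') so that bad input
    is surfaced instead of discarded. Repeated tokens are collapsed, which a
    composite primary key would require anyway.
    """
    out = []
    for row in rows:
        packed = row[column]
        if packed is None:
            continue  # nothing to move for this row
        tokens = [t.strip() for t in packed.split(delimiter)]
        if any(t == "" for t in tokens):
            raise ValueError(f"empty value in {packed!r} for {key}={row[key]!r}")
        for value in dict.fromkeys(tokens):  # keeps order, drops repeats
            out.append((row[key], value))
    return out


legacy = [
    {"StudentID": 1, "Name": "Alice", "Courses": "CS101, CS202, CS303"},
    {"StudentID": 2, "Name": "Bob", "Courses": "CS101, CS202"},
]
enrollment = explode(legacy, "StudentID", "Courses")

# Validation: one output row per distinct token in the source.
expected = sum(len({t.strip() for t in r["Courses"].split(",")}) for r in legacy)
assert len(enrollment) == expected
print(enrollment)
```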
Incremental Conversion
For large tables, convert in batches and validate each batch to ensure values are atomic. Migrate applications to the new schema gradually.
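The sketch below shows batch-wise conversion. It assumes the legacy table has an integer primary key to page through (keyset pagination) and that source and target tables live in the same SQLite database. The table names, the batch size, and the extra student `Carol` are all illustrative. Each batch runs in its own transaction and is validated before commit, so a failure rolls back only that batch.

```python
import sqlite3


def migrate_in_batches(conn, batch_size=500):
    """Move packed Courses values from LegacyStudent into Enrollment.

    Pages through LegacyStudent by primary key and commits one transaction
    per batch. Returns the number of batches processed.
    """
    last_id, batches = 0, 0
    while True:
        rows = conn.execute(
            "SELECT StudentID, Courses FROM LegacyStudent "
            "WHERE StudentID > ? ORDER BY StudentID LIMIT ?",
            (last_id, batch_size)).fetchall()
        if not rows:
            return batches
        with conn:  # commit this batch on success, roll it back on error
            for student_id, courses in rows:
                values = {c.strip() for c in (courses or "").split(",") if c.strip()}
                # INSERT OR IGNORE makes re-running a batch harmless.
                conn.executemany("INSERT OR IGNORE INTO Enrollment VALUES (?, ?)",
                                 [(student_id, c) for c in values])
                # Validate: one Enrollment row per distinct packed value.
                n = conn.execute("SELECT COUNT(*) FROM Enrollment WHERE StudentID = ?",
                                 (student_id,)).fetchone()[0]
                if n != len(values):
                    raise ValueError(f"student {student_id}: expected {len(values)}, got {n}")
        last_id = rows[-1][0]
        batches += 1


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LegacyStudent (StudentID INTEGER PRIMARY KEY, Name TEXT, Courses TEXT);
    CREATE TABLE Enrollment (StudentID INTEGER, CourseID TEXT,
                             PRIMARY KEY (StudentID, CourseID));
    INSERT INTO LegacyStudent VALUES
        (1, 'Alice', 'CS101, CS202, CS303'), (2, 'Bob', 'CS101, CS202'),
        (3, 'Carol', 'CS303');  -- Carol is a hypothetical extra row
""")
n = migrate_in_batches(conn, batch_size=2)
print(n, conn.execute("SELECT COUNT(*) FROM Enrollment").fetchone()[0])
```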
Automation
Scripts can parse comma-separated values and generate INSERT statements. Test them on a copy and verify correctness before migrating production.
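With SQLite, one way to follow the test-on-a-copy advice is `sqlite3.Connection.backup`, which copies a whole database into another connection (here an in-memory one). The sketch runs the migration on the copy and compares per-student row counts, leaving the "production" database untouched. The table names and the helpers `migrate` and `verify` are illustrative. The sketch uses parameterized `executemany` rather than generating INSERT text, which avoids quoting bugs.

```python
import sqlite3


def migrate(conn):
    """Parse comma-separated Courses into Enrollment rows (parameterized inserts)."""
    conn.execute("CREATE TABLE IF NOT EXISTS Enrollment ("
                 "StudentID INTEGER, CourseID TEXT, PRIMARY KEY (StudentID, CourseID))")
    rows = conn.execute("SELECT StudentID, Courses FROM Student").fetchall()
    with conn:
        for student_id, courses in rows:
            conn.executemany("INSERT INTO Enrollment VALUES (?, ?)",
                             [(student_id, c.strip()) for c in courses.split(",")])


def verify(conn):
    """True if every student has one Enrollment row per packed value."""
    for student_id, courses in conn.execute(
            "SELECT StudentID, Courses FROM Student").fetchall():
        n = conn.execute("SELECT COUNT(*) FROM Enrollment WHERE StudentID = ?",
                         (student_id,)).fetchone()[0]
        if n != len(courses.split(",")):
            return False
    return True


prod = sqlite3.connect(":memory:")  # stands in for the production database
prod.executescript("""
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, Courses TEXT);
    INSERT INTO Student VALUES (1, 'Alice', 'CS101, CS202, CS303'),
                               (2, 'Bob', 'CS101, CS202');
""")

copy = sqlite3.connect(":memory:")
prod.backup(copy)  # full copy; later changes to `copy` do not touch `prod`
migrate(copy)
ok = verify(copy)
print(ok)
```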
Composite vs. Atomic Attributes
Composite Attributes
A composite attribute is made of sub-attributes. Address: Street, City, State, ZIP. Name: First, Middle, Last. Atomicity is ambiguous here: an address is "atomic" in some contexts, yet decomposable.
Design Decision
Should a composite be stored as a single column or decomposed? It depends: if the parts are never queried separately, keep it whole; if sub-parts are accessed frequently, decompose it.
Example: Address
Option 1: one Address column (a single string, "123 Main, NY, NY 10001"). Atomic and simple, but searching by ZIP is difficult.
Option 2: separate Street, City, State, and ZIP columns. Decomposed and queryable; this normalizes the address and allows ZIP-based searches.
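The difference shows up in the query. The sketch below sets up both layouts side by side in SQLite, via Python's `sqlite3`; the table names and the second address are hypothetical. With one string, a ZIP search needs a pattern match that depends on the address format. With a ZIP column, it is an equality test that an index can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerA (ID INTEGER PRIMARY KEY, Address TEXT);   -- option 1
    CREATE TABLE CustomerB (ID INTEGER PRIMARY KEY, Street TEXT, City TEXT,
                            State TEXT, ZIP TEXT);                   -- option 2
    CREATE INDEX idx_customerb_zip ON CustomerB (ZIP);
    INSERT INTO CustomerA VALUES (1, '123 Main St, New York, NY 10001'),
                                 (2, '10001 Oak Ave, Austin, TX 78701');
    INSERT INTO CustomerB VALUES (1, '123 Main St', 'New York', 'NY', '10001'),
                                 (2, '10001 Oak Ave', 'Austin', 'TX', '78701');
""")

# Option 1: substring match; house number 10001 produces a false positive.
a = [r[0] for r in conn.execute(
    "SELECT ID FROM CustomerA WHERE Address LIKE '%10001%' ORDER BY ID")]
# Option 2: exact comparison on the decomposed column.
b = [r[0] for r in conn.execute(
    "SELECT ID FROM CustomerB WHERE ZIP = '10001' ORDER BY ID")]
print(a, b)
```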
Practical Atomicity
What counts as atomic varies by domain. Financial applications may decompose currency details; casual applications may keep composite values. Balance strict atomicity (1NF) against practicality.
1NF Perspective
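Strict 1NF: composite attributes are decomposed. Practical 1NF: composite values that the application treats as indivisible are allowed. Databases generally do not stop you from packing values into a column, so the discipline is enforced through schema design and application code.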
Eliminating Repeating Groups
Repeating Group Definition
Multiple columns hold the same concept, e.g., Phone1, Phone2, Phone3. This violates 1NF because the same attribute occurs several times in one row.
Problem Recognition
Pattern: numbered columns (Column1, Column2, Column3, ...) suggest a repeating group, as does NULL padding when a row has fewer values than the maximum. Both indicate a design issue.
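The numbered-column pattern is easy to scan for in a schema. The sketch below is a rough heuristic. It assumes repeating groups follow a "stem + trailing digits" naming convention, and the function name is hypothetical. It misses differently named groups such as `PrimaryPhone`/`AltPhone`, and it can flag legitimate pairs like `AddressLine1`/`AddressLine2`.

```python
import re
from collections import defaultdict


def find_repeating_groups(columns):
    """Group column names like Phone1, Phone2, Phone_3 by their shared stem."""
    groups = defaultdict(list)
    for col in columns:
        m = re.fullmatch(r"(.*?)_?(\d+)", col)  # stem, optional '_', trailing digits
        if m and m.group(1):
            groups[m.group(1)].append(col)
    # A single numbered column on its own is not treated as a group.
    return {stem: cols for stem, cols in groups.items() if len(cols) > 1}


print(find_repeating_groups(["StudentID", "Name", "Phone1", "Phone2", "Phone3"]))
```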
Elimination Method
Create a separate table with one row per occurrence. Example: Student with phones becomes a Student table plus a Phone table (StudentID fk, Phone), allowing any number of phones.
Implementation
Before (repeating groups):
StudentID | Name | Phone1 | Phone2 | Phone3
1 | Alice | 123-4567 | 234-5678 | 345-6789
2 | Bob | 456-7890 | NULL | NULL
After (1NF):
Student: StudentID, Name
Phone: StudentID (fk), Phone
StudentID | Phone
1 | 123-4567
1 | 234-5678
1 | 345-6789
2 | 456-7890
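The step from Before to After is an unpivot: each non-NULL PhoneN cell becomes its own row. Below is a minimal sketch in plain Python, with `None` standing in for NULL. The function `unpivot_phones` is a hypothetical helper, and its column list is taken from the example.

```python
def unpivot_phones(rows, phone_columns=("Phone1", "Phone2", "Phone3")):
    """Split wide rows into Student (StudentID, Name) and Phone (StudentID, Phone) rows."""
    students, phones = [], []
    for row in rows:
        students.append((row["StudentID"], row["Name"]))
        for col in phone_columns:
            if row[col] is not None:  # NULL padding carries no data; skip it
                phones.append((row["StudentID"], row[col]))
    return students, phones


wide = [
    {"StudentID": 1, "Name": "Alice", "Phone1": "123-4567",
     "Phone2": "234-5678", "Phone3": "345-6789"},
    {"StudentID": 2, "Name": "Bob", "Phone1": "456-7890",
     "Phone2": None, "Phone3": None},
]
student, phone = unpivot_phones(wide)
print(student)
print(phone)
```

In SQL, the same reshaping is typically a UNION ALL of one SELECT per PhoneN column, each with a `PhoneN IS NOT NULL` filter.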
Advantages
Flexible: unlimited values. Atomic: single value per cell. Maintainable: add/remove phone easy. Queryable: standard operations.
Advantages of 1NF
Structural Clarity
Clean table structure: rows, columns, atomic values. Easy to understand, document. Matches relational model semantics.
Querying Simplicity
Standard SQL queries work reliably; SELECT, WHERE, and JOIN are straightforward. Non-atomic data complicates queries with parsing and string manipulation.
Update Efficiency
Atomicity makes updates efficient: changing one phone number is a single-row update. With repeating groups, you must first find which PhoneN column holds the number, so the update may have to consider several columns.
Data Integrity
Atomic values help prevent inconsistencies. Multi-valued attributes risk duplicates and mismatched formats inside a cell (e.g., "Java,SQL" versus "Java, SQL"). 1NF gives each value a single, consistent representation.
Maintenance Ease
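Adding or removing a value of a multi-valued attribute means inserting or deleting a row (simple). With repeating groups, exceeding the available columns requires restructuring the table, e.g., adding a Phone4 column (complex).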
Performance
Atomic values can be stored, indexed, and compared directly. Packed multi-valued strings are hard to optimize: a standard index cannot look up a single element inside them. Query optimizers are built around atomic columns.
Limitations of 1NF
Incomplete Normalization
1NF is necessary but not sufficient. It does not address partial or transitive dependencies, so a table in 1NF can still hold redundant data; 2NF and 3NF address these.
Update Anomalies Possible
1NF does not prevent anomalies entirely. Example: partial dependencies (a non-key attribute depending on only part of a composite key) cause insertion, update, and deletion anomalies. Further normalization (2NF) is required.
Design Complexity
Separating multi-valued attributes means more tables and more joins, a modest increase in complexity for a more normalized schema.
Query Complexity
Atomicity and separation may require more JOINs, so queries that look simple can span several tables. The trade is some query simplicity for data integrity.
Historical Data
Legacy systems may have non-1NF tables (from pre-relational designs), and migrating them is costly. Some databases tolerate non-1NF columns, departing from the strict relational model.
Path to Higher Normal Forms
Normalization Hierarchy
1NF: atomic values. 2NF: no partial dependencies (no non-key attribute depends on only part of a composite key). 3NF: no transitive dependencies (no non-key attribute depends on another non-key attribute). BCNF: a stricter version of 3NF in which the left side of every nontrivial functional dependency is a superkey.
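A table can be in 1NF and still carry the partial dependency that 2NF removes. The sketch assumes a hypothetical 1NF table Enrollment(StudentID, CourseID, CourseTitle), keyed on (StudentID, CourseID) and built from the course data earlier in this section. The hypothetical helper `fd_holds` checks whether a functional dependency X -> Y holds in a given set of rows. Sample data can refute a dependency, but only the meaning of the schema can establish one.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


# In 1NF (every cell atomic); key = (StudentID, CourseID).
enrollment = [
    {"StudentID": 1, "CourseID": "CS101", "CourseTitle": "Intro Python"},
    {"StudentID": 2, "CourseID": "CS101", "CourseTitle": "Intro Python"},
    {"StudentID": 1, "CourseID": "CS202", "CourseTitle": "Data Structures"},
]

# CourseTitle depends on CourseID alone, i.e. on part of the key: a partial
# dependency. The title is therefore repeated for every enrolled student.
print(fd_holds(enrollment, ["CourseID"], ["CourseTitle"]))
print(fd_holds(enrollment, ["StudentID"], ["CourseTitle"]))
```

2NF would move CourseTitle into a Course table keyed on CourseID alone.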
Dependencies
1NF is the necessary foundation: 2NF requires 1NF, 3NF requires 2NF, and BCNF is stronger than 3NF. Each form builds on the previous one.
Practical Path
For most applications, 3NF is sufficient. BCNF matters mainly when a table has overlapping candidate keys, the case where 3NF and BCNF differ. Beyond that, returns diminish and complexity grows. Business needs, not theory alone, should determine the target.
Trade-Offs
Higher forms: reduce anomalies, ensure consistency. Cost: more tables, more joins. Decision: anomaly prevention vs. query complexity.
Denormalization
Sometimes a schema is deliberately denormalized (redundancy added) for performance, for example by caching computed values. This sacrifices some consistency for speed and is justified only when necessary.
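The sketch below illustrates the "cache computed values" example in SQLite, via Python's `sqlite3`. A hypothetical `EnrollmentCount` column on `Course` duplicates what `COUNT(*)` over `Enrollment` would return, and triggers keep the two in step. The triggers are the price of the redundancy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Course (CourseID TEXT PRIMARY KEY, Title TEXT,
                         EnrollmentCount INTEGER NOT NULL DEFAULT 0);  -- redundant cache
    CREATE TABLE Enrollment (CourseID TEXT, StudentName TEXT,
                             PRIMARY KEY (CourseID, StudentName));
    CREATE TRIGGER enroll_ins AFTER INSERT ON Enrollment BEGIN
        UPDATE Course SET EnrollmentCount = EnrollmentCount + 1
        WHERE CourseID = NEW.CourseID;
    END;
    CREATE TRIGGER enroll_del AFTER DELETE ON Enrollment BEGIN
        UPDATE Course SET EnrollmentCount = EnrollmentCount - 1
        WHERE CourseID = OLD.CourseID;
    END;
    INSERT INTO Course (CourseID, Title) VALUES ('CS101', 'Intro Python');
    INSERT INTO Enrollment VALUES ('CS101', 'Alice'), ('CS101', 'Bob'), ('CS101', 'Charlie');
    DELETE FROM Enrollment WHERE StudentName = 'Bob';
""")

cached = conn.execute(
    "SELECT EnrollmentCount FROM Course WHERE CourseID = 'CS101'").fetchone()[0]
actual = conn.execute(
    "SELECT COUNT(*) FROM Enrollment WHERE CourseID = 'CS101'").fetchone()[0]
print(cached, actual)
```

These two triggers do not cover an UPDATE that moves an enrollment to another course, and a direct write to `EnrollmentCount` would also go unchecked. Either gap lets the cached value drift, which is the consistency risk denormalization accepts.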
References
- Codd, E. F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, vol. 13, no. 6, 1970, pp. 377-387.
- Elmasri, R., and Navathe, S. B. "Fundamentals of Database Systems." Pearson, 7th edition, 2016.
- Date, C. J. "Database in Depth: Relational Theory for Practitioners." O'Reilly Media, 2005.
- Silberschatz, A., Korth, H. F., and Sudarshan, S. "Database System Concepts." McGraw-Hill, 6th edition, 2010.
- Kent, W. "A Simple Guide to Five Normal Forms in Relational Database Theory." Communications of the ACM, vol. 26, no. 2, 1983, pp. 120-125.