Introduction

File systems organize, store, retrieve, and manage data on storage devices. Core functions: data abstraction, space management, access control, and metadata handling. Essential for OS interaction with persistent storage.

"A file system is the backbone of data storage, enabling efficient data retrieval and integrity." -- Andrew S. Tanenbaum

File System Concepts

Files and File Abstraction

File: named collection of related data or information. Abstraction: hides physical storage details. Allows sequential, random, or indexed access.

Blocks and Clusters

Storage unit: block (fixed-size), cluster (group of blocks). Size impacts internal fragmentation and performance. Typical block size: 512 bytes to 4 KB.

Logical vs Physical View

Logical: file names, directories, metadata. Physical: sectors, tracks, cylinders on disk. File system maps logical structures to physical addresses.

Space Management

Tracks free and used storage space. Uses free lists, bitmaps, or linked structures. Goal: minimize fragmentation, maximize utilization.

File Allocation Methods

Contiguous Allocation

File occupies consecutive blocks. Advantages: simple, fast sequential access. Drawbacks: external fragmentation, difficult dynamic growth.

Linked Allocation

File blocks linked via pointers. Advantages: no external fragmentation, easy dynamic file growth. Drawbacks: poor random access performance.

Indexed Allocation

Index block contains pointers to file's data blocks. Supports direct access, reduces fragmentation. Overhead: index block consumes space.

Hybrid Schemes

Combines methods (e.g., UNIX inode structure uses indexed with indirect blocks). Balances access speed and efficient storage.

Allocation MethodAdvantagesDisadvantages
ContiguousFast access, simpleExternal fragmentation, inflexible
LinkedNo fragmentation, dynamic sizeSlow random access
IndexedDirect access, no fragmentationIndex overhead

Directory Structures

Single-Level Directory

All files in one directory. Simple but no name space separation. Causes name conflicts in multiple users/environments.

Two-Level Directory

Separate directories per user. Improves namespace isolation, reduces name conflicts. Limited hierarchy depth.

Tree-Structured Directory

Hierarchical directories. Supports subdirectories, flexible organization. Enables pathnames and recursive file search.

Acyclic Graph Directory

Allows shared directories/files (links). Prevents cycles, maintains consistency. Complex traversal and management.

General Graph Directory

Supports cycles via links. Requires cycle detection algorithms to prevent infinite loops during traversal.

Metadata and Attributes

File Metadata

Information describing file properties: size, type, timestamps, permissions, location pointers.

Timestamps

Include creation, modification, access times. Used for synchronization, backup, and auditing.

File Types

Regular files, directories, symbolic links, device files. Distinguished by metadata flags.

File Attributes

Read-only, hidden, system, archive flags. Influence file visibility and access behavior.

Inode Structure

UNIX file metadata container. Stores attributes and pointers to data blocks. Central to metadata management.

File System Types

FAT (File Allocation Table)

Used in early DOS/Windows. Simple, widely supported. Limitations: file size and partition size constraints.

NTFS (New Technology File System)

Microsoft's advanced FS. Features: journaling, permissions, encryption, compression, large file support.

EXT (Extended File System)

Linux standard. Multiple versions (EXT2, EXT3, EXT4). Features: journaling, extents, large volume support.

HFS+ and APFS

Apple file systems. APFS designed for SSDs, supports cloning, snapshots, encryption.

Network File Systems

NFS, SMB/CIFS: enable remote file access over network. Provide transparency and shared access.

Journaling and Logging

Purpose

Improve reliability by recording changes before applying. Enables recovery from crashes and power failures.

Write-Ahead Logging

Changes logged first in journal. Commit after success. Ensures atomicity of updates.

Journaling Types

Metadata journaling: logs only metadata. Full journaling: logs data and metadata. Tradeoff: performance vs safety.

Recovery Process

On reboot, journal replay restores consistent state. Prevents file system corruption.

File Permissions and Access Control

Permission Models

Discretionary Access Control (DAC), Mandatory Access Control (MAC), Role-Based Access Control (RBAC).

UNIX Permissions

Read, write, execute bits for owner, group, others. Represented as octal numbers.

Access Control Lists (ACLs)

Fine-grained permissions per user/group. Extend basic models for complex security policies.

Ownership and Grouping

Files owned by user and group. Ownership influences default permissions and access rights.

Permission Enforcement

Kernel enforces during system calls (open, read, write). Prevent unauthorized access.

# UNIX permission bits example-rwxr-xr--Owner: read/write/executeGroup: read/executeOthers: read only

Mounting and Name Spaces

Mounting

Process of making a file system accessible at a directory (mount point). Allows multiple FS coexistence.

Mount Points

Directories where FS is attached. Enables unified namespace across devices.

Name Spaces

Logical organization of file names. Supports hierarchical trees, links, aliases.

Virtual File Systems (VFS)

Abstraction layer. Provides uniform interface to different FS types. Simplifies OS file operations.

Automounting

Automatic mounting triggered on access. Enhances usability and resource management.

Fragmentation and Optimization

Fragmentation Types

External: free space split into small blocks. Internal: allocated blocks larger than needed.

Effects

Decreased read/write performance, increased seek time, inefficient storage use.

Defragmentation

Rearranges files to occupy contiguous space. Improves performance, reduces latency.

Allocation Strategies

First-fit, best-fit, worst-fit. Tradeoffs between speed and fragmentation control.

Preallocation and Extents

Allocate larger contiguous blocks in advance. Extents store start block and length, reducing overhead.

Data Integrity and Recovery

Error Detection

Checksums, cyclic redundancy checks (CRC) detect corruption in data blocks.

Redundancy

Mirroring, RAID configurations enhance fault tolerance.

Backup and Snapshots

Point-in-time copies facilitate recovery from accidental deletion or corruption.

Recovery Techniques

Journaling replay, fsck utilities check and repair FS inconsistencies.

Atomicity and Consistency

Ensures file operations complete fully or not at all. Prevents partial updates and corruption.

Performance and Scaling

Caching

File system cache stores frequently accessed data in RAM. Reduces disk I/O latency.

Buffering

Temporary storage of data during transfer. Optimizes throughput and CPU utilization.

Scalability

FS design accommodates growth in volume size, file count, and concurrent access.

Distributed File Systems

Support networked storage across multiple nodes. Provide fault tolerance, load balancing.

Concurrency Control

Locking, journaling, versioning prevent conflicts from simultaneous access.

# Example: simple write lock algorithmacquire_lock(file)write_data(file, data)release_lock(file)

References

  • Silberschatz, A., Galvin, P. B., & Gagne, G. "Operating System Concepts." Wiley, 10th Ed., 2018, pp. 523-579.
  • Tanenbaum, A. S., & Bos, H. "Modern Operating Systems." Pearson, 4th Ed., 2015, pp. 267-310.
  • McKusick, M. K., & Neville-Neil, G. V. "The Design and Implementation of the FreeBSD Operating System." Addison-Wesley, 2nd Ed., 2014, pp. 145-190.
  • Card, R., Ts'o, T., & Tweedie, S. "Design and Implementation of the Second Extended Filesystem." Proceedings of the First Dutch International Symposium on Linux, 1994, pp. 21-31.
  • Ghemawat, S., Gobioff, H., & Leung, S. T. "The Google File System." ACM Symposium on Operating Systems Principles (SOSP), 2003, pp. 29-43.