Introduction
File systems organize, store, retrieve, and manage data on storage devices. Core functions: data abstraction, space management, access control, and metadata handling. Essential for OS interaction with persistent storage.
"A file system is the backbone of data storage, enabling efficient data retrieval and integrity." -- Andrew S. Tanenbaum
File System Concepts
Files and File Abstraction
File: named collection of related data or information. Abstraction: hides physical storage details. Allows sequential, random, or indexed access.
Blocks and Clusters
Storage unit: block (fixed-size), cluster (group of blocks). Size impacts internal fragmentation and performance. Typical block size: 512 bytes to 4 KB.
Logical vs Physical View
Logical: file names, directories, metadata. Physical: sectors, tracks, cylinders on disk. File system maps logical structures to physical addresses.
Space Management
Tracks free and used storage space. Uses free lists, bitmaps, or linked structures. Goal: minimize fragmentation, maximize utilization.
File Allocation Methods
Contiguous Allocation
File occupies consecutive blocks. Advantages: simple, fast sequential access. Drawbacks: external fragmentation, difficult dynamic growth.
Linked Allocation
File blocks linked via pointers. Advantages: no external fragmentation, easy dynamic file growth. Drawbacks: poor random access performance.
Indexed Allocation
Index block contains pointers to file's data blocks. Supports direct access, reduces fragmentation. Overhead: index block consumes space.
Hybrid Schemes
Combines methods (e.g., UNIX inode structure uses indexed with indirect blocks). Balances access speed and efficient storage.
| Allocation Method | Advantages | Disadvantages |
|---|---|---|
| Contiguous | Fast access, simple | External fragmentation, inflexible |
| Linked | No fragmentation, dynamic size | Slow random access |
| Indexed | Direct access, no fragmentation | Index overhead |
Directory Structures
Single-Level Directory
All files in one directory. Simple but no name space separation. Causes name conflicts in multiple users/environments.
Two-Level Directory
Separate directories per user. Improves namespace isolation, reduces name conflicts. Limited hierarchy depth.
Tree-Structured Directory
Hierarchical directories. Supports subdirectories, flexible organization. Enables pathnames and recursive file search.
Acyclic Graph Directory
Allows shared directories/files (links). Prevents cycles, maintains consistency. Complex traversal and management.
General Graph Directory
Supports cycles via links. Requires cycle detection algorithms to prevent infinite loops during traversal.
Metadata and Attributes
File Metadata
Information describing file properties: size, type, timestamps, permissions, location pointers.
Timestamps
Include creation, modification, access times. Used for synchronization, backup, and auditing.
File Types
Regular files, directories, symbolic links, device files. Distinguished by metadata flags.
File Attributes
Read-only, hidden, system, archive flags. Influence file visibility and access behavior.
Inode Structure
UNIX file metadata container. Stores attributes and pointers to data blocks. Central to metadata management.
File System Types
FAT (File Allocation Table)
Used in early DOS/Windows. Simple, widely supported. Limitations: file size and partition size constraints.
NTFS (New Technology File System)
Microsoft's advanced FS. Features: journaling, permissions, encryption, compression, large file support.
EXT (Extended File System)
Linux standard. Multiple versions (EXT2, EXT3, EXT4). Features: journaling, extents, large volume support.
HFS+ and APFS
Apple file systems. APFS designed for SSDs, supports cloning, snapshots, encryption.
Network File Systems
NFS, SMB/CIFS: enable remote file access over network. Provide transparency and shared access.
Journaling and Logging
Purpose
Improve reliability by recording changes before applying. Enables recovery from crashes and power failures.
Write-Ahead Logging
Changes logged first in journal. Commit after success. Ensures atomicity of updates.
Journaling Types
Metadata journaling: logs only metadata. Full journaling: logs data and metadata. Tradeoff: performance vs safety.
Recovery Process
On reboot, journal replay restores consistent state. Prevents file system corruption.
File Permissions and Access Control
Permission Models
Discretionary Access Control (DAC), Mandatory Access Control (MAC), Role-Based Access Control (RBAC).
UNIX Permissions
Read, write, execute bits for owner, group, others. Represented as octal numbers.
Access Control Lists (ACLs)
Fine-grained permissions per user/group. Extend basic models for complex security policies.
Ownership and Grouping
Files owned by user and group. Ownership influences default permissions and access rights.
Permission Enforcement
Kernel enforces during system calls (open, read, write). Prevent unauthorized access.
# UNIX permission bits example-rwxr-xr--Owner: read/write/executeGroup: read/executeOthers: read onlyMounting and Name Spaces
Mounting
Process of making a file system accessible at a directory (mount point). Allows multiple FS coexistence.
Mount Points
Directories where FS is attached. Enables unified namespace across devices.
Name Spaces
Logical organization of file names. Supports hierarchical trees, links, aliases.
Virtual File Systems (VFS)
Abstraction layer. Provides uniform interface to different FS types. Simplifies OS file operations.
Automounting
Automatic mounting triggered on access. Enhances usability and resource management.
Fragmentation and Optimization
Fragmentation Types
External: free space split into small blocks. Internal: allocated blocks larger than needed.
Effects
Decreased read/write performance, increased seek time, inefficient storage use.
Defragmentation
Rearranges files to occupy contiguous space. Improves performance, reduces latency.
Allocation Strategies
First-fit, best-fit, worst-fit. Tradeoffs between speed and fragmentation control.
Preallocation and Extents
Allocate larger contiguous blocks in advance. Extents store start block and length, reducing overhead.
Data Integrity and Recovery
Error Detection
Checksums, cyclic redundancy checks (CRC) detect corruption in data blocks.
Redundancy
Mirroring, RAID configurations enhance fault tolerance.
Backup and Snapshots
Point-in-time copies facilitate recovery from accidental deletion or corruption.
Recovery Techniques
Journaling replay, fsck utilities check and repair FS inconsistencies.
Atomicity and Consistency
Ensures file operations complete fully or not at all. Prevents partial updates and corruption.
Performance and Scaling
Caching
File system cache stores frequently accessed data in RAM. Reduces disk I/O latency.
Buffering
Temporary storage of data during transfer. Optimizes throughput and CPU utilization.
Scalability
FS design accommodates growth in volume size, file count, and concurrent access.
Distributed File Systems
Support networked storage across multiple nodes. Provide fault tolerance, load balancing.
Concurrency Control
Locking, journaling, versioning prevent conflicts from simultaneous access.
# Example: simple write lock algorithmacquire_lock(file)write_data(file, data)release_lock(file)References
- Silberschatz, A., Galvin, P. B., & Gagne, G. "Operating System Concepts." Wiley, 10th Ed., 2018, pp. 523-579.
- Tanenbaum, A. S., & Bos, H. "Modern Operating Systems." Pearson, 4th Ed., 2015, pp. 267-310.
- McKusick, M. K., & Neville-Neil, G. V. "The Design and Implementation of the FreeBSD Operating System." Addison-Wesley, 2nd Ed., 2014, pp. 145-190.
- Card, R., Ts'o, T., & Tweedie, S. "Design and Implementation of the Second Extended Filesystem." Proceedings of the First Dutch International Symposium on Linux, 1994, pp. 21-31.
- Ghemawat, S., Gobioff, H., & Leung, S. T. "The Google File System." ACM Symposium on Operating Systems Principles (SOSP), 2003, pp. 29-43.