Overview
Memory hierarchy: arrangement of storage types by speed, cost, and capacity. Objective: maximize access speed while minimizing cost. Faster memories closer to CPU, slower memories farther. Trade-offs: cost per bit, access latency, storage volume. Multi-level approach: registers, caches, main memory, secondary storage, tertiary storage. Essential for modern computer performance.
"The memory hierarchy is fundamental to bridging the speed gap between fast processors and slower memory." -- John L. Hennessy & David A. Patterson
Levels of Memory Hierarchy
Registers
Smallest, fastest storage, located inside the CPU. Width: typically 32 or 64 bits per register (vector registers are wider). Access time: 1 CPU cycle. Used for immediate data manipulation.
Cache Memory
Intermediate storage between registers and main memory. Size: KB to few MB. Access time: 1-10 cycles. Types: L1, L2, L3. Stores frequently accessed data to reduce latency.
Main Memory (RAM)
Primary volatile memory. Size: GBs. Access time: 50-100 ns. Holds active processes and data.
Secondary Storage
Non-volatile, large capacity. Examples: HDDs, SSDs. Access time: milliseconds. Stores OS, applications, files.
Tertiary and Off-line Storage
Used for backups and archives. Examples: magnetic tapes, optical disks. High latency, very large capacity.
Cache Memory
Cache Organization
Levels: L1 (split instruction/data), L2 (unified), L3 (shared). Size increases with level; speed decreases. Write policies: write-through, write-back. Replacement policies: LRU, FIFO, random.
Cache Mapping Techniques
Direct mapped: one block per cache line, simple but high conflict misses. Fully associative: any block in any line, complex hardware. Set associative: compromise, blocks mapped to set of lines.
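The address decomposition behind these mapping schemes can be sketched directly. The cache geometry below (32 KB, 64-byte lines, 4-way) is an illustrative assumption, not a fixed standard; setting `WAYS = 1` gives direct mapping, and a single set gives full associativity.

```python
# Sketch: splitting a physical address into tag / set index / block offset
# for a set-associative cache. Geometry values are illustrative assumptions.

CACHE_SIZE = 32 * 1024      # 32 KB total capacity
BLOCK_SIZE = 64             # bytes per cache line
WAYS = 4                    # 4-way set associative (1 = direct mapped)

NUM_SETS = CACHE_SIZE // (BLOCK_SIZE * WAYS)    # 128 sets
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1       # 6 bits
INDEX_BITS = NUM_SETS.bit_length() - 1          # 7 bits

def split_address(addr: int):
    """Return (tag, set_index, offset) for a physical address."""
    offset = addr & (BLOCK_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset
```

A block may reside in any of the `WAYS` lines of its set; only the tag must be compared on lookup, which is the hardware-cost compromise the notes describe.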
Cache Performance Metrics
Hit rate: fraction of accesses found in cache. Miss rate: 1 - hit rate. Miss penalty: extra time to fetch from lower level memory. Average memory access time (AMAT): key performance measure.
| Cache Level | Typical Size | Access Time |
|---|---|---|
| L1 | 32-64 KB | 1-4 cycles |
| L2 | 256 KB - 1 MB | 10-20 cycles |
| L3 | 2-16 MB | 20-50 cycles |
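The AMAT formula chains naturally across levels: an L1 miss is serviced by L2, so L1's miss penalty is L2's own AMAT. A small numeric sketch, with hit times loosely in the range of the table above and miss rates chosen purely for illustration:

```python
# AMAT = Hit Time + Miss Rate x Miss Penalty, chained across cache levels.
# Latencies (in cycles) and miss rates are illustrative assumptions.

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Assume main memory costs 100 cycles; L2 misses go to memory.
l2_amat = amat(hit_time=15, miss_rate=0.20, miss_penalty=100)  # 35.0
# L1 misses are serviced by L2, so L1's penalty is L2's AMAT.
l1_amat = amat(hit_time=2, miss_rate=0.05, miss_penalty=l2_amat)
print(l1_amat)  # 3.75 cycles
```

Even with a 100-cycle memory, the effective access time stays under 4 cycles because most accesses hit in L1, which is the hierarchy's core payoff.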
Main Memory (RAM)
Technology
Typically DRAM: dynamic, requires periodic refresh. SRAM for cache, faster but costly. DRAM density higher, cheaper per bit.
Organization
Structured in rows and columns. Access requires row activation, column selection. Banks used for parallelism. Memory channels improve bandwidth.
Memory Controllers
Interface between CPU and RAM. Schedule accesses, manage refresh cycles. Optimize latency and throughput.
Secondary Storage
Hard Disk Drives (HDD)
Magnetic storage. Large capacity (TBs), moderate cost. Latency: 3-12 ms. Sequential throughput: 100-200 MB/s.
Solid State Drives (SSD)
Flash memory based. Faster (latency < 0.1 ms), more expensive per GB. High random access speed.
Storage Hierarchy Role
Stores persistent data. Swap space for virtual memory. Backup and archival uses.
Access Time and Latency
Definitions
Access time: total time to read/write data. Latency: delay before transfer starts. Throughput: amount of data transferred per time unit.
Latency Components
Address decode, row/column access, transfer delay. Cache latency measured in cycles; disk latency in ms.
Impact on Performance
Latency directly affects CPU stall cycles. Lower latency improves instruction throughput and user experience.
AMAT = Hit Time + Miss Rate × Miss Penalty
Memory Bandwidth
Definition
Data transfer rate between memory and processor, measured in bytes/second.
Factors Affecting Bandwidth
Bus width, memory clock speed, number of channels, prefetching.
Improvement Techniques
Multi-channel memory, DDR SDRAM, burst transfers, memory interleaving.
| Memory Type | Typical Bandwidth |
|---|---|
| DDR3-1600 | 12.8 GB/s |
| DDR4-3200 | 25.6 GB/s |
| PCIe Gen4 SSD | 5 GB/s |
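The DRAM figures in the table follow from a simple relation: peak bandwidth = transfer rate × bus width × channels, where a standard DDR channel is 64 bits (8 bytes) wide. A quick check:

```python
# Peak DRAM bandwidth = transfer rate (MT/s) x bus width (bytes) x channels.
# A standard DDR channel is 64 bits = 8 bytes wide.

def peak_bandwidth_gbs(megatransfers, bus_bytes=8, channels=1):
    return megatransfers * 1e6 * bus_bytes * channels / 1e9

print(peak_bandwidth_gbs(1600))              # DDR3-1600: 12.8 GB/s
print(peak_bandwidth_gbs(3200))              # DDR4-3200: 25.6 GB/s
print(peak_bandwidth_gbs(3200, channels=2))  # dual channel: 51.2 GB/s
```

This also shows why multi-channel configurations are listed among the improvement techniques: doubling channels doubles peak bandwidth without faster chips.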
Principles of Memory Hierarchy
Speed vs. Size vs. Cost
Faster memories more costly and smaller. Hierarchical design balances these factors for optimal system cost-performance.
Data Placement
Frequently used data kept closer to CPU. Less used data stored in slower, larger memories.
Replacement Policies
Determine which data to evict from faster memories. Common policies: Least Recently Used (LRU), First-In-First-Out (FIFO).
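LRU can be sketched in a few lines: a hit refreshes an entry's recency, and a miss on a full cache evicts the entry untouched for longest. This is a software model of the policy, not how a hardware cache implements it.

```python
from collections import OrderedDict

# Minimal LRU model: a hit moves the entry to most-recently-used;
# a miss on a full cache evicts the least-recently-used entry.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order tracks recency

    def access(self, key, value=None):
        if key in self.data:                    # hit: refresh recency
            self.data.move_to_end(key)
            return self.data[key]
        if len(self.data) >= self.capacity:     # miss: evict LRU entry
            self.data.popitem(last=False)
        self.data[key] = value                  # fill with fetched block
        return None
```

For example, with capacity 2, accessing `a`, `b`, then `a` again, then `c` evicts `b`, since `a` was touched more recently. FIFO would instead evict `a`, the oldest insertion.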
Locality of Reference
Temporal Locality
Recently accessed data likely to be accessed again soon. Justifies caching.
Spatial Locality
Data near recently accessed addresses likely to be used shortly. Justifies block fetching and prefetching.
Exploitation in Design
Cache block sizes, prefetch algorithms, and memory organization leverage locality to improve performance.
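A classic illustration of spatial locality is 2D-array traversal order: row-major iteration over a row-major array touches consecutive addresses within each cache line, while column-major iteration jumps a full row per step. The sketch below counts how often consecutive accesses leave the current cache line, under assumed element and line sizes:

```python
# Spatial locality sketch: count how often consecutive accesses cross
# into a different cache line. Array and line sizes are assumptions.

ROWS, COLS, ELEM = 4, 1024, 8   # 4 x 1024 array of 8-byte elements
LINE = 64                        # assumed cache-line size in bytes

def line_switches(order):
    last, switches = None, 0
    for r, c in order:
        line = (r * COLS * ELEM + c * ELEM) // LINE
        if line != last:         # left the current line: likely a miss
            switches += 1
            last = line
    return switches

row_major = [(r, c) for r in range(ROWS) for c in range(COLS)]
col_major = [(r, c) for c in range(COLS) for r in range(ROWS)]
print(line_switches(row_major))  # 512: one switch per 8 elements
print(line_switches(col_major))  # 4096: every access changes line
```

Row-major order reuses each fetched 64-byte line for eight consecutive elements; column-major order wastes the rest of every fetched block, which is exactly the behavior block fetching and prefetching are designed to exploit.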
Memory Management Techniques
Virtual Memory
Extends addressable memory using disk space. Uses paging or segmentation. Enables multiprogramming and memory protection.
Paging
Divides memory into fixed-size pages. Page tables translate virtual to physical addresses.
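The translation step can be sketched concretely. The page size (4 KB) and the page-table contents below are hypothetical; a real MMU walks a multi-level table in hardware and uses a TLB to cache translations.

```python
# Paging sketch: a virtual address splits into (page number, offset);
# the page table maps page number -> frame number. 4 KB pages assumed.

PAGE_SIZE = 4096
OFFSET_BITS = 12                   # log2(4096)

page_table = {0: 5, 1: 9, 2: 3}    # hypothetical page -> frame mapping

def translate(vaddr):
    page = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    if page not in page_table:     # not resident: the OS must handle it
        raise MemoryError("page fault on page %d" % page)
    return (page_table[page] << OFFSET_BITS) | offset

print(hex(translate(0x1234)))      # page 1 -> frame 9: 0x9234
```

Note that the offset passes through unchanged; only the page number is remapped, which is why pages and frames must be the same fixed size.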
Segmentation
Divides memory into variable-size segments based on logical divisions. Supports sharing and protection.
Virtual Address = Segment Number + Offset (Segmentation)
Physical Address = Frame Number + Offset (Paging)
Performance Optimization
Prefetching
Predicts future data needs, loads data before requested. Reduces perceived latency.
Write Policies
Write-through: immediate update of lower memory, simpler but slower. Write-back: update on eviction, faster but complex coherence.
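The traffic difference between the two policies can be modeled with a dirty bit. In this sketch, repeated stores to one line cost a single memory write under write-back, where write-through would pay one per store; the coherence complexity it mentions comes from other caches not seeing the deferred update.

```python
# Write-policy sketch: write-back sets a dirty bit on each store and
# writes the block to memory only once, on eviction.

class WriteBackLine:
    def __init__(self):
        self.data = 0
        self.dirty = False
        self.memory_writes = 0    # traffic to the next level

    def store(self, value):
        self.data = value
        self.dirty = True         # defer the memory update

    def evict(self):
        if self.dirty:            # one write, however many stores occurred
            self.memory_writes += 1
            self.dirty = False

line = WriteBackLine()
for v in range(10):
    line.store(v)                 # ten stores to the same line
line.evict()
print(line.memory_writes)         # 1; write-through would have done 10
```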
Parallelism
Multiple memory channels and banks improve throughput. Multithreading hides latency.
Future Trends in Memory Hierarchy
Non-Volatile Memory Technologies
Emerging memories like MRAM, PCM combine speed of RAM with persistence. Potential to flatten hierarchy.
3D-Stacked Memory
Vertical integration of memory dies reduces latency, increases bandwidth. Examples: HBM, HMC.
Software-Defined Memory Management
Dynamic adaptation of memory hierarchy through OS and hardware cooperation. Improves resource utilization and performance.
References
- Hennessy, J. L., & Patterson, D. A. (2017). Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 6th Edition, pp. 100-150.
- Jacob, B., Ng, S., & Wang, D. T. (2010). Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann, pp. 50-120.
- Smith, A. J. (1982). Cache Memories. ACM Computing Surveys, 14(3), 473-530.
- Stallings, W. (2018). Computer Organization and Architecture: Designing for Performance. Pearson, 10th Edition, pp. 320-370.
- Denning, P. J. (1968). The Working Set Model for Program Behavior. Communications of the ACM, 11(5), 323-333.