Overview

Memory hierarchy: arrangement of storage types by speed, cost, and capacity. Objective: maximize access speed while minimizing cost. Faster memories closer to CPU, slower memories farther. Trade-offs: cost per bit, access latency, storage volume. Multi-level approach: registers, caches, main memory, secondary storage, tertiary storage. Essential for modern computer performance.

"The memory hierarchy is fundamental to bridging the speed gap between fast processors and slower memory." -- John L. Hennessy & David A. Patterson

Levels of Memory Hierarchy

Registers

Smallest, fastest storage. Located inside CPU. Width: 32-128 bits per register; a core typically holds a few dozen registers. Access time: 1 CPU cycle. Used for immediate data manipulation.

Cache Memory

Intermediate storage between registers and main memory. Size: KB to few MB. Access time: 1-10 cycles. Types: L1, L2, L3. Stores frequently accessed data to reduce latency.

Main Memory (RAM)

Primary volatile memory. Size: GBs. Access time: 50-100 ns. Holds active processes and data.

Secondary Storage

Non-volatile, large capacity. Examples: HDDs, SSDs. Access time: milliseconds. Stores OS, applications, files.

Tertiary and Off-line Storage

Used for backups and archives. Examples: magnetic tapes, optical disks. High latency, very large capacity.

Cache Memory

Cache Organization

Levels: L1 (split instruction/data), L2 (unified), L3 (shared). Size increases with level; speed decreases. Write policies: write-through, write-back. Replacement policies: LRU, FIFO, random.

Cache Mapping Techniques

Direct mapped: one block per cache line, simple but high conflict misses. Fully associative: any block in any line, complex hardware. Set associative: compromise, blocks mapped to set of lines.
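The direct-mapped case can be sketched as address arithmetic: the block offset comes from the low bits, the line index from the middle bits, and the tag from the rest. The cache and block sizes below are illustrative assumptions (a 32 KB cache with 64-byte lines), not figures fixed by these notes.

```python
# Sketch: splitting an address into (tag, index, offset) for a
# direct-mapped cache. Sizes are assumptions for illustration.
CACHE_SIZE = 32 * 1024               # 32 KB, a common L1 size
BLOCK_SIZE = 64                      # bytes per cache line
NUM_LINES = CACHE_SIZE // BLOCK_SIZE # 512 lines

def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, index, offset) for a direct-mapped cache."""
    offset = addr % BLOCK_SIZE                  # byte within the block
    index = (addr // BLOCK_SIZE) % NUM_LINES    # which cache line
    tag = addr // (BLOCK_SIZE * NUM_LINES)      # identifies the block
    return tag, index, offset

# Two addresses exactly CACHE_SIZE apart map to the same line (index 0)
# with different tags -- the source of conflict misses.
print(split_address(0))        # (0, 0, 0)
print(split_address(32768))    # (1, 0, 0)
```

A set-associative cache uses the same split but indexes a set of lines instead of a single line, which is why it reduces conflict misses.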

Cache Performance Metrics

Hit rate: fraction of accesses found in cache. Miss rate: 1 - hit rate. Miss penalty: extra time to fetch from lower level memory. Average memory access time (AMAT): key performance measure.

Cache Level | Typical Size  | Access Time
L1          | 32-64 KB      | 1-4 cycles
L2          | 256 KB - 1 MB | 10-20 cycles
L3          | 2-16 MB       | 20-50 cycles

Main Memory (RAM)

Technology

Typically DRAM: dynamic, requires periodic refresh. SRAM for cache, faster but costly. DRAM density higher, cheaper per bit.

Organization

Structured in rows and columns. Access requires row activation, column selection. Banks used for parallelism. Memory channels improve bandwidth.

Memory Controllers

Interface between CPU and RAM. Schedule accesses, manage refresh cycles. Optimize latency and throughput.

Secondary Storage

Hard Disk Drives (HDD)

Magnetic storage. Large capacity (TBs), moderate cost. Latency: 3-12 ms. Sequential throughput: 100-200 MB/s.

Solid State Drives (SSD)

Flash memory based. Faster (latency < 0.1 ms), more expensive per GB. High random access speed.

Storage Hierarchy Role

Stores persistent data. Swap space for virtual memory. Backup and archival uses.

Access Time and Latency

Definitions

Access time: total time to read/write data. Latency: delay before transfer starts. Throughput: amount of data transferred per time unit.

Latency Components

Address decode, row/column access, transfer delay. Cache latency measured in cycles; disk latency in ms.

Impact on Performance

Latency directly affects CPU stall cycles. Lower latency improves instruction throughput and user experience.

AMAT = Hit Time + Miss Rate × Miss Penalty
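The AMAT formula above can be checked with a quick worked example; the hit time, miss rate, and miss penalty used here are assumed values for illustration.

```python
# Average Memory Access Time: AMAT = hit time + miss rate * miss penalty.
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    return hit_time + miss_rate * miss_penalty

# Assumed numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # 6.0 cycles
```

Even a small miss rate dominates when the miss penalty is large, which is why multi-level caches aim to keep the penalty of each level's miss low.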

Memory Bandwidth

Definition

Data transfer rate between memory and processor, measured in bytes/second.

Factors Affecting Bandwidth

Bus width, memory clock speed, number of channels, prefetching.

Improvement Techniques

Multi-channel memory, DDR SDRAM, burst transfers, memory interleaving.

Memory Type   | Typical Bandwidth
DDR3-1600     | 12.8 GB/s
DDR4-3200     | 25.6 GB/s
PCIe Gen4 SSD | 5 GB/s
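The DDR figures above follow directly from transfer rate times bus width: a single DDR channel is 64 bits (8 bytes) wide, so DDR4-3200 peaks at 3200 MT/s x 8 B = 25.6 GB/s. A quick calculator (the 8-byte bus width and channel count are the only assumptions):

```python
# Peak theoretical bandwidth = transfer rate (MT/s) x bus width (bytes) x channels.
def peak_bandwidth_gbps(mt_per_s: int, bus_bytes: int = 8, channels: int = 1) -> float:
    return mt_per_s * bus_bytes * channels / 1000  # GB/s

print(peak_bandwidth_gbps(1600))              # DDR3-1600 -> 12.8 GB/s
print(peak_bandwidth_gbps(3200))              # DDR4-3200 -> 25.6 GB/s
print(peak_bandwidth_gbps(3200, channels=2))  # dual channel -> 51.2 GB/s
```

This is a peak figure; sustained bandwidth is lower due to refresh cycles, row activations, and scheduling overhead.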

Principles of Memory Hierarchy

Speed vs. Size vs. Cost

Faster memories more costly and smaller. Hierarchical design balances these factors for optimal system cost-performance.

Data Placement

Frequently used data kept closer to CPU. Less used data stored in slower, larger memories.

Replacement Policies

Determine which data to evict from faster memories. Common policies: Least Recently Used (LRU), First-In-First-Out (FIFO).

Locality of Reference

Temporal Locality

Recently accessed data likely to be accessed again soon. Justifies caching.

Spatial Locality

Data near recently accessed addresses likely to be used shortly. Justifies block fetching and prefetching.

Exploitation in Design

Cache block sizes, prefetch algorithms, and memory organization leverage locality to improve performance.
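The effect of traversal order on spatial locality can be made concrete by listing the linear addresses a 2D row-major array traversal touches; the 4x4 array here is an assumed example.

```python
# Spatial locality illustration: a row-major 2D array stores element (r, c)
# at linear address r * n_cols + c. Row-order traversal touches consecutive
# addresses (stride 1); column-order jumps by a full row (stride n_cols).
n_rows, n_cols = 4, 4

def addresses(order: str) -> list[int]:
    if order == "row":   # good spatial locality
        return [r * n_cols + c for r in range(n_rows) for c in range(n_cols)]
    else:                # poor spatial locality
        return [r * n_cols + c for c in range(n_cols) for r in range(n_rows)]

print(addresses("row")[:6])  # [0, 1, 2, 3, 4, 5]  -> consecutive
print(addresses("col")[:6])  # [0, 4, 8, 12, 1, 5] -> stride of 4
```

With 64-byte cache lines, the stride-1 pattern uses every byte of each fetched block; the strided pattern may touch a new block on every access.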

Memory Management Techniques

Virtual Memory

Extends addressable memory using disk space. Uses paging or segmentation. Enables multiprogramming and memory protection.

Paging

Divides memory into fixed-size pages. Page tables translate virtual to physical addresses.
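The translation step can be sketched with a single-level page table; the 4 KB page size and the dictionary-based table are assumptions for illustration.

```python
# Sketch: virtual-to-physical translation under paging.
PAGE_SIZE = 4096  # 4 KB pages, a common default (assumed here)

# Page table: virtual page number -> physical frame number (toy contents).
page_table = {0: 5, 1: 2, 2: 9}

def translate(vaddr: int) -> int:
    vpn = vaddr // PAGE_SIZE     # virtual page number (high bits)
    offset = vaddr % PAGE_SIZE   # offset within the page (low bits)
    frame = page_table[vpn]      # KeyError here models a page fault
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1004)))  # VPN 1 -> frame 2 -> 0x2004
```

The offset is carried over unchanged; only the page number is remapped, which is what makes fixed-size pages simple to translate in hardware.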

Segmentation

Divides memory into variable-size segments based on logical divisions. Supports sharing and protection.

Segmentation: Virtual Address = (Segment Number, Offset)
Paging: Physical Address = (Frame Number, Offset)

Performance Optimization

Prefetching

Predicts future data needs, loads data before requested. Reduces perceived latency.

Write Policies

Write-through: immediate update of lower memory, simpler but slower. Write-back: update on eviction, faster but complex coherence.
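The traffic difference between the two policies can be shown with a toy count, assuming a sequence of stores that all hit the same cached block (function name and model are illustrative, not a real simulator):

```python
# Toy count of lower-level memory writes for N stores to one cached block.
def memory_writes(policy: str, stores: int) -> int:
    if policy == "write-through":
        return stores                 # every store also updates memory
    if policy == "write-back":
        return 1 if stores else 0     # single flush when the dirty block is evicted
    raise ValueError(f"unknown policy: {policy}")

print(memory_writes("write-through", 100))  # 100
print(memory_writes("write-back", 100))     # 1
```

Write-back's saving comes at the cost of tracking a dirty bit per block and keeping other caches coherent with the stale copy in memory.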

Parallelism

Multiple memory channels and banks improve throughput. Multithreading hides latency.

References

  • Hennessy, J. L., & Patterson, D. A. (2017). Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 6th Edition, pp. 100-150.
  • Jacob, B., Ng, S., & Wang, D. T. (2010). Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann, pp. 50-120.
  • Smith, A. J. (1982). Cache Memories. ACM Computing Surveys, 14(3), 473-530.
  • Stallings, W. (2018). Computer Organization and Architecture: Designing for Performance. Pearson, 10th Edition, pp. 320-370.
  • Denning, P. J. (1968). The Working Set Model for Program Behavior. Communications of the ACM, 11(5), 323-333.