Definition and Purpose
What Is a Cache?
Cache: small, fast memory located close to CPU cores. Purpose: reduce average memory access time. Function: store frequently accessed instructions/data. Benefit: improves CPU throughput and reduces latency.
Role in Memory Hierarchy
Position: between CPU registers and main memory. Level: intermediary buffer. Effect: bridges speed gap between fast CPU and slower DRAM. Reduces costly main memory accesses.
Historical Context
Origin: introduced in 1960s mainframes. Evolution: from single-level caches to multi-level hierarchies. Importance: fundamental for modern processors’ performance scaling.
"A cache is a small, fast memory that stores copies of data from frequently used main memory locations." -- Andrew S. Tanenbaum
Cache Hierarchy
Levels of Cache
L1 Cache: smallest, fastest, split into instruction and data caches, size 16-64 KB, latency ~1-3 cycles. L2 Cache: larger, unified, size 256 KB-1 MB, latency ~5-12 cycles. L3 Cache: shared among cores, multi-MB size, latency ~20-40 cycles.
Trade-offs
Size vs Speed: larger caches hold more data but are slower to access. Levels: more levels raise the overall hit rate but add lookup latency. Inclusion: the inclusion property guarantees blocks in L1 are also present in L2/L3. Exclusivity: some architectures use exclusive caches, where levels hold disjoint blocks, to increase effective capacity.
Example Architectures
Intel Core i7: three-level cache hierarchy with a shared L3 (inclusive in many generations). AMD Ryzen: shared L3 organized as a victim cache filled by L2 evictions. ARM Cortex: typically a two-level hierarchy with a shared L2. High-performance servers: additional levels, sometimes including an L4.
| Cache Level | Size | Latency (cycles) | Typical Use |
|---|---|---|---|
| L1 | 16-64 KB | 1-3 | Per-core, split I/D |
| L2 | 256 KB - 1 MB | 5-12 | Per-core, unified |
| L3 | 2-32 MB | 20-40 | Shared, multi-core |
Cache Mapping Techniques
Direct-Mapped Cache
Mapping: each block maps to exactly one cache line. Indexing: uses lower bits of the address. Advantage: simple, fast lookup. Disadvantage: high conflict-miss rate due to the fixed mapping.
Fully Associative Cache
Mapping: any block can be placed anywhere in cache. Lookup: parallel search of all tags. Advantage: minimizes conflict misses. Disadvantage: complex hardware, slower access time, higher power.
Set-Associative Cache
Mapping: cache divided into sets, each with multiple lines (ways). Block maps to one set but any line within. Trade-off: balances speed and flexibility. Common: 2-way, 4-way, 8-way associativity.
| Cache Type | Mapping | Advantages | Disadvantages |
|---|---|---|---|
| Direct-Mapped | One-to-one | Simple, fast | Conflict misses |
| Fully Associative | Any line | Low conflict misses | Complex, slow |
| Set-Associative | Set-based | Balance of speed & misses | More complex than direct |
Address Breakdown
Address components: Tag, Index, Block Offset. Index selects set/line. Tag verifies block identity. Offset selects byte within block.
```
Address = [ Tag | Index | Block Offset ]
Block Size     = 2^b bytes
Number of Sets = 2^s
Tag Size       = Address bits - (s + b)
```
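To make the field arithmetic concrete, here is a minimal C sketch of address decomposition and a set probe for a hypothetical 32 KB, 4-way, 64-byte-block cache; the geometry and names (`decompose`, `probe`, `NUM_SETS`) are illustrative assumptions, not drawn from any real design.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical geometry: 32 KB, 4-way, 64-byte blocks
   => 32768 / (64 * 4) = 128 sets, so s = 7 and b = 6. */
#define BLOCK_BITS 6                        /* b = log2(block size)     */
#define SET_BITS   7                        /* s = log2(number of sets) */
#define NUM_WAYS   4
#define NUM_SETS   (1u << SET_BITS)

typedef struct { bool valid; uint32_t tag; } Line;

static Line cache[NUM_SETS][NUM_WAYS];      /* tag store only, no data  */

/* Split an address into tag / index / offset per the layout above. */
static void decompose(uint32_t addr, uint32_t *tag,
                      uint32_t *index, uint32_t *offset)
{
    *offset = addr & ((1u << BLOCK_BITS) - 1);
    *index  = (addr >> BLOCK_BITS) & (NUM_SETS - 1);
    *tag    = addr >> (BLOCK_BITS + SET_BITS);
}

/* Probe one set: hardware compares all ways in parallel; a loop here. */
static bool probe(uint32_t addr)
{
    uint32_t tag, index, offset;
    decompose(addr, &tag, &index, &offset);
    for (int w = 0; w < NUM_WAYS; w++)
        if (cache[index][w].valid && cache[index][w].tag == tag)
            return true;                    /* hit  */
    return false;                           /* miss */
}

int main(void)
{
    uint32_t tag, index, offset;
    decompose(0x12345678u, &tag, &index, &offset);
    printf("tag=0x%x index=%u offset=%u hit=%d\n",
           tag, index, offset, probe(0x12345678u));
    return 0;
}
```

Setting NUM_WAYS to 1 models a direct-mapped cache; setting SET_BITS to 0 models a fully associative one.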
Cache Replacement Policies
Least Recently Used (LRU)
Policy: evict block unused longest. Implementation: counters, stack. Advantage: exploits temporal locality. Disadvantage: complexity increases with associativity.
First In First Out (FIFO)
Policy: evict oldest block. Simple queue structure. Advantage: low overhead. Disadvantage: ignores usage patterns, less efficient than LRU.
Random Replacement
Policy: randomly select block to evict. Advantage: simplest, avoids pathological cases. Disadvantage: unpredictable performance.
Other Policies
Least Frequently Used (LFU), Pseudo-LRU approximations, Adaptive policies combining heuristics.
Example: for a 4-way set-associative cache:
```
On miss:
    identify the victim block via LRU counters
    replace the victim with the new block
Update LRU state on every access
```
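A minimal C sketch of the age-counter bookkeeping described above, for a single 4-way set; the names (`cache_access`, `touch`) and the access trace are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define WAYS 4

/* One 4-way set: a tag and an age counter per way.
   age 0 = most recently used; the largest age marks the LRU victim. */
static uint32_t tags[WAYS];
static int      ages[WAYS] = {0, 1, 2, 3};
static int      valid[WAYS];

/* Mark a way most recently used; everything younger ages by one. */
static void touch(int way)
{
    for (int w = 0; w < WAYS; w++)
        if (ages[w] < ages[way])
            ages[w]++;
    ages[way] = 0;
}

static void cache_access(uint32_t tag)
{
    for (int w = 0; w < WAYS; w++) {
        if (valid[w] && tags[w] == tag) {       /* hit */
            touch(w);
            printf("hit  tag=%u way=%d\n", tag, w);
            return;
        }
    }
    int victim = 0;                             /* miss: pick max age */
    for (int w = 1; w < WAYS; w++)
        if (ages[w] > ages[victim])
            victim = w;
    tags[victim]  = tag;
    valid[victim] = 1;
    touch(victim);
    printf("miss tag=%u -> way=%d\n", tag, victim);
}

int main(void)
{
    uint32_t trace[] = {1, 2, 3, 4, 1, 5};      /* 5 evicts tag 2, the LRU */
    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
        cache_access(trace[i]);
    return 0;
}
```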
Cache Coherence and Consistency
Coherence Problem
Issue: multiple caches may hold copies of same memory block. Problem: update in one cache must reflect in others. Without coherence: stale data, inconsistency.
Coherence Protocols
MESI: states Modified, Exclusive, Shared, Invalid. Protocol: manages block states and transitions on read/write. Maintains coherence via bus snooping or directory-based protocols.
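As a rough illustration, a simplified next-state function for MESI in C; it omits transient states, read-for-ownership on write misses, and other details real protocols require, and all names are assumptions.

```c
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } MesiState;

typedef enum {
    LOCAL_READ, LOCAL_WRITE,    /* requests from this core         */
    SNOOP_READ, SNOOP_WRITE     /* bus transactions by other cores */
} Event;

/* Simplified next-state function for one cache line.
   `others_have_copy` models the "shared" signal seen on the bus. */
static MesiState next_state(MesiState s, Event e, int others_have_copy)
{
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)                   /* read miss: fetch block   */
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;                           /* M/E/S read hits stay put */
    case LOCAL_WRITE:
        return MODIFIED;                    /* invalidates other copies */
    case SNOOP_READ:
        if (s == MODIFIED || s == EXCLUSIVE)
            return SHARED;                  /* supply data, downgrade   */
        return s;
    case SNOOP_WRITE:
        return INVALID;                     /* another core owns it now */
    }
    return s;
}

int main(void)
{
    MesiState s = INVALID;
    s = next_state(s, LOCAL_READ, 0);       /* INVALID   -> EXCLUSIVE */
    s = next_state(s, LOCAL_WRITE, 0);      /* EXCLUSIVE -> MODIFIED  */
    s = next_state(s, SNOOP_READ, 1);       /* MODIFIED  -> SHARED    */
    printf("final state = %d (SHARED = %d)\n", s, SHARED);
    return 0;
}
```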
Memory Consistency Models
Defines order of memory operations visible to processors. Examples: Sequential Consistency, Relaxed Consistency. Affects cache behavior and synchronization.
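The classic store-buffering litmus test makes the difference concrete; below is a hedged C11 sketch using relaxed atomics and POSIX threads. Under sequential consistency the outcome r1 == 0 && r2 == 0 is impossible; relaxed hardware models can produce it (though any single run may not).

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

/* Store-buffering (SB) litmus test. */
atomic_int x, y;
int r1, r2;

static void *t1(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *t2(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("r1=%d r2=%d%s\n", r1, r2,
           (r1 == 0 && r2 == 0) ? "  <- forbidden under SC" : "");
    return 0;
}
```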
Implementation Approaches
Bus Snooping: monitors bus transactions. Directory-Based: centralized tracking of block states. Hybrid methods optimize scalability and performance.
Cache Performance Metrics
Hit Rate and Miss Rate
Hit Rate: fraction of accesses found in cache. Miss Rate = 1 - Hit Rate. Aim: maximize hit rate to reduce latency.
Types of Misses
Compulsory Misses: first-time access misses. Capacity Misses: cache too small. Conflict Misses: mapping collisions in cache.
Average Memory Access Time (AMAT)
Formula: AMAT = Hit Time + Miss Rate × Miss Penalty. Lower AMAT improves overall processor speed.
Bandwidth and Power
Cache reduces memory bandwidth demand. Power: smaller, faster caches consume less power. Trade-offs between size, speed, and power consumption.
```
AMAT = Hit Time + (Miss Rate × Miss Penalty)

Example:
Hit Time     = 1 cycle
Miss Rate    = 5%
Miss Penalty = 100 cycles
AMAT = 1 + 0.05 × 100 = 6 cycles
```
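The formula applies recursively across levels: each level's miss penalty is the AMAT of the level below. A small C sketch with illustrative (not vendor-specific) numbers:

```c
#include <stdio.h>

/* Recursive AMAT for a multi-level hierarchy:
   AMAT_i = hit_time_i + miss_rate_i * AMAT_(i+1),
   with main memory as the final level. Numbers are illustrative. */
int main(void)
{
    double l1_hit = 2,  l1_miss = 0.05;   /* cycles, local miss rates */
    double l2_hit = 10, l2_miss = 0.20;
    double l3_hit = 30, l3_miss = 0.30;
    double mem    = 200;

    double amat_l3 = l3_hit + l3_miss * mem;       /* 90.0  */
    double amat_l2 = l2_hit + l2_miss * amat_l3;   /* 28.0  */
    double amat_l1 = l1_hit + l1_miss * amat_l2;   /* 3.4   */

    printf("AMAT = %.2f cycles\n", amat_l1);
    return 0;
}
```

Note how even a modest L1 miss rate dominates AMAT once memory latency is large, which is why each added level pays for itself.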
Write Policies
Write-Through
Writes update cache and main memory simultaneously. Advantage: simple coherence. Disadvantage: higher memory traffic.
Write-Back
Writes update cache only; memory updated on block eviction. Advantage: reduces memory writes. Disadvantage: complexity in maintaining coherence.
Write Allocate vs No-Write Allocate
Write Allocate: on a write miss, loads the block into the cache before updating it. No-Write Allocate: on a write miss, writes directly to memory without loading the block. Write allocate typically pairs with write-back, no-write allocate with write-through (see the sketch below).
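A toy C sketch contrasting main-memory write traffic under the two policies for repeated writes to one block; the single-line model and counts are purely illustrative assumptions.

```c
#include <stdio.h>

/* Count main-memory writes for one repeatedly written block under
   write-through vs write-back. Toy model: a single cache line. */
int main(void)
{
    int writes_to_same_block = 100;

    /* Write-through: every store also goes to memory. */
    int wt_mem_writes = writes_to_same_block;

    /* Write-back: stores set a dirty bit; memory is written once,
       when the block is eventually evicted. */
    int dirty = 0, wb_mem_writes = 0;
    for (int i = 0; i < writes_to_same_block; i++)
        dirty = 1;                  /* hit in cache, no memory traffic */
    if (dirty)
        wb_mem_writes++;            /* eviction flushes the dirty block */

    printf("write-through: %d memory writes\n", wt_mem_writes);   /* 100 */
    printf("write-back:    %d memory write(s)\n", wb_mem_writes); /*   1 */
    return 0;
}
```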
Cache Design Considerations
Block Size
Typical sizes: 16-128 bytes. Larger blocks exploit spatial locality but increase miss penalty and memory traffic.
Associativity
Higher associativity reduces conflict misses but increases hit time and hardware complexity.
Latency vs Bandwidth
Trade-off: smaller caches offer lower access latency but less capacity; larger caches offer more capacity (and, when banked, higher aggregate bandwidth) at higher latency.
Tag Storage
Tags store metadata for validation. Tag size affects cache overhead and access time.
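A hedged worked example of that overhead, assuming a 48-bit physical address and a 32 KB, 8-way, 64-byte-block geometry (all of these numbers are assumptions):

```c
#include <stdio.h>

/* Tag-storage overhead for an assumed 48-bit physical address and a
   32 KB, 8-way cache with 64-byte blocks (geometry is illustrative). */
int main(void)
{
    int addr_bits  = 48;
    int block_bits = 6;                      /* 64-byte blocks */
    int sets       = 32 * 1024 / (64 * 8);   /* 64 sets        */
    int set_bits   = 6;                      /* log2(64)       */

    int tag_bits = addr_bits - set_bits - block_bits;   /* 36 */
    int lines    = sets * 8;
    /* +2 models a valid and a dirty bit per line. */
    int tag_array_bits = lines * (tag_bits + 2);

    printf("tag bits per line: %d\n", tag_bits);
    printf("tag array: %d bits (~%.1f%% of the 32 KB data array)\n",
           tag_array_bits, 100.0 * tag_array_bits / (32 * 1024 * 8));
    return 0;
}
```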
Inclusion Property
Inclusive caches guarantee that blocks in inner levels (e.g., L1) are also present in outer levels (e.g., L2/L3). This simplifies coherence checks but reduces effective combined capacity.
Virtual vs Physical Caches
Virtual Cache
Indexed/tagged using virtual addresses. Advantage: early access before address translation. Challenge: synonym problem (multiple virtual addresses mapping to same physical address).
Physical Cache
Uses physical addresses for indexing/tagging. Avoids synonym issues. Slightly higher latency due to address translation first.
Virtual-Physical Hybrid
Virtually indexed, physically tagged (VIPT) caches combine the benefits: the index is available before translation completes, and physical tags avoid synonym conflicts when the index fits within the page offset (see the check below).
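A quick sketch of the usual VIPT constraint, that index plus offset bits fit within the page offset (equivalently, cache size / ways ≤ page size); the 32 KB, 8-way, 4 KB-page numbers are assumptions chosen to match a common L1 configuration.

```c
#include <stdio.h>

/* VIPT constraint: the set-index + block-offset bits must lie within
   the page offset, so the index is identical in virtual and physical
   addresses. Equivalently: cache_size / ways <= page_size. */
int main(void)
{
    long page_size  = 4096;
    long cache_size = 32 * 1024;
    long ways       = 8;

    long per_way = cache_size / ways;        /* 4 KB per way */
    printf("cache_size/ways = %ld bytes -> %s\n", per_way,
           per_way <= page_size ? "VIPT-safe (no aliasing)"
                                : "needs alias handling");
    return 0;
}
```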
Multi-core and Shared Cache Architectures
Private vs Shared Caches
Private: each core has own L1/L2 caches. Shared: usually L3 cache shared among cores. Shared caches improve data sharing but increase contention.
Cache Coherence in Multi-core
Protocols ensure consistency across cores. Scalability challenges with increasing cores.
Non-Uniform Cache Access (NUCA)
Caches divided into banks with variable latency. Optimizes access time based on physical location.
Cache and Security Implications
Side-Channel Attacks
Cache timing can leak sensitive information: attackers infer secrets by measuring cache hits and misses. Examples: Spectre and Meltdown, which use the cache as their exfiltration channel (a timing sketch follows).
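To show why timing leaks exist, a minimal x86-only C sketch timing a cached versus a flushed load with `__rdtsc` and `_mm_clflush`; it measures latency on the program's own buffer and is an educational illustration, not an attack implementation.

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>      /* __rdtsc, _mm_clflush, _mm_mfence (x86 only) */

/* Time a single load in TSC ticks, fenced on both sides. */
static uint64_t time_load(volatile char *p)
{
    _mm_mfence();
    uint64_t t0 = __rdtsc();
    (void)*p;
    _mm_mfence();
    return __rdtsc() - t0;
}

int main(void)
{
    static char buf[64];

    (void)time_load(buf);            /* warm: bring the line into cache */
    uint64_t hot = time_load(buf);   /* cached access                   */

    _mm_clflush(buf);                /* evict the line                  */
    _mm_mfence();
    uint64_t cold = time_load(buf);  /* access served from memory       */

    /* The hot/cold gap is what Flush+Reload-style attacks measure
       to infer which lines another process has touched. */
    printf("cached: %llu ticks, flushed: %llu ticks\n",
           (unsigned long long)hot, (unsigned long long)cold);
    return 0;
}
```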
Mitigation Techniques
Cache partitioning, randomization, flushing on context switch. Hardware and software defenses to reduce attack surface.
Trusted Execution Environments
Isolate sensitive computations from cache interference. Use secure cache designs to protect confidentiality.
Future Trends in Cache Technology
3D-stacked Caches
Vertical integration with logic dies. Benefits: reduced latency, higher bandwidth, lower power.
Non-Volatile Caches
Emerging memory technologies (e.g., MRAM) for persistent caches. Potential for instant-on and energy savings.
Machine Learning for Cache Management
Adaptive replacement policies using predictive algorithms. Dynamic tuning of cache parameters.
Security-Enhanced Cache Architectures
Hardware support for encryption, access control, and side-channel resistance.
References
- Hennessy, J. L., & Patterson, D. A. Computer Architecture: A Quantitative Approach. 6th ed., Morgan Kaufmann, 2017, pp. 123-178.
- Smith, A. J. Cache Memories. ACM Computing Surveys, vol. 14, no. 3, 1982, pp. 473-530.
- Qureshi, M. K., & Patt, Y. N. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006, pp. 423-432.
- Dubois, M., Scheurich, C., & Briggs, F. A. Synchronization and Coherence in Shared-Memory Multiprocessors. IEEE Computer, vol. 21, no. 6, 1988, pp. 9-21.
- Kocher, P. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. Advances in Cryptology, CRYPTO ’96, 1996, pp. 104-113.