Definition and Overview
Central Processing Unit
CPU: electronic circuit executing instructions from computer programs. Functions: data processing, control signal generation, arithmetic and logical operations.
Role in Computer System
Coordinates input/output, memory, and storage devices. Acts as "brain" of computer. Executes instructions sequentially or in parallel.
Basic Operation
Fetch-decode-execute cycle: retrieves instruction, interprets opcode, performs operation, stores results.
Types of CPUs
General-purpose CPUs, embedded processors, microcontrollers, digital signal processors (DSPs).
"The CPU is the heart of a computer, orchestrating operations at unimaginable speeds." -- John L. Hennessy
Historical Development
Early Computers
First CPUs: discrete components, vacuum tubes. Example: ENIAC (1940s).
Integrated Circuits
1960s: IC technology let CPUs be built from a small number of integrated circuits instead of discrete transistors. Reduced cost and size.
Microprocessors
1971: Intel 4004 released as first commercial microprocessor. Integrated CPU onto single silicon chip.
Modern Evolution
Multi-core processors, superscalar designs, out-of-order execution, speculative execution innovations.
CPU Architecture
Von Neumann Architecture
Single memory stores data and instructions. Sequential execution. Bottleneck: instructions and data share one path to memory, so memory access speed limits throughput.
Harvard Architecture
Separate memory for instructions and data. Enables parallel access and improved throughput.
RISC vs CISC
RISC: reduced instruction set, simplified decoding, fixed instruction length. CISC: complex instructions, variable length.
Architectural Layers
Instruction set architecture (ISA), microarchitecture, logic design, physical implementation.
Instruction Cycle
Fetch
CPU reads instruction from memory at program counter (PC) address.
Decode
Control unit interprets opcode, identifies operands and operation type.
Execute
ALU or other units perform operation: arithmetic, logic, memory access, control transfer.
Store
Results written back to registers or memory.
Cycle Repeat
PC updated to next instruction address unless branch or interrupt occurs.
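The four steps above can be sketched as a loop over a toy accumulator machine. The opcodes (`LOADI`, `ADDI`, `STORE`, `JNZ`, `HALT`) and the sample program are hypothetical, invented here only to illustrate the cycle; no real ISA is implied.

```python
def run(program, max_steps=1000):
    """Run a list of (opcode, operand) pairs; return (accumulator, memory)."""
    pc = 0          # program counter
    acc = 0         # accumulator
    memory = {}     # data memory: address -> value
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]          # fetch
        opcode, operand = instr      # decode
        pc += 1                      # default: next sequential instruction
        if opcode == "LOADI":        # execute, then store the result
            acc = operand
        elif opcode == "ADDI":
            acc += operand
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "JNZ":        # a taken branch overrides the PC update
            if acc != 0:
                pc = operand
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc, memory


program = [
    ("LOADI", 3),    # acc = 3
    ("ADDI", -1),    # acc = acc - 1
    ("JNZ", 1),      # loop back to address 1 until acc == 0
    ("STORE", 0),    # memory[0] = acc
    ("HALT", None),
]
acc, memory = run(program)
print(acc, memory)   # 0 {0: 0}
```

The PC is incremented before execution, so a branch only has to overwrite it when taken.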
Core Components
Arithmetic Logic Unit (ALU)
Performs arithmetic and logic operations.
Control Unit (CU)
Generates control signals, manages instruction sequencing.
Registers
Small, fast storage holding data and addresses.
Buses
Data, address, and control buses facilitate communication within CPU and with memory.
Registers and Their Functions
Program Counter (PC)
Holds address of next instruction.
Instruction Register (IR)
Stores current instruction being executed.
Accumulator (ACC)
Holds intermediate arithmetic results.
General-Purpose Registers
Temporary data storage for operands and results.
Status Register (Flags)
Indicates CPU state: zero, carry, overflow, sign flags.
| Register | Function |
|---|---|
| PC | Next instruction address |
| IR | Current instruction |
| ACC | Arithmetic results |
| General Registers | Temporary data storage |
| Status Register | Flags indicating CPU state |
Arithmetic Logic Unit (ALU)
Functionality
Performs integer arithmetic: addition, subtraction, multiplication, division. Logical operations: AND, OR, NOT, XOR.
Data Path
Receives operands from registers or immediate values. Outputs results to registers or memory.
Flags Update
Sets CPU flags based on results: zero, carry, overflow, sign.
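A minimal sketch of how an adder might derive the four flags, assuming an 8-bit word by default. The function name `alu_add` and the flag dictionary layout are illustrative choices, not taken from any particular CPU.

```python
def alu_add(a, b, width=8):
    """Add two `width`-bit values; return (result, flags)."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    a &= mask
    b &= mask
    full = a + b              # may carry out of the top bit
    result = full & mask      # truncate to the word size
    flags = {
        "zero": result == 0,
        "carry": full > mask,
        "sign": bool(result & sign_bit),
        # Signed overflow: both operands share a sign that the result lacks.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags


print(alu_add(0x7F, 0x01))   # 127 + 1 wraps to 0x80: sign and overflow set
print(alu_add(0xFF, 0x01))   # 255 + 1 wraps to 0x00: zero and carry set
```

Carry reports unsigned wraparound and overflow reports signed wraparound, which is why the two examples set different flags.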
Design Variants
Combinational logic, pipelined ALUs, multi-bit parallel ALUs.
Operation Example: ADD R1, R2
1. Fetch operands from R1, R2
2. ALU adds values
3. Store result in destination register
4. Update flags accordingly
Control Unit
Role
Decodes instructions, generates control signals to coordinate CPU components.
Types
Hardwired: fixed logic circuitry. Microprogrammed: uses control memory with microinstructions.
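The difference between the two styles can be sketched in software: a microprogrammed unit looks its signal sequence up in a table, while a hardwired unit produces the same sequence from fixed decision logic. The opcodes and signal names below are hypothetical, chosen only for illustration.

```python
# Control signals asserted in each step of the shared fetch phase.
FETCH = [{"pc_to_addr", "mem_read"}, {"ir_load", "pc_increment"}]

# Control memory: opcode -> microinstructions (each a set of signals).
CONTROL_STORE = {
    "LOAD":  [{"mem_read"}, {"reg_write"}],
    "ADD":   [{"alu_add"}, {"reg_write"}],
    "STORE": [{"mem_write"}],
}


def microprogrammed(opcode):
    """Look the signal sequence up in control memory."""
    return FETCH + CONTROL_STORE[opcode]


def hardwired(opcode):
    """Produce the same sequence from fixed decision logic."""
    steps = list(FETCH)
    if opcode == "LOAD":
        steps += [{"mem_read"}, {"reg_write"}]
    elif opcode == "ADD":
        steps += [{"alu_add"}, {"reg_write"}]
    elif opcode == "STORE":
        steps += [{"mem_write"}]
    else:
        raise KeyError(opcode)
    return steps


for op in CONTROL_STORE:
    print(op, microprogrammed(op) == hardwired(op))
```

Adding an instruction to the microprogrammed version means adding a table entry; the hardwired version needs its logic changed, which mirrors the usual flexibility versus speed trade-off.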
Instruction Decoding
Interprets opcode, determines operand addresses and operation sequence.
Timing and Control
Manages clock cycles, synchronizes data flow, manages interrupts.
Pipelining and Parallelism
Pipeline Stages
Typical stages: fetch, decode, execute, memory access, write-back.
Instruction-Level Parallelism (ILP)
Independent instructions overlap in execution. In a pipeline, several instructions occupy different stages at the same time.
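The benefit of overlapping instructions can be estimated with a simple cycle count. This sketch assumes an ideal pipeline with no stalls, one cycle per stage, and a non-pipelined baseline that runs every stage of one instruction before starting the next; the function names are illustrative.

```python
def pipeline_cycles(n_instructions, n_stages=5):
    """Ideal pipeline: fill time, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions, n_stages=5):
    """Baseline: each instruction passes through all stages alone."""
    return n_stages * n_instructions


n = 100
speedup = unpipelined_cycles(n) / pipeline_cycles(n)
print(pipeline_cycles(n), unpipelined_cycles(n), round(speedup, 2))   # 104 500 4.81
```

As the instruction count grows, the speedup approaches the number of stages; hazards keep real pipelines below that bound.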
Hazards
Structural: resource conflicts. Data: operand dependencies. Control: branch instructions.
Hazard Mitigation
Forwarding, stall cycles, branch prediction techniques.
| Hazard Type | Description | Mitigation |
|---|---|---|
| Structural | Resource conflicts | Hardware duplication, stalls |
| Data | Operand dependency | Forwarding, stalls |
| Control | Branch instructions | Branch prediction |
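A rough sketch of how forwarding reduces data-hazard stalls. It assumes a classic five-stage in-order pipeline (fetch, decode, execute, memory access, write-back) whose register file can be written and read in the same cycle; the instruction tuples and the function name `count_stalls` are hypothetical.

```python
def count_stalls(program, forwarding):
    """Count stall cycles caused by read-after-write (data) hazards.

    program: list of (op, dest, sources), where op is "alu" or "load".
    """
    earliest_start = {}   # register -> first fetch cycle at which a reader sees the new value
    fetch = -1            # fetch cycle of the previous instruction
    stalls = 0
    for op, dest, sources in program:
        ideal = fetch + 1
        fetch = max([ideal] + [earliest_start.get(r, 0) for r in sources])
        stalls += fetch - ideal
        if dest is not None:
            if not forwarding:
                gap = 3   # reader's decode must not precede writer's write-back
            elif op == "load":
                gap = 2   # loaded value is forwarded from the end of memory access
            else:
                gap = 1   # ALU result is forwarded from the end of execute
            earliest_start[dest] = fetch + gap
    return stalls


program = [
    ("load", "r1", ["r0"]),        # r1 = mem[r0]
    ("alu",  "r2", ["r1", "r1"]),  # needs the loaded r1
    ("alu",  "r3", ["r2", "r1"]),  # needs r2
    ("alu",  "r4", ["r0", "r0"]),  # independent
]
print(count_stalls(program, forwarding=False))   # 4
print(count_stalls(program, forwarding=True))    # 1
```

With forwarding, only the load followed immediately by its consumer still costs a stall cycle in this model.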
Cache Memory and Optimization
Purpose
Stores frequently accessed data to reduce latency and memory bottlenecks.
Levels
L1: smallest, fastest; L2: larger, slower; L3: largest, slowest, typically shared among cores in multicore CPUs.
Cache Mapping
Direct-mapped, fully associative, set-associative caches.
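For the direct-mapped case, each memory block can live in exactly one cache line. The sketch below splits an address into offset, index, and tag, then counts hits over an access trace. The cache size, block size, and trace are made-up values for illustration.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=16):
    """Return (hits, accesses) for a direct-mapped cache over byte addresses."""
    tags = [None] * num_lines           # one stored tag per cache line
    hits = 0
    for addr in addresses:
        block = addr // block_size      # strip the byte offset
        index = block % num_lines       # the only line this block may use
        tag = block // num_lines        # distinguishes blocks sharing that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag           # miss: replace whatever was there
    return hits, len(addresses)


trace = [0, 4, 8, 64, 0, 68, 16, 20]
hits, total = simulate_direct_mapped(trace)
print(hits, total, hits / total)   # 3 8 0.375
```

Addresses 0 and 64 map to the same line and keep evicting each other, a conflict miss that set-associative and fully associative caches are designed to reduce.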
Cache Coherence
Protocols ensure consistency across multiple caches in multicore systems.
Cache Hit Rate = (Number of cache hits) / (Total memory accesses)
Optimization goals: maximize hit rate, minimize latency.
Performance Metrics
Clock Speed
Frequency of CPU cycles, measured in GHz.
Instructions Per Cycle (IPC)
Average instructions executed per clock cycle.
Throughput and Latency
Throughput: instructions completed per second. Latency: time from start to completion of a single instruction.
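Clock speed and IPC combine into throughput, and throughput gives a first-order estimate of program run time. The sketch below shows the arithmetic; the figures are hypothetical round numbers, and a constant IPC is assumed, which real workloads rarely have.

```python
def instructions_per_second(ipc, clock_hz):
    """Throughput = instructions per cycle x cycles per second."""
    return ipc * clock_hz


def execution_time(instruction_count, ipc, clock_hz):
    """Seconds to run a program of the given dynamic instruction count."""
    return instruction_count / instructions_per_second(ipc, clock_hz)


# Hypothetical CPU: 3 GHz clock, averaging 2 instructions per cycle.
print(instructions_per_second(2.0, 3.0e9))   # 6000000000.0
print(execution_time(12e9, 2.0, 3.0e9))      # 2.0
```

Doubling either the clock or the IPC halves the estimated run time, which is why clock speed alone is a poor basis for comparing CPUs.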
Power Efficiency
Performance per watt critical in mobile and embedded applications.
Benchmarking
Standardized tests (SPEC, Geekbench) measure CPU capabilities.
Instruction Set Architectures
Definition
Set of instructions CPU can execute. Defines programmer-visible behavior.
RISC Characteristics
Fixed-length instructions, load/store architecture, simple addressing modes.
CISC Characteristics
Variable-length instructions, many addressing modes, instructions that can operate directly on memory.
Examples
RISC: ARM, MIPS, RISC-V. CISC: x86, x86-64.
Instruction Formats
Opcode, operand specifiers, addressing modes, immediate values.
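How these fields pack into a machine word can be shown with a made-up fixed-length format: 16 bits holding a 4-bit opcode, a 4-bit register number, and an 8-bit immediate. The layout and opcode numbers are assumptions for illustration and do not match any real ISA.

```python
OPCODES = {"LOADI": 0x1, "ADDI": 0x2}            # hypothetical opcode numbers
MNEMONICS = {v: k for k, v in OPCODES.items()}


def encode(mnemonic, reg, imm):
    """Pack opcode[15:12], reg[11:8], imm[7:0] into one 16-bit word."""
    if not 0 <= reg < 16 or not 0 <= imm < 256:
        raise ValueError("field out of range")
    return (OPCODES[mnemonic] << 12) | (reg << 8) | imm


def decode(word):
    """Unpack a 16-bit word into (mnemonic, reg, imm)."""
    opcode = (word >> 12) & 0xF
    reg = (word >> 8) & 0xF
    imm = word & 0xFF
    return MNEMONICS[opcode], reg, imm


word = encode("ADDI", 3, 42)
print(hex(word), decode(word))   # 0x232a ('ADDI', 3, 42)
```

Because every field sits at a fixed bit position, decoding needs only shifts and masks, which is the simplification fixed-length formats offer.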
Microarchitecture Variants
Single-Cycle
Each instruction completes in one clock cycle. Simple design, but the clock period must be long enough for the slowest instruction.
Multi-Cycle
Instructions take multiple cycles, each cycle performs part of instruction.
Superscalar
Multiple instructions issued and executed per cycle using parallel pipelines.
Out-of-Order Execution
Instructions execute as soon as their operands and execution units are available, not in strict program order. Results are typically committed in program order.
Speculative Execution
CPU guesses branch outcomes, executes instructions ahead to reduce stalls.
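One common way to make that guess is a 2-bit saturating counter per branch. The sketch below runs a single counter over a sequence of outcomes; it illustrates the general technique and does not describe any specific CPU's predictor. The function name and the sample outcome pattern are invented for the example.

```python
def predict_branches(outcomes, initial_state=2):
    """Run one 2-bit saturating counter over branch outcomes.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    Returns the number of correct predictions.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# A loop branch: taken nine times, then not taken on exit, run twice.
outcomes = ([True] * 9 + [False]) * 2
print(predict_branches(outcomes), len(outcomes))   # 18 20
```

The counter mispredicts only the loop exits; a single not-taken outcome does not flip its prediction, so re-entering the loop is still predicted correctly.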
References
- Hennessy, J.L., & Patterson, D.A. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2019, pp. 45-120.
- Patterson, D.A., & Hennessy, J.L. Computer Organization and Design: The Hardware/Software Interface. Elsevier, 2017, pp. 30-90.
- Tanenbaum, A.S. Structured Computer Organization. Pearson, 2016, vol. 5, pp. 100-140.
- Stallings, W. Computer Architecture and Organization. Pearson, 2018, vol. 9, pp. 150-200.
- Smith, J.E. "Decoupled Access/Execute Computer Architectures." ACM Transactions on Computer Systems, vol. 2, no. 4, 1984, pp. 289-308.