Definition and Overview

Central Processing Unit

CPU: electronic circuit executing instructions from computer programs. Functions: data processing, control signal generation, arithmetic and logical operations.

Role in Computer System

Coordinates input/output, memory, and storage devices. Acts as "brain" of computer. Executes instructions sequentially or in parallel.

Basic Operation

Fetch-decode-execute cycle: retrieves instruction, interprets opcode, performs operation, stores results.

Types of CPUs

General-purpose CPUs, embedded processors, microcontrollers, digital signal processors (DSPs).

"The CPU is the heart of a computer, orchestrating operations at unimaginable speeds." -- John L. Hennessy

Historical Development

Early Computers

Early CPUs: built from discrete components such as vacuum tubes. Example: ENIAC (completed 1945).

Integrated Circuits

1960s: IC technology allowed CPUs to be built from a few integrated circuits rather than many discrete components. Reduced cost and size.

Microprocessors

1971: Intel 4004, the first commercial microprocessor. Integrated the CPU onto a single silicon chip.

Modern Evolution

Multi-core processors, superscalar designs, out-of-order execution, speculative execution.

CPU Architecture

Von Neumann Architecture

Single memory stores both data and instructions. Sequential execution. Bottleneck: instruction fetches and data accesses share one memory path, so memory bandwidth limits throughput.

Harvard Architecture

Separate memory for instructions and data. Enables parallel access and improved throughput.

RISC vs CISC

RISC: reduced instruction set, simplified decoding, fixed instruction length. CISC: complex instructions, variable length.

Architectural Layers

Instruction set architecture (ISA), microarchitecture, logic design, physical implementation.

Instruction Cycle

Fetch

CPU reads instruction from memory at program counter (PC) address.

Decode

Control unit interprets opcode, identifies operands and operation type.

Execute

ALU or other units perform operation: arithmetic, logic, memory access, control transfer.

Store

Results written back to registers or memory.

Cycle Repeat

PC updated to next instruction address unless branch or interrupt occurs.
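
The cycle above can be sketched as a tiny interpreter. This is a minimal illustration only: the three-field instruction tuple and the opcodes LOADI, ADD, JNZ, and HALT are hypothetical and do not correspond to any real ISA.

```python
def run(program, max_steps=1000):
    """Run a toy program and return the final register contents."""
    regs = {"R0": 0, "R1": 0, "R2": 0}
    pc = 0  # program counter: index of the next instruction
    for _ in range(max_steps):
        instr = program[pc]          # Fetch: read instruction at PC
        opcode, a, b = instr         # Decode: split opcode and operands
        next_pc = pc + 1             # Default: next sequential instruction
        # Execute and store the result.
        if opcode == "LOADI":        # regs[a] <- immediate value b
            regs[a] = b
        elif opcode == "ADD":        # regs[a] <- regs[a] + regs[b]
            regs[a] = regs[a] + regs[b]
        elif opcode == "JNZ":        # branch to address b if regs[a] != 0
            if regs[a] != 0:
                next_pc = b
        elif opcode == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc = next_pc                 # Cycle repeat
    raise RuntimeError("step limit reached")


# Sum 3 + 2 + 1 into R1 using a countdown loop in R0.
PROGRAM = [
    ("LOADI", "R0", 3),    # 0: loop counter
    ("LOADI", "R1", 0),    # 1: running sum
    ("LOADI", "R2", -1),   # 2: decrement constant
    ("ADD", "R1", "R0"),   # 3: sum += counter
    ("ADD", "R0", "R2"),   # 4: counter -= 1
    ("JNZ", "R0", 3),      # 5: repeat while counter != 0
    ("HALT", None, None),  # 6: stop
]
```

Calling run(PROGRAM) leaves 6 in R1. The branch at address 5 is the only place where the PC is not simply incremented.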

Core Components

Arithmetic Logic Unit (ALU)

Performs arithmetic and logic operations.

Control Unit (CU)

Generates control signals, manages instruction sequencing.

Registers

Small, fast storage holding data and addresses.

Buses

Data, address, and control buses facilitate communication within CPU and with memory.

Registers and Their Functions

Program Counter (PC)

Holds address of next instruction.

Instruction Register (IR)

Stores current instruction being executed.

Accumulator (ACC)

Holds intermediate arithmetic results.

General-Purpose Registers

Temporary data storage for operands and results.

Status Register (Flags)

Indicates CPU state: zero, carry, overflow, sign flags.

Register            Function
PC                  Next instruction address
IR                  Current instruction
ACC                 Arithmetic results
General Registers   Temporary data storage
Status Register     Flags indicating CPU state

Arithmetic Logic Unit (ALU)

Functionality

Performs integer arithmetic: addition, subtraction, multiplication, division. Logical operations: AND, OR, NOT, XOR.

Data Path

Receives operands from registers or immediate values. Outputs results to registers or memory.

Flags Update

Sets CPU flags based on results: zero, carry, overflow, sign.

Design Variants

Combinational logic, pipelined ALUs, multi-bit parallel ALUs.

Operation Example: ADD R1, R2
1. Fetch operands from R1, R2
2. ALU adds values
3. Store result in destination register
4. Update flags accordingly
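
The add-and-update-flags steps can be sketched in a few lines. This is an illustration under assumptions: the function name alu_add, the 8-bit default width, and the flag dictionary are made up for the example, and real ALUs compute these signals in hardware.

```python
def alu_add(a, b, width=8):
    """Add two `width`-bit values; return (result, flags)."""
    mask = (1 << width) - 1        # e.g. 0xFF for 8 bits
    sign_bit = 1 << (width - 1)    # e.g. 0x80 for 8 bits
    a &= mask
    b &= mask
    full = a + b                   # sum before truncation
    result = full & mask           # value stored in the destination register
    flags = {
        "zero": result == 0,
        "carry": full > mask,      # carry out of the top bit (unsigned overflow)
        "sign": bool(result & sign_bit),
        # Signed overflow: both operands have the same sign,
        # and the result's sign differs from it.
        "overflow": bool((a ^ result) & (b ^ result) & sign_bit),
    }
    return result, flags
```

For example, alu_add(0x7F, 0x01) yields 0x80 with the sign and overflow flags set, while alu_add(0xFF, 0x01) yields 0 with the zero and carry flags set.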

Control Unit

Role

Decodes instructions, generates control signals to coordinate CPU components.

Types

Hardwired: fixed logic circuitry. Microprogrammed: uses control memory with microinstructions.
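
A microprogrammed control unit can be pictured as a lookup table. The sketch below is only a conceptual model: the control store contents, the signal names (reg_read, alu_add, mem_read, mem_write, reg_write), and the function name are all hypothetical.

```python
# Hypothetical control store: opcode -> list of microinstructions.
# Each microinstruction is the set of control signals asserted in one cycle.
CONTROL_STORE = {
    "ADD":   [{"reg_read"}, {"alu_add"}, {"reg_write"}],
    "LOAD":  [{"reg_read"}, {"alu_add"}, {"mem_read"}, {"reg_write"}],
    "STORE": [{"reg_read"}, {"alu_add"}, {"mem_write"}],
}


def control_signals(opcode):
    """Return a list of (cycle, signals) pairs for one instruction."""
    if opcode not in CONTROL_STORE:
        raise ValueError(f"no microprogram for opcode: {opcode}")
    return list(enumerate(CONTROL_STORE[opcode]))
```

Changing an instruction's behavior here means editing table entries, which mirrors why microprogrammed designs are easier to modify than hardwired logic.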

Instruction Decoding

Interprets opcode, determines operand addresses and operation sequence.

Timing and Control

Manages clock cycles, synchronizes data flow, manages interrupts.

Pipelining and Parallelism

Pipeline Stages

Typical stages: fetch, decode, execute, memory access, write-back.

Instruction-Level Parallelism (ILP)

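Independent instructions overlap in execution. Pipelining processes multiple instructions simultaneously at different pipeline stages; superscalar designs additionally issue several instructions per cycle.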
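
The benefit of overlapping stages can be estimated with a simple cycle count. This sketch assumes an idealized pipeline in which every stage takes exactly one cycle and no hazards occur unless stall cycles are passed in explicitly; the function names are illustrative.

```python
def pipelined_cycles(n_instructions, n_stages=5, stall_cycles=0):
    """Cycles to finish n instructions on an ideal pipeline.

    The first instruction needs n_stages cycles; each later one
    completes one cycle after its predecessor, plus any stalls.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


def unpipelined_cycles(n_instructions, n_stages=5):
    """Cycles when each instruction runs all stages before the next starts."""
    return n_instructions * n_stages


def speedup(n_instructions, n_stages=5, stall_cycles=0):
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages, stall_cycles
    )
```

For 100 instructions on five stages this gives 104 cycles instead of 500. The speedup approaches the stage count as the instruction count grows, and stalls reduce it.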

Hazards

Structural: resource conflicts. Data: operand dependencies. Control: branch instructions.

Hazard Mitigation

Forwarding, stall cycles, branch prediction techniques.

Hazard Type   Description           Mitigation
Structural    Resource conflicts    Hardware duplication, stalls
Data          Operand dependency    Forwarding, stalls
Control       Branch instructions   Branch prediction
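
Data hazards of the read-after-write kind can be found by scanning for an instruction that reads a register written shortly before it. The sketch below assumes a hypothetical instruction representation, (destination, sources). The default window of 2 is an assumption loosely matching a five-stage pipeline without forwarding, where a result reaches the register file only at write-back.

```python
def raw_hazards(instructions, window=2):
    """Find read-after-write dependencies between nearby instructions.

    instructions: list of (dest, sources) tuples, dest may be None.
    Returns a list of (producer_index, consumer_index, register).
    """
    hazards = []
    for i, (dest, _sources) in enumerate(instructions):
        if dest is None:
            continue
        last = min(i + 1 + window, len(instructions))
        for j in range(i + 1, last):
            later_dest, later_sources = instructions[j]
            if dest in later_sources:
                hazards.append((i, j, dest))
            if later_dest == dest:
                # Register is overwritten; later readers depend on j, not i.
                break
    return hazards


# Hypothetical sequence: R1 = R2 op R3; R4 = R1 op R5; R6 = R1 op R4; R7 = R8 op R9
EXAMPLE = [
    ("R1", ("R2", "R3")),
    ("R4", ("R1", "R5")),
    ("R6", ("R1", "R4")),
    ("R7", ("R8", "R9")),
]
```

For EXAMPLE the function reports three dependencies: instructions 1 and 2 both read R1 from instruction 0, and instruction 2 reads R4 from instruction 1. Hardware would resolve each one by forwarding or by stalling.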

Cache Memory and Optimization

Purpose

Stores frequently accessed data to reduce latency and memory bottlenecks.

Levels

L1: smallest, fastest; L2: larger, slower; L3: largest and slowest, typically shared among cores in multicore CPUs.

Cache Mapping

Direct-mapped, fully associative, set-associative caches.

Cache Coherence

Protocols ensure consistency across multiple caches in multicore systems.

Cache Hit Rate = (Number of cache hits) / (Total memory accesses)

Optimization goals: maximize hit rate, minimize latency.
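
The hit rate formula can be tried on a small simulated cache. This is a minimal sketch of a direct-mapped cache that tracks only tags; the line count, block size, and function name are arbitrary assumptions chosen for illustration.

```python
def direct_mapped_hit_rate(addresses, num_lines=4, block_size=16):
    """Simulate a direct-mapped cache and return hits / total accesses."""
    if len(addresses) == 0:
        return 0.0
    tags = [None] * num_lines          # one stored tag per cache line
    hits = 0
    for addr in addresses:
        block = addr // block_size     # which memory block holds the byte
        index = block % num_lines      # the single line this block maps to
        tag = block // num_lines       # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: load the block, evicting the old one
    return hits / len(addresses)
```

With the defaults, the access sequence [0, 4, 8, 64, 0, 4] gives a hit rate of 0.5. Addresses 0 and 64 map to the same line and evict each other, which is the conflict behavior that set-associative caches reduce.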

Performance Metrics

Clock Speed

Frequency of CPU cycles, measured in GHz.

Instructions Per Cycle (IPC)

Average instructions executed per clock cycle.

Throughput and Latency

Throughput: instructions completed per second. Latency: time from start to completion of a single instruction.
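
Clock speed and IPC combine into throughput and execution time. The sketch below assumes a constant average IPC over the whole program, which real workloads only approximate; the function names are illustrative.

```python
def instructions_per_second(ipc, clock_hz):
    """Throughput = average instructions per cycle * cycles per second."""
    return ipc * clock_hz


def execution_time(instruction_count, ipc, clock_hz):
    """Seconds to execute a program of instruction_count instructions."""
    return instruction_count / instructions_per_second(ipc, clock_hz)
```

For example, a 3 GHz CPU averaging 2 instructions per cycle has a throughput of 6 billion instructions per second and runs 12 billion instructions in 2 seconds. A higher clock speed with a lower IPC can therefore be slower overall.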

Power Efficiency

Performance per watt critical in mobile and embedded applications.

Benchmarking

Standardized tests (SPEC, Geekbench) measure CPU capabilities.

Instruction Set Architectures

Definition

Set of instructions CPU can execute. Defines programmer-visible behavior.

RISC Characteristics

Fixed-length instructions, load/store architecture, simple addressing modes.

CISC Characteristics

Variable-length instructions, many addressing modes, instructions that can operate directly on memory operands.

Examples

RISC: ARM, MIPS, RISC-V. CISC: x86, x86-64.

Instruction Formats

Opcode, operand specifiers, addressing modes, immediate values.
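
How such fields are packed into an instruction word can be shown with bit shifts and masks. The 16-bit layout used here (4-bit opcode, 4-bit register, 8-bit immediate) is a hypothetical fixed-length format invented for the example, not one taken from a real ISA.

```python
# Hypothetical layout, most significant bits first:
# [15:12] opcode | [11:8] register | [7:0] immediate


def encode(opcode, reg, imm):
    """Pack the three fields into one 16-bit instruction word."""
    if not (0 <= opcode < 16 and 0 <= reg < 16 and 0 <= imm < 256):
        raise ValueError("field out of range")
    return (opcode << 12) | (reg << 8) | imm


def decode(word):
    """Unpack a 16-bit instruction word into (opcode, reg, imm)."""
    opcode = (word >> 12) & 0xF
    reg = (word >> 8) & 0xF
    imm = word & 0xFF
    return opcode, reg, imm
```

For example, encode(0x2, 0x3, 0x7F) produces 0x237F, and decode(0x237F) returns (2, 3, 127). Fixed field positions like these are what make RISC-style decoding simple.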

Microarchitecture Variants

Single-Cycle

Each instruction completes in one clock cycle. Simple design, but the clock period must accommodate the slowest instruction, so the clock is slow.

Multi-Cycle

Instructions take multiple cycles, each cycle performs part of instruction.

Superscalar

Multiple instructions issued and executed per cycle using parallel pipelines.

Out-of-Order Execution

Instructions execute as soon as their operands and execution resources are available, rather than in strict program order; results are committed in program order.

Speculative Execution

CPU guesses branch outcomes, executes instructions ahead to reduce stalls.
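
One common way to guess branch outcomes is a 2-bit saturating counter. The sketch below models a single counter for a single branch; the class name, the initial state, and the accuracy helper are assumptions for illustration, and real predictors keep tables of many such counters.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor=None):
    """Fraction of branch outcomes the predictor guesses correctly."""
    if predictor is None:
        predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)
```

For a loop branch taken nine times and then not taken, accuracy([True] * 9 + [False]) is 0.8: the predictor misses once while warming up and once at loop exit. Each miss corresponds to speculative work that must be discarded.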

References

  • Hennessy, J.L., & Patterson, D.A. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2019, pp. 45-120.
  • Patterson, D.A., & Hennessy, J.L. Computer Organization and Design: The Hardware/Software Interface. Elsevier, 2017, pp. 30-90.
  • Tanenbaum, A.S. Structured Computer Organization. Pearson, 2016, vol. 5, pp. 100-140.
  • Stallings, W. Computer Organization and Architecture. Pearson, 2018, vol. 9, pp. 150-200.
  • Smith, J.E. "Decoupled Access/Execute Computer Architectures." ACM Transactions on Computer Systems, vol. 2, no. 4, 1984, pp. 289-308.