Introduction

Pipelining is a central technique in computer architecture that improves instruction throughput by decomposing execution into discrete stages that operate concurrently. Each stage processes part of one instruction while the other stages process different instructions. This raises CPU utilization and overall performance without requiring a proportional increase in clock speed or hardware complexity.

"Pipelining is the backbone of modern high-performance processors, enabling multiple instructions to be processed simultaneously and efficiently." -- David A. Patterson & John L. Hennessy

Pipeline Concept

Definition

Pipeline: a sequence of processing stages arranged so that output of one stage feeds input of the next. Each stage executes a portion of the instruction cycle.

Analogy

Comparable to an assembly line: multiple items processed simultaneously at different stages, increasing throughput.

Basic Operation

Instructions enter pipeline sequentially. At any clock cycle, multiple instructions reside in different stages, enabling parallelism at instruction level.

Pipeline Stages

Common Five-Stage Pipeline

Stages: Instruction Fetch (IF), Instruction Decode/Register Fetch (ID), Execute (EX), Memory Access (MEM), Write Back (WB).

Instruction Fetch (IF)

Retrieve instruction from memory address held in program counter (PC).

Instruction Decode (ID)

Decode instruction opcode. Read source registers. Calculate branch targets.

Execute (EX)

Perform ALU operations, address calculation, or evaluate branch condition.

Memory Access (MEM)

Access data memory for load/store instructions.

Write Back (WB)

Write results back to register file.

Stage | Function
------+-----------------------------------------------
IF    | Fetch instruction from memory
ID    | Decode instruction and read registers
EX    | Execute ALU operations or calculate addresses
MEM   | Access data memory
WB    | Write results to registers
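The stage overlap described above can be sketched with a small simulation. Instruction names (I1 to I4) and the cycle numbering are illustrative, assuming an ideal pipeline with no hazards or stalls:

```python
# Minimal sketch: cycle-by-cycle occupancy of an ideal five-stage
# pipeline (IF, ID, EX, MEM, WB) with no hazards or stalls.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Return {instruction: {cycle: stage}} for an ideal pipeline."""
    diagram = {}
    for i in range(n_instructions):
        # Instruction i enters IF at cycle i+1 and advances one stage per cycle.
        diagram[f"I{i+1}"] = {i + 1 + s: STAGES[s] for s in range(len(STAGES))}
    return diagram

for instr, sched in pipeline_diagram(4).items():
    row = " ".join(f"c{c}:{st}" for c, st in sched.items())
    print(f"{instr}: {row}")
```

With 4 instructions, all complete by cycle 8 (4 + 5 - 1), whereas a non-pipelined machine would need 20 cycles.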

Pipeline Performance Metrics

Throughput

Number of instructions completed per unit time. Increased by overlapping stages.

Latency

Time for a single instruction to pass through all stages. Pipelining does not reduce latency; it typically increases slightly due to pipeline register overhead.

Speedup

Ratio of execution time without pipelining to execution time with pipelining. Ideal speedup approaches number of pipeline stages.

Pipeline Efficiency

Percentage of pipeline utilization. Influenced by hazards and stalls.

Formula for Speedup

Speedup = (Time per instruction without pipeline) / (Time per instruction with pipeline)
Ideal Speedup ≈ Number of pipeline stages (n)
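A worked instance of the formula above, assuming a 5-stage pipeline with a uniform 1 ns stage time (illustrative numbers); the fill-and-drain term shows why real speedup only approaches n:

```python
# Speedup of an ideal n-stage pipeline over a non-pipelined machine,
# for k instructions with a uniform stage time (illustrative values).

def speedup(n_stages, n_instructions, stage_time=1.0):
    unpipelined = n_instructions * n_stages * stage_time
    pipelined = (n_stages + n_instructions - 1) * stage_time  # fill + drain
    return unpipelined / pipelined

print(speedup(5, 5))        # short run: well below ideal
print(speedup(5, 1_000_000))  # long run: approaches n = 5
```

For 5 instructions the speedup is only 25/9 (about 2.8); as the instruction count grows, the fill and drain cycles amortize away and speedup tends to the stage count.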

Pipeline Hazards

Definition

Conditions preventing next instruction in pipeline from executing during its designated clock cycle.

Structural Hazards

Resource conflicts when hardware cannot support concurrent operations.

Data Hazards

Data dependency between instructions causing incorrect read/write order.

Control Hazards

Branch instructions causing uncertainty in instruction flow, leading to pipeline flushes or stalls.

Types of Data Hazards

RAW (Read After Write), WAR (Write After Read), WAW (Write After Write).
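The three hazard types can be classified mechanically from each instruction's register read/write sets. The dictionary-based instruction encoding below is an assumption for the sketch:

```python
# Illustrative classifier for the three data-hazard types, given each
# instruction's register reads and writes as sets.

def classify_hazards(first, second):
    """first/second: dicts with 'reads' and 'writes' register-name sets.
    Returns the set of hazard types the ordered pair exhibits."""
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")   # second reads what first writes (true dependency)
    if first["reads"] & second["writes"]:
        hazards.add("WAR")   # second overwrites what first still reads
    if first["writes"] & second["writes"]:
        hazards.add("WAW")   # both write the same register
    return hazards

# add r1, r2, r3  followed by  sub r4, r1, r5  -> RAW on r1
i1 = {"reads": {"r2", "r3"}, "writes": {"r1"}}
i2 = {"reads": {"r1", "r5"}, "writes": {"r4"}}
print(classify_hazards(i1, i2))  # {'RAW'}
```

Only RAW is a true data dependency; WAR and WAW are name conflicts, which is why register renaming (below) can remove them.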

Hazard Mitigation Techniques

Stalling

Insert NOPs (no-operation instructions) to delay dependent instructions until hazard clears.

Forwarding (Bypassing)

Use hardware paths to feed results directly to dependent stages without writing back to registers.
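A sketch of the forwarding decision in a classic MIPS-style pipeline: an EX-stage source operand is taken from the EX/MEM or MEM/WB pipeline register when a newer result is in flight. The field names are assumptions for illustration, not a specific hardware interface:

```python
# Forwarding (bypass) source selection for one EX-stage operand,
# in the style of the classic MIPS conditions.

def forward_source(src_reg, ex_mem, mem_wb):
    """Return where the EX stage should read src_reg from."""
    # The EX/MEM result is newer, so it takes priority over MEM/WB.
    if ex_mem["reg_write"] and ex_mem["dest"] != 0 and ex_mem["dest"] == src_reg:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["dest"] != 0 and mem_wb["dest"] == src_reg:
        return "MEM/WB"
    return "REGFILE"  # no hazard: read the register file normally

ex_mem = {"reg_write": True, "dest": 1}
mem_wb = {"reg_write": True, "dest": 1}
print(forward_source(1, ex_mem, mem_wb))  # EX/MEM: most recent result wins
```

The `dest != 0` guard reflects the MIPS convention that register 0 is hardwired to zero and never forwarded.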

Branch Prediction

Speculatively execute instructions based on predicted branch outcomes to reduce control hazards.
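One common concrete predictor is the two-bit saturating counter: states 0-1 predict not-taken, states 2-3 predict taken, and a single wrong outcome cannot flip a strongly held prediction. A minimal sketch:

```python
# Two-bit saturating-counter branch predictor (one branch, one counter).

class TwoBitPredictor:
    def __init__(self):
        self.state = 0  # start strongly not-taken

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturate at the ends so one anomaly doesn't flip a strong state.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True] * 6  # an always-taken (loop-like) branch
correct = 0
for taken in outcomes:
    correct += p.predict() == taken
    p.update(taken)
print(f"{correct}/{len(outcomes)} correct")  # 4/6: two warm-up misses, then right
```

Real predictors index a table of such counters by branch address (and often branch history), but the per-counter behavior is exactly this.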

Delayed Branch

Rearrange instructions to fill branch delay slots, minimizing pipeline flushes.

Register Renaming

Eliminate false data dependencies (WAR and WAW) by dynamically assigning registers.
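A minimal renaming pass, assuming a simple `(dest, src1, src2)` instruction format: every write gets a fresh physical register, so WAR and WAW conflicts on the same architectural name disappear.

```python
# Illustrative register renaming: map each architectural destination to a
# fresh physical register, eliminating WAR/WAW (false) dependencies.

def rename(instructions, n_arch_regs=8):
    mapping = {r: f"p{r}" for r in range(n_arch_regs)}  # initial identity map
    next_phys = n_arch_regs
    renamed = []
    for dest, src1, src2 in instructions:
        s1, s2 = mapping[src1], mapping[src2]  # read current names first
        mapping[dest] = f"p{next_phys}"        # fresh register for the write
        next_phys += 1
        renamed.append((mapping[dest], s1, s2))
    return renamed

# r1 = r2+r3 ; r1 = r4+r5  (WAW on r1) -> two distinct physical destinations
prog = [(1, 2, 3), (1, 4, 5)]
for ins in rename(prog):
    print(ins)
```

After renaming, the two writes target p8 and p9, so they can execute in either order; only true (RAW) dependencies remain.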

Technique         | Description                                      | Effect
------------------+--------------------------------------------------+-------------------------------------
Stalling          | Pause pipeline for hazard resolution             | Simplest but reduces throughput
Forwarding        | Direct data transfer between stages              | Reduces stalls, improves efficiency
Branch Prediction | Speculative execution based on likely branch     | Minimizes control hazard penalties
Delayed Branch    | Instruction reordering to fill delay slots       | Improves pipeline utilization
Register Renaming | Dynamic assignment to prevent false dependencies | Enables out-of-order execution

Superscalar and Advanced Pipelines

Superscalar Architecture

Multiple instructions issued per clock cycle via multiple parallel pipelines.

Out-of-Order Execution

Instructions executed as operands become available, not necessarily in program order.

Dynamic Scheduling

Hardware-based instruction reordering to maximize resource utilization.

Speculative Execution

Execute instructions before branch resolution to reduce stalls.

Hyper-Threading

Simultaneous multithreading using pipeline resources shared by multiple threads.

Example (dual-issue superscalar):
Cycle 1: Issue 2 instructions: I1, I2
Cycle 2: I1 in EX stage, I2 in ID stage
Cycle 3: Issue 2 more instructions: I3, I4; I1 in MEM, I2 in EX, I3 in ID, I4 in IF

Pipeline Implementation Details

Pipeline Registers

Registers between stages hold intermediate data/signals at clock edge to isolate stages.

Clocking

Pipeline operates synchronously with clock cycles triggering stage transitions.

Control Logic

Manages hazards, stalls, forwarding, and branch decisions.

Pipeline Depth

Number of stages; deeper pipelines increase clock frequency but raise hazard complexity.
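The trade-off can be made concrete with a back-of-the-envelope model: splitting a fixed logic delay across n stages shortens the clock period, but each stage pays a fixed pipeline-register overhead. The delay values below are illustrative assumptions:

```python
# Clock period vs. pipeline depth: period = logic_delay/n + latch overhead.
# A 10 ns logic path and 0.1 ns register overhead are illustrative numbers.

def clock_period(total_logic_ns, n_stages, latch_overhead_ns=0.1):
    return total_logic_ns / n_stages + latch_overhead_ns

for n in (1, 5, 10, 20):
    t = clock_period(10.0, n)
    print(f"{n:2d} stages: period {t:.2f} ns, freq {1000/t:.0f} MHz")
```

Frequency gains shrink as depth grows because the fixed register overhead comes to dominate the period, while hazard penalties (measured in cycles) grow with depth.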

Pipeline Width

Number of instructions processed simultaneously per stage in wide or superscalar pipelines.

Pipeline Scheduling and Control

Instruction Scheduling

Compiler or hardware arranges instructions to minimize hazards and stalls.

Scoreboarding

Hardware tracking of instruction dependencies to issue instructions safely.

Reservation Stations

Buffers holding instructions waiting for operands to enable out-of-order execution.

Reorder Buffer

Preserves program order during out-of-order execution for precise exceptions.

Hazard Detection Unit

Detects hazards and triggers pipeline control actions like stalls or flushes.
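The canonical check such a unit performs is the load-use test in a MIPS-style pipeline: a load's value is not available until MEM, so even with forwarding the very next dependent instruction must stall one cycle. Field names below are assumptions for illustration:

```python
# Load-use hazard check: stall if the instruction in EX is a load whose
# destination matches a source of the instruction in ID.

def must_stall(id_ex, if_id):
    return (id_ex["mem_read"] and
            id_ex["dest"] in (if_id["src1"], if_id["src2"]))

# lw r2, 0(r1)  followed immediately by  add r3, r2, r4
load = {"mem_read": True, "dest": 2}
use = {"src1": 2, "src2": 4}
print(must_stall(load, use))  # True: insert one bubble, then forward
```

When the check fires, the control logic freezes the PC and IF/ID register and injects one bubble; forwarding then covers the remaining distance.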

Pipeline Stalls and Bubbles

Definition

Stall: pipeline cycle where no useful work is done. Bubble: inserted NOP to delay pipeline progress.

Causes

Data hazards, structural hazards, control hazards, cache misses.

Effects

Reduce pipeline efficiency and throughput.

Stall Handling

Pipeline control inserts bubbles or stalls fetch stage to resolve hazards.

Minimization Strategies

Forwarding, branch prediction, compiler scheduling.
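The cost of residual stalls can be quantified as effective CPI: the ideal CPI of 1 plus average stall cycles per instruction. The stall frequencies and penalties below are illustrative assumptions:

```python
# Effective CPI = ideal CPI + sum(frequency * penalty) over stall sources.

def effective_cpi(ideal_cpi=1.0, stalls=()):
    """stalls: iterable of (fraction_of_instructions, penalty_cycles)."""
    return ideal_cpi + sum(freq * penalty for freq, penalty in stalls)

# Assumed workload: 20% loads with a 1-cycle load-use stall, and 15%
# branches of which 10% mispredict at a 2-cycle penalty.
cpi = effective_cpi(stalls=[(0.20, 1), (0.15 * 0.10, 2)])
print(f"effective CPI = {cpi:.2f}")  # 1.23
```

Each mitigation technique attacks one term of this sum: forwarding shrinks the load-use fraction needing a stall, and better prediction shrinks the misprediction rate.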

Real-World Pipeline Examples

MIPS Pipeline

Classic 5-stage pipeline with hazard detection and forwarding.

Intel Pentium Pipeline

Superscalar, 5-stage integer pipeline with dual instruction issue.

ARM Cortex Pipeline

Variable pipeline depths, advanced branch prediction, out-of-order execution.

RISC-V Pipelines

Configurable pipeline stages from simple 3-stage to complex superscalar.

Pipeline Table Summary

Processor     | Pipeline Depth | Features
--------------+----------------+----------------------------------------
MIPS R2000    | 5 stages       | In-order, forwarding, hazard detection
Intel Pentium | 5 stages       | Superscalar, dual issue
ARM Cortex-A9 | 8+ stages      | Out-of-order, branch prediction
RISC-V Rocket | 5 stages       | Simple, in-order pipeline

References

  • D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 5th ed., Morgan Kaufmann, 2013, pp. 293-345.
  • J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 6th ed., Morgan Kaufmann, 2019, pp. 245-310.
  • W. Stallings, Computer Organization and Architecture: Designing for Performance, 10th ed., Pearson, 2016, pp. 412-455.
  • Y. N. Patt and S. J. Patel, Introduction to Computing Systems: From Bits & Gates to C & Beyond, 2nd ed., McGraw-Hill, 2010, pp. 360-390.
  • R. K. Gupta and M. L. Soffa, "Pipeline scheduling algorithms," ACM Computing Surveys, vol. 21, no. 4, 1989, pp. 353-395.