Digital Design and Computer Architecture - Lecture 13: Pipelining
This post is a derivative of Digital Design and Computer Architecture Lecture by Prof. Onur Mutlu, used under CC BY-NC-SA 4.0.
You can watch this lecture on YouTube and see the PDF.
I wrote this summary for personal learning purposes.
Agenda for Today
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …
Pipelining
Basic Idea
- Pipeline the execution of multiple instructions
- Analogy: “Assembly line processing” of instructions
- Divide the instruction processing cycle into distinct “stages” of processing
- Process a different instruction in each stage
- Instructions consecutive in program order are processed in consecutive stages
An Ideal Pipeline
- Increase throughput with little increase in cost
- Repetition of identical operations
- The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
- No dependencies between repeated operations
- Uniformly partitionable suboperations
- Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
Bandwidth (throughput)
- Nonpipelined version: BW = 1/T
- Ideal N-stage pipelined version: BW = N/T
More Realistic Pipeline
Throughput
- Nonpipelined version: BW = 1/(T + S), where S = latch delay
- k-stage pipelined version
- BW(k-stage) = 1 / (T/k + S)
- BW(max) = 1 / (1 gate delay + S)
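The throughput formulas above can be evaluated with a short sketch (the numbers for T and S are illustrative placeholders, not from the lecture):

```python
T = 100.0  # total combinational logic delay of the datapath (e.g., in ns)
S = 2.0    # latch (pipeline register) delay

def bw_nonpipelined(T, S):
    # One instruction completes every T + S time units.
    return 1.0 / (T + S)

def bw_pipelined(T, S, k):
    # Logic split into k equal stages of T/k each, plus one latch delay per stage.
    return 1.0 / (T / k + S)

for k in (1, 2, 5, 10):
    print(f"k={k:2d}: BW = {bw_pipelined(T, S, k):.4f}")
# As k grows, BW approaches 1 / (1 gate delay + S): the latch delay dominates.
```

Note that returns diminish as k grows: the T/k term shrinks, but the constant latch delay S does not.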
Cost
- Nonpipelined version with combinational cost G, cost = G+L where L = latch cost
- k-stage pipelined version, cost = G + Lk
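Combining the cost and throughput models shows the trade-off: deeper pipelines add latch cost linearly while the throughput gain flattens out. The values of G, L, T, and S below are illustrative assumptions:

```python
G = 1000.0  # cost of the combinational logic (illustrative)
L = 10.0    # cost of one set of pipeline latches (illustrative)

def cost(k):
    # Each of the k stages adds one set of latches.
    return G + L * k

def bw(k, T=100.0, S=2.0):
    return 1.0 / (T / k + S)

# Cost per unit of throughput: deeper pipelines pay more latch cost
# for ever-smaller throughput gains.
for k in (1, 4, 16):
    print(f"k={k:2d}: cost={cost(k):7.1f}, BW={bw(k):.4f}, cost/BW={cost(k) / bw(k):.1f}")
```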
Issues in Pipeline Design
- Balancing work in pipeline stages
- How many stages and what is done in each stage
- Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
- Handling dependences
- Data
- Control
- Handling resource contention
- Handling long-latency (multi-cycle) operations
- Handling exceptions, interrupts
- Advanced: Improving pipeline throughput
- Minimizing stalls
Causes of Pipeline Stalls
A stall is a condition in which the pipeline stops moving.
- Resource contention
- Dependences (between instructions)
- Data
- Control
- Long-latency (multi-cycle) operations
Resource Contention
Resource contention happens when instructions in two pipeline stages need the same resource.
- Solution 1: Eliminate the cause of contention
- Duplicate the resource or increase its throughput
- E.g., use separate instruction and data memories (caches)
- E.g., use multiple ports for memory structures
- Solution 2: Detect the resource contention and stall one of the contending stages
- Which stage do you stall?
- Example: What if you had a single read and write port for the register file?
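For the single-port example above, one plausible stall policy is to give the port to the older instruction (the one further down the pipeline), since stalling a late stage would also back-pressure every stage behind it. A minimal sketch, with hypothetical stage names:

```python
def arbitrate_regfile_port(decode_wants_read, writeback_wants_write):
    """Decide which stage gets the single register-file port this cycle.

    Returns the winning stage name; the other stage must stall.
    Prioritizing Writeback (the older instruction) avoids stalling the
    back end of the pipeline, which would freeze all earlier stages too.
    """
    if writeback_wants_write:
        return "writeback"
    if decode_wants_read:
        return "decode"
    return None  # port idle this cycle

print(arbitrate_regfile_port(True, True))   # writeback wins; decode stalls
```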
Data Dependences
Types of data dependences
- Flow dependence (true data dependence – read after write)
- Output dependence (write after write)
- Anti dependence (write after read)
Which ones cause stalls in a pipelined machine?
- For all of them, we need to ensure the semantics of the program remain correct.
- Flow dependences always need to be obeyed because they constitute true dependences on values.
- Anti and output dependences exist due to the limited number of architectural registers.
- They are dependences on a name, not a value
- We will later see what we can do about them
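The three dependence types can be detected by comparing the register sets that an older and a younger instruction read and write. A minimal sketch, with instructions modeled as hypothetical (reads, writes) set pairs:

```python
def dependences(older, younger):
    """Classify data dependences from an older to a younger instruction.

    Each instruction is a (reads, writes) pair of register-name sets.
    """
    o_reads, o_writes = older
    y_reads, y_writes = younger
    deps = []
    if o_writes & y_reads:
        deps.append("flow")    # read after write: true dependence on a value
    if o_writes & y_writes:
        deps.append("output")  # write after write: dependence on a register name
    if o_reads & y_writes:
        deps.append("anti")    # write after read: dependence on a register name
    return deps

# add r1, r2, r3 (writes r1) followed by sub r4, r1, r5 (reads r1):
print(dependences(({"r2", "r3"}, {"r1"}), ({"r1", "r5"}, {"r4"})))  # ['flow']
```

Note how only the "flow" case involves an actual value flowing between instructions; the other two could be removed by renaming the destination register.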
Data Dependence Handling
See the lecture pdf, p.44, for the illustrated examples.
How to Handle Data Dependences
Six fundamental ways of handling flow dependences
- Detect and wait until value is available in register file
- Detect and forward/bypass data to dependent instruction
- Detect and eliminate the dependence at the software level
- No need for the hardware to detect dependence
- Detect and move it out of the way for independent instructions
- Predict the needed value(s), execute “speculatively”, and verify
- Do something else (fine-grained multithreading)
- No need to detect
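The second option above, detect and forward/bypass, can be sketched as an operand read that checks the pipeline latches before falling back to the register file. The stage names and data structures here are hypothetical, not from the lecture:

```python
def read_operand(reg, regfile, ex_mem, mem_wb):
    """Read a source operand, forwarding from pipeline latches when possible.

    ex_mem and mem_wb model pipeline latches as (dest_reg, value) pairs,
    or None when the latch holds no register-writing instruction.
    The EX/MEM latch is checked first because it holds the youngest
    (most recent) in-flight value for that register.
    """
    if ex_mem is not None and ex_mem[0] == reg:
        return ex_mem[1]
    if mem_wb is not None and mem_wb[0] == reg:
        return mem_wb[1]
    return regfile[reg]  # no in-flight producer: the register file is up to date

regfile = {"r1": 0, "r2": 5}
# An older instruction in EX/MEM is about to write 42 into r1;
# forwarding lets the dependent instruction use it without stalling:
print(read_operand("r1", regfile, ("r1", 42), None))  # 42
```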
Control Dependence
- Question: What should the fetch PC be in the next cycle?
- Answer: The address of the next instruction
- All instructions are control dependent on previous ones. Why?
- If the fetched instruction is a non-control-flow instruction:
- Next Fetch PC is the address of the next-sequential instruction
- Easy to determine if we know the size of the fetched instruction
- If the instruction that is fetched is a control-flow instruction:
- How do we determine the next Fetch PC?
- In fact, how do we know whether or not the fetched instruction is a control-flow instruction?
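The next-PC choice above can be sketched as follows, assuming fixed 4-byte instructions. Note the sketch cheats: whether the instruction is a control-flow instruction, and whether it is taken, is exactly what a real fetch stage does not yet know, which is why control dependences cause stalls or require prediction:

```python
def next_fetch_pc(pc, is_control_flow, taken, target):
    """Select the next fetch PC (fixed 4-byte instruction size assumed)."""
    if is_control_flow and taken:
        return target  # redirect fetch to the branch/jump target
    return pc + 4      # next-sequential instruction

print(hex(next_fetch_pc(0x1000, False, False, None)))  # 0x1004
print(hex(next_fetch_pc(0x1000, True, True, 0x2000)))  # 0x2000
```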
Data Dependence Handling: Concepts and Implementation
See the lecture pdf, p.55, for the implementation details.