Digital Design and Computer Architecture - Lecture 13: Pipelining
This post is a derivative of Digital Design and Computer Architecture Lecture by Prof. Onur Mutlu, used under CC BY-NC-SA 4.0.
You can watch this lecture on YouTube and see the PDF.
I wrote this summary for personal learning purposes.
Agenda for Today
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …
Pipelining
Basic Idea
- Pipeline the execution of multiple instructions
- Analogy: “Assembly line processing” of instructions
- Divide the instruction processing cycle into distinct “stages” of processing
- Process a different instruction in each stage
- Instructions consecutive in program order are processed in consecutive stages
An Ideal Pipeline
- Increase throughput with little increase in cost
- Repetition of identical operations
- The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
- No dependencies between repeated operations
- Uniformly partitionable suboperations
- Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
Bandwidth (throughput)
- Nonpipelined version: BW = 1/T
- Ideal N-stage pipelined version: BW = N/T
More Realistic Pipeline
Throughput
- Nonpipelined version: BW = 1/(T + S), where S = latch delay
- k-stage pipelined version
- BW(k-stage) = 1 / (T/k + S)
- BW(max) = 1 / (1 gate delay + S)
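The throughput formulas above can be evaluated with a short sketch (the numbers for T and S are illustrative placeholders, not from the lecture):

```python
T = 100.0  # total combinational logic delay of the datapath (e.g., in ns)
S = 2.0    # latch (pipeline register) delay

def bw_nonpipelined(T, S):
    # One instruction completes every T + S time units.
    return 1.0 / (T + S)

def bw_pipelined(T, S, k):
    # Logic split into k equal stages of T/k each, plus one latch delay per stage.
    return 1.0 / (T / k + S)

for k in (1, 2, 5, 10):
    print(f"k={k:2d}: BW = {bw_pipelined(T, S, k):.4f}")
# As k grows, BW approaches 1 / (1 gate delay + S): the latch delay dominates.
```

Note that returns diminish as k grows: the T/k term shrinks, but the constant latch delay S does not.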
Cost
- Nonpipelined version with combinational cost G, cost = G+L where L = latch cost
- k-stage pipelined version, cost = G + Lk
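Combining the cost and throughput models shows the trade-off: deeper pipelines add latch cost linearly while the throughput gain flattens out. The values of G, L, T, and S below are illustrative assumptions:

```python
G = 1000.0  # cost of the combinational logic (illustrative)
L = 10.0    # cost of one set of pipeline latches (illustrative)

def cost(k):
    # Each of the k stages adds one set of latches.
    return G + L * k

def bw(k, T=100.0, S=2.0):
    return 1.0 / (T / k + S)

# Cost per unit of throughput: deeper pipelines pay more latch cost
# for ever-smaller throughput gains.
for k in (1, 4, 16):
    print(f"k={k:2d}: cost={cost(k):7.1f}, BW={bw(k):.4f}, cost/BW={cost(k) / bw(k):.1f}")
```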
Issues in Pipeline Design
- Balancing work in pipeline stages
- How many stages and what is done in each stage
- Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
- Handling dependences
- Data
- Control
- Handling resource contention
- Handling long-latency (multi-cycle) operations
- Handling exceptions, interrupts
- Advanced: Improving pipeline throughput
- Minimizing stalls
Causes of Pipeline Stalls
A stall is a condition in which the pipeline stops moving.
- Resource contention
- Dependences (between instructions)
- Data
- Control
- Long-latency (multi-cycle) operations
Resource Contention
Resource contention happens when instructions in two pipeline stages need the same resource.
- Solution 1: Eliminate the cause of contention
- Duplicate the resource or increase its throughput
- E.g., use separate instruction and data memories (caches)
- E.g., use multiple ports for memory structures
- Solution 2: Detect the resource contention and stall one of the contending stages
- Which stage do you stall?
- Example: What if you had a single read and write port for the register file?
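For the single-port example above, one plausible stall policy is to give the port to the older instruction (the one further down the pipeline), since stalling a late stage would also back-pressure every stage behind it. A minimal sketch, with hypothetical stage names:

```python
def arbitrate_regfile_port(decode_wants_read, writeback_wants_write):
    """Decide which stage gets the single register-file port this cycle.

    Returns the winning stage name; the other stage must stall.
    Prioritizing Writeback (the older instruction) avoids stalling the
    back end of the pipeline, which would freeze all earlier stages too.
    """
    if writeback_wants_write:
        return "writeback"
    if decode_wants_read:
        return "decode"
    return None  # port idle this cycle

print(arbitrate_regfile_port(True, True))   # writeback wins; decode stalls
```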
Data Dependences
Types of data dependences
- Flow dependence (true data dependence – read after write)
- Output dependence (write after write)
- Anti dependence (write after read)
Which ones cause stalls in a pipelined machine?
- For all of them, we need to ensure the semantics of the program remain correct.
- Flow dependences always need to be obeyed because they constitute true dependences on values.
- Anti and output dependences exist due to the limited number of architectural registers.
- They are dependences on a name, not a value
- We will later see what we can do about them
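The three dependence types can be detected by comparing the register sets that an older and a younger instruction read and write. A minimal sketch, with instructions modeled as hypothetical (reads, writes) set pairs:

```python
def dependences(older, younger):
    """Classify data dependences from an older to a younger instruction.

    Each instruction is a (reads, writes) pair of register-name sets.
    """
    o_reads, o_writes = older
    y_reads, y_writes = younger
    deps = []
    if o_writes & y_reads:
        deps.append("flow")    # read after write: true dependence on a value
    if o_writes & y_writes:
        deps.append("output")  # write after write: dependence on a register name
    if o_reads & y_writes:
        deps.append("anti")    # write after read: dependence on a register name
    return deps

# add r1, r2, r3 (writes r1) followed by sub r4, r1, r5 (reads r1):
print(dependences(({"r2", "r3"}, {"r1"}), ({"r1", "r5"}, {"r4"})))  # ['flow']
```

Note how only the "flow" case involves an actual value flowing between instructions; the other two could be removed by renaming the destination register.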
Data Dependence Handling
See the lecture pdf, p.44, for the illustrated examples.
How to Handle Data Dependences
Six fundamental ways of handling flow dependences
- Detect and wait until value is available in register file
- Detect and forward/bypass data to dependent instruction
- Detect and eliminate the dependence at the software level
- No need for the hardware to detect dependence
- Detect and move it out of the way for independent instructions
- Predict the needed value(s), execute “speculatively”, and verify
- Do something else (fine-grained multithreading)
- No need to detect
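The second option above, detect and forward/bypass, can be sketched as an operand read that checks the pipeline latches before falling back to the register file. The stage names and data structures here are hypothetical, not from the lecture:

```python
def read_operand(reg, regfile, ex_mem, mem_wb):
    """Read a source operand, forwarding from pipeline latches when possible.

    ex_mem and mem_wb model pipeline latches as (dest_reg, value) pairs,
    or None when the latch holds no register-writing instruction.
    The EX/MEM latch is checked first because it holds the youngest
    (most recent) in-flight value for that register.
    """
    if ex_mem is not None and ex_mem[0] == reg:
        return ex_mem[1]
    if mem_wb is not None and mem_wb[0] == reg:
        return mem_wb[1]
    return regfile[reg]  # no in-flight producer: the register file is up to date

regfile = {"r1": 0, "r2": 5}
# An older instruction in EX/MEM is about to write 42 into r1;
# forwarding lets the dependent instruction use it without stalling:
print(read_operand("r1", regfile, ("r1", 42), None))  # 42
```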
Control Dependence
- Question: What should the fetch PC be in the next cycle?
- Answer: The address of the next instruction
- All instructions are control dependent on previous ones. Why?
- If the fetched instruction is a non-control-flow instruction:
- Next Fetch PC is the address of the next-sequential instruction
- Easy to determine if we know the size of the fetched instruction
- If the instruction that is fetched is a control-flow instruction:
- How do we determine the next Fetch PC?
- In fact, how do we know whether or not the fetched instruction is a control-flow instruction?
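The next-PC choice above can be sketched as follows, assuming fixed 4-byte instructions. Note the sketch cheats: whether the instruction is a control-flow instruction, and whether it is taken, is exactly what a real fetch stage does not yet know, which is why control dependences cause stalls or require prediction:

```python
def next_fetch_pc(pc, is_control_flow, taken, target):
    """Select the next fetch PC (fixed 4-byte instruction size assumed)."""
    if is_control_flow and taken:
        return target  # redirect fetch to the branch/jump target
    return pc + 4      # next-sequential instruction

print(hex(next_fetch_pc(0x1000, False, False, None)))  # 0x1004
print(hex(next_fetch_pc(0x1000, True, True, 0x2000)))  # 0x2000
```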
Data Dependence Handling: Concepts and Implementation
See the lecture pdf, p.55, for the implementation details.