Digital Design and Computer Architecture - Lecture 13: Pipelining

This post is a derivative of Digital Design and Computer Architecture Lecture by Prof. Onur Mutlu, used under CC BY-NC-SA 4.0.

You can watch this lecture on YouTube and see the slides (PDF).

I write this summary for personal learning purposes.

Agenda for Today

  • Pipelining
  • Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …

Pipelining

Basic Idea

  • Pipeline the execution of multiple instructions
  • Analogy: “Assembly line processing” of instructions
  • Divide the instruction processing cycle into distinct “stages” of processing
  • Process a different instruction in each stage
    • Instructions consecutive in program order are processed in consecutive stages

An Ideal Pipeline

  • Increase throughput with little increase in cost
  • Repetition of identical operations
    • The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
  • Repetition of independent operations
    • No dependencies between repeated operations
  • Uniformly partitionable suboperations
    • Processing can be evenly divided into uniform-latency suboperations (that do not share resources)

Bandwidth (throughput)

  • Nonpipelined version: BW = 1/T, where T = total instruction processing time
  • Ideally pipelined (N-stage) version: BW = N/T
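As a quick numeric sketch of the ideal case (the values of T and N below are hypothetical examples, not from the lecture): an N-stage ideal pipeline completes one instruction per stage time T/N, so throughput is N times the nonpipelined version.

```python
# Ideal pipeline throughput (illustrative sketch).
# Nonpipelined: one instruction finishes every T time units -> BW = 1/T.
# N-stage ideal pipeline: one finishes every T/N time units -> BW = N/T.

def bw_nonpipelined(T: float) -> float:
    return 1.0 / T

def bw_ideal_pipeline(T: float, N: int) -> float:
    return N / T  # one completion per stage time T/N

T, N = 10.0, 5  # hypothetical: 10 ns per instruction, 5 stages
print(bw_nonpipelined(T))       # 0.1 instructions per ns
print(bw_ideal_pipeline(T, N))  # 0.5 instructions per ns
```

The ideal speedup equals the number of stages N, which is exactly what the more realistic model below corrects for.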

More Realistic Pipeline

Throughput

  • Nonpipelined version: BW = 1/(T + S), where S = latch delay
  • k-stage pipelined version
    • BW(k-stage) = 1/(T/k + S)
    • BW(max) = 1/(1 gate delay + S)

Cost

  • Nonpipelined version with combinational cost G: cost = G + L, where L = latch cost
  • k-stage pipelined version: cost = G + L·k
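These two formulas can be checked numerically. In this hedged sketch, T, S, G, and L are made-up example values (not from the lecture); sweeping k shows bandwidth approaching the 1/S limit while cost grows linearly.

```python
# Throughput and cost of a k-stage pipeline with latch overhead (sketch).
# T = combinational delay, S = latch delay, G = gate cost, L = latch cost.

def bw_pipelined(T: float, S: float, k: int) -> float:
    return 1.0 / (T / k + S)

def cost_pipelined(G: float, L: float, k: int) -> float:
    return G + L * k

T, S, G, L = 10.0, 0.5, 100.0, 4.0  # hypothetical numbers
for k in (1, 2, 5, 10):
    print(k, bw_pipelined(T, S, k), cost_pipelined(G, L, k))
# As k grows, bandwidth approaches 1/S but never reaches it,
# while cost increases by L with every added stage.
```

This is the basic cost/performance trade-off behind choosing the number of pipeline stages.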

Issues in Pipeline Design

  • Balancing work in pipeline stages
    • How many stages and what is done in each stage
  • Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
    • Handling dependences
      • Data
      • Control
    • Handling resource contention
    • Handling long-latency (multi-cycle) operations
  • Handling exceptions, interrupts
  • Advanced: Improving pipeline throughput
    • Minimizing stalls

Causes of Pipeline Stalls

A stall is a condition in which the pipeline stops moving.

  • Resource contention
  • Dependences (between instructions)
    • Data
    • Control
  • Long-latency (multi-cycle) operations

Resource Contention

Happens when instructions in two pipeline stages need the same resource.

  • Solution 1: Eliminate the cause of contention
    • Duplicate the resource or increase its throughput
      • E.g., use separate instruction and data memories (caches)
      • E.g., use multiple ports for memory structures
  • Solution 2: Detect the resource contention and stall one of the contending stages
    • Which stage do you stall?
    • Example: What if you had a single read and write port for the register file?
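One hedged sketch of Solution 2: when two stages request the same single-ported resource in a cycle, a common policy is to grant the stage holding the older instruction and stall the younger one (the function name, stage encoding, and policy here are illustrative assumptions, not the lecture's design).

```python
# Structural-hazard sketch: two pipeline stages contend for one shared port.
# Policy (assumed): the older instruction (further along, higher stage index)
# wins the port; the younger instruction's stage stalls this cycle.

def arbitrate(requests):
    """requests: list of (stage_index, instr_id) wanting the shared port.
    Returns (granted, stalled), where higher stage_index = older instruction."""
    if not requests:
        return None, []
    ordered = sorted(requests, key=lambda r: r[0], reverse=True)
    return ordered[0], ordered[1:]

# A MEM-stage lw and an EX-stage add both need the single register-file port:
granted, stalled = arbitrate([(2, "add"), (4, "lw")])
print(granted)  # (4, 'lw')  - the older instruction gets the port
print(stalled)  # [(2, 'add')] - the younger one stalls this cycle
```

Stalling the younger instruction avoids deadlock: the older instruction can always drain out of the pipeline and release the resource.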

Data Dependences

  • Types of data dependences
    • Flow dependence (true data dependence – read after write)
    • Output dependence (write after write)
    • Anti dependence (write after read)
  • Which ones cause stalls in a pipelined machine?

    • For all of them, we need to ensure that the semantics of the program are preserved.
    • Flow dependences always need to be obeyed because they constitute a true dependence on a value.
    • Anti and output dependences exist due to the limited number of architectural registers.
      • They are dependences on a name, not a value
      • We will later see what we can do about them
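The three dependence types can be sketched as set relations between the registers each instruction reads and writes (a toy classifier for illustration; the register names and instruction encoding are made up):

```python
# Toy dependence classifier between an earlier and a later instruction,
# given the register names each one reads and writes.

def dependences(earlier_writes, earlier_reads, later_writes, later_reads):
    deps = set()
    if set(earlier_writes) & set(later_reads):
        deps.add("flow (RAW)")    # true dependence on a value
    if set(earlier_writes) & set(later_writes):
        deps.add("output (WAW)")  # dependence on a register name
    if set(earlier_reads) & set(later_writes):
        deps.add("anti (WAR)")    # dependence on a register name
    return deps

# add r1, r2, r3   (writes r1, reads r2, r3)
# sub r4, r1, r5   (writes r4, reads r1, r5)  -> flow dependence via r1
print(dependences({"r1"}, {"r2", "r3"}, {"r4"}, {"r1", "r5"}))
```

Note how anti and output dependences disappear if the later instruction writes a different register, which is exactly why they are "name" dependences that register renaming can remove.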

Data Dependence Handling

See the lecture PDF, p. 44.

How to Handle Data Dependences

Six fundamental ways of handling flow dependences

  • Detect and wait until value is available in register file
  • Detect and forward/bypass data to dependent instruction
  • Detect and eliminate the dependence at the software level
    • No need for the hardware to detect dependence
  • Detect and move it out of the way for independent instructions
  • Predict the needed value(s), execute “speculatively”, and verify
  • Do something else (fine-grained multithreading)
    • No need to detect

Control Dependence

  • Question: What should the fetch PC be in the next cycle?
  • Answer: The address of the next instruction
    • All instructions are control dependent on previous ones. Why?
  • If the fetched instruction is a non-control-flow instruction:
    • Next Fetch PC is the address of the next-sequential instruction
    • Easy to determine if we know the size of the fetched instruction
  • If the instruction that is fetched is a control-flow instruction:
    • How do we determine the next Fetch PC?
  • In fact, how do we know whether or not the fetched instruction is a control-flow instruction?
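A hedged sketch of the next-fetch-PC question (fixed 4-byte instructions and an already-resolved taken/not-taken flag are assumptions for illustration; in a real pipeline that information is not yet known at fetch time, which is the whole problem):

```python
# Next fetch PC selection, assuming fixed 4-byte instructions.
# The catch the text points at: at fetch time the machine does not yet
# know whether the instruction is a branch, or whether it will be taken.

def next_fetch_pc(pc, is_control_flow, taken=False, target=None):
    if is_control_flow and taken:
        return target  # known only after decode/execute (or via prediction)
    return pc + 4      # next-sequential instruction

print(hex(next_fetch_pc(0x1000, False)))                          # 0x1004
print(hex(next_fetch_pc(0x1000, True, taken=True, target=0x2000)))  # 0x2000
```

Since the `is_control_flow` and `taken` inputs are only available cycles after fetch, real machines either stall, fetch sequentially and squash on a taken branch, or predict, which is what motivates branch prediction later in the course.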

Data Dependence Handling: Concepts and Implementation

See the lecture PDF, p. 55.
