Digital Design and Computer Architecture - Lecture 15a: out of order execution

This post is a derivative of Digital Design and Computer Architecture Lecture by Prof. Onur Mutlu, used under CC BY-NC-SA 4.0.

You can watch this lecture on Youtube and see pdf.

I write this summary for personal learning purposes.

Required Readings

  • Out-of-order execution
    • H&H, Chapter 7.8-7.9
  • Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proceedings of the IEEE, 1995
    • More advanced pipelining
    • Interrupt and exception handling
    • Out-of-order and superscalar execution concepts
  • Optional
    • Kessler, “The Alpha 21264 Microprocessor,” IEEE Micro 1999.

Agenda for Today & Next Few Lectures

  • Out-of-Order Execution
  • Problem in in-order pipeline
    • A true data dependency stalls dispatch of younger instructions into functional (execution) units
    • What if there were some other instructions and later in the program executionorder that didn’t need that value or that didn’t need any value that’s produced by any instruction in the pipeline?
    • Can we do better?
  • Solutions
    • Out-of-order dispatch (scheduling, or execution)
    • Compile-time instruction scheduling/reordering
    • Value prediction
    • Fine-grained multithreading

Out-of-Order Execution (Dynamic Instruction Scheduling)

  • Idea: Move the dependent instructions out of the way of independent ones (s.t. independent ones can execute)
    • Rest areas for dependent instructions: Reservation stations
  • Monitor the source “values” of each instruction in the resting area
  • When all source “values” of an instruction are available, “fire” (i.e. dispatch) the instruction
    • Instructions dispatched in dataflow (not control-flow) order
  • Benefit:
    • Latency tolerance: Allows independent instructions to execute and complete in the presence of a long-latency operation

Enabling OoO Execution

  1. Need to link the consumer of a value to the producer
    1. Register renaming: Associate a “tag” with each data value How do you link the consume of value to a producer? Do register renaming. You have valid value and the reorder buffer entry. Let’s call that reorder buffer entry a “tag”. We rename a register to a tag and the tag identifies the instruction that’s going to produce the value. Use a tag to identify the instruction that’s going to produce the value.
  2. Need to buffer instructions until they are ready to execute
    1. Insert instruction into reservation stations after renaming
  3. Instructions need to keep track of readiness of source values
    1. Broadcast the “tag” when the value is produced
    2. Instructions compare their “source tags” to the broadcast tag. If match, source value becomes ready
  4. When all source values of an instruction are ready, need to dispatch the instruction to its functional unit (FU)
    1. Instruction wakes up if all sources are ready
    2. If multiple instructions are awake, need to select one per FU

Two Humps in a Modern Pipeline

Fetch -> Decode -> [Schedule] -> Execution -> [Reorder] -> Writeback

Only the execution stage is out of order.

  • Hump 1: Reservation stations (scheduling window)
  • Hump 2: Reordering (reorder buffer, aka instruction window or active window)

Tomasulo’s Algorithm

Whenever we are decoding an instruction, we check if there is a reservation stations available before renaming. If there is no available reservation stage to put this instruction, we stall.

  • If reservation station available before renaming
    • Instruction + renamed operands (source value/tag) inserted into the reservation station
    • Only rename if reservation station is available
  • Else stall
  • While in reservation station, each instruction:
    • Watches common data bus (CDB) for tag of its sources
    • When tag seen, grab value for the source and keep it in the reservation station
    • When both operands available, instruction ready to be dispatched
  • Dispatch instruction to the Functional Unit when instruction is ready
  • After instruction finishes in the Functional Unit
    • Arbitrate for CDB
    • Put tagged value onto CDB (tag broadcast)
    • Register file is connected to the CDB
      • Register contains a tag indicating the latest writer to the register
      • If the tag in the register file matches the broadcast tag, write broadcast value into register (and set valid bit)
    • Reclaim rename tag
      • no valid copy of tag in system!

An Exercise

Lecture

Leave a comment