Dynamic scheduling

So far, data hazards that would stall instruction issue have been hidden by:
Forwarding.
Compiler scheduling that separated dependent instructions.

The latter is referred to as static scheduling.

Dynamic scheduling is also possible:

The CPU rearranges the instructions (while preserving dependences) to reduce stalls.

Dynamic scheduling has several advantages over static:
Handles dependences that are unknown at compile time (e.g., those involving memory references).
Allows code compiled with one pipeline in mind to run efficiently on a different pipeline.

The basic limitation of the pipelining techniques used so far is that they all issue instructions in order.

If one instruction stalls, every instruction behind it stalls too.

Yet the instructions that follow may be independent and able to proceed. For example:

Basic out-of-order execution:

There is no reason why the CPU can't execute the SUBD before the ADDD.
Doing so will reduce the penalty caused by the data stall.
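A small sketch of the dependence check behind this idea. It assumes the sequence commonly used to illustrate the point (DIVD F0,F2,F4; ADDD F10,F0,F8; SUBD F12,F8,F14), so the exact registers are an assumption; `Instr`, `depends_on`, and `can_run_early` are hypothetical names. An instruction may run ahead only if it has no RAW, WAR, or WAW conflict with any earlier instruction.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple[str, ...]


# Assumed sequence: ADDD needs F0 from the slow DIVD; SUBD needs neither.
program = [
    Instr("DIVD", "F0", ("F2", "F4")),
    Instr("ADDD", "F10", ("F0", "F8")),
    Instr("SUBD", "F12", ("F8", "F14")),
]


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` has a RAW, WAR, or WAW conflict with `earlier`."""
    raw = earlier.dest in later.srcs
    war = later.dest in earlier.srcs
    waw = later.dest == earlier.dest
    return raw or war or waw


def can_run_early(program: list[Instr], idx: int) -> bool:
    """True if program[idx] conflicts with no earlier instruction."""
    return not any(depends_on(program[idx], e) for e in program[:idx])


for i, ins in enumerate(program):
    print(ins.op, "independent" if can_run_early(program, i) else "must wait")
```

Here ADDD must wait for DIVD, while SUBD is free to execute ahead of ADDD.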

However, doing so introduces out-of-order completion, which complicates exception handling.

To allow out-of-order execution, we split the ID stage into two halves:
Issue

Decode instructions and check for structural hazards.

Read operands

Wait until there are no data hazards, and then read the operands.
Note that we can still use forwarding to remove data hazards.

A design of this type may use an instruction queue to hold instructions that have been fetched but are waiting to be executed.

An instruction is considered to be in execution at any time that it's in an EX stage.

Multiple instructions can be in execution at any given time.

Scoreboarding

This technique issues instructions in order (in-order issue).

However, instructions can bypass other waiting instructions in the "read operands" phase.

The technique is named after the scoreboard of the CDC 6600, the first machine to use one.

Remember WAR hazards?

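They did not occur in our DLX FP and integer pipelines, because every instruction read its registers in ID, before any later instruction could write in WB.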

Consider:
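A sketch of a sequence that exposes a WAR hazard, assuming it resembles the earlier example except that SUBD now writes F8 (the registers are an assumption). If ADDD is stuck waiting for F0 from DIVD, an out-of-order pipeline could let SUBD overwrite F8 before ADDD has read it. `Instr` and `war_hazard` are hypothetical names.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple[str, ...]


# Assumed sequence; ADDD is waiting for F0 from DIVD.
divd = Instr("DIVD", "F0", ("F2", "F4"))
addd = Instr("ADDD", "F10", ("F0", "F8"))
subd = Instr("SUBD", "F8", ("F8", "F14"))


def war_hazard(earlier: Instr, later: Instr) -> bool:
    """WAR: `later` writes a register that `earlier` still has to read."""
    return later.dest in earlier.srcs


print(war_hazard(addd, subd))  # True: SUBD overwrites F8, which ADDD reads
```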

Goals:

The goal of this technique (and other dynamic scheduling methods) is to maintain an execution rate of one instruction per clock cycle when there are no structural hazards.

This can be done by executing each instruction as soon as possible.

To accomplish this, the CPU must possess either multiple functional units or pipelined functional units (or both).

These are equivalent for the purposes of pipeline control.

Let's assume multiple functional units.

DLX implementation:

Functional units:

We'll focus the analysis of scoreboarding on operations involving the FP units.

The integer units rarely encounter hazards: the integer DLX pipeline stalls only when an instruction needs a value that a load has not yet delivered.

Every instruction goes through the scoreboard.
The scoreboard determines when an instruction can read its operands and write its results.
Therefore, all hazard detection and resolution is centralized in the scoreboard.

Pipeline stages of interest: the ID stage is replaced with two stages:
Issue (IS)

An instruction is issued if:

The functional unit is available and
No other active instruction has the same destination register.

This avoids WAW hazards and structural hazards.

When issue stalls, the buffer between IF and IS fills up.

A one-entry buffer fills quickly!
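A minimal sketch of the issue check, assuming the scoreboard keeps a busy flag per functional unit plus a map from each register to the unit that will write it (the register result status described later). `IssueState` and the unit names `Add`, `Mult1`, and `Divide` are hypothetical. Issue succeeds only when the structural and WAW conditions both hold; otherwise it stalls.

```python
from dataclasses import dataclass, field


@dataclass
class IssueState:
    busy: dict[str, bool]  # functional unit -> currently busy?
    # Register result status: register -> unit that will write it.
    result_unit: dict[str, str] = field(default_factory=dict)

    def can_issue(self, unit: str, dest: str) -> bool:
        structural_ok = not self.busy[unit]
        waw_ok = dest not in self.result_unit
        return structural_ok and waw_ok

    def issue(self, unit: str, dest: str) -> bool:
        """Issue and record bookkeeping; return False if issue must stall."""
        if not self.can_issue(unit, dest):
            return False
        self.busy[unit] = True
        self.result_unit[dest] = unit
        return True


state = IssueState(busy={"Add": False, "Mult1": False, "Divide": False})
print(state.issue("Divide", "F0"))  # True: unit free, F0 not pending
print(state.issue("Add", "F0"))     # False: WAW, Divide will write F0
print(state.issue("Divide", "F2"))  # False: structural, Divide is busy
```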

Read Operands (RD)

The read operation is delayed until the operands are available.
This means that no previously issued but uncompleted instruction has the operand as its destination.
This resolves RAW hazards dynamically.

Execution (EX)

The unit executes the instruction once it has its operands; when the result is ready, it notifies the scoreboard that execution is complete.
This step may take multiple cycles.
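The read-operands condition from the RD stage can be sketched as follows, assuming each still-active earlier instruction is known only by the register it will write. `Active` and `operands_available` are hypothetical names; only earlier instructions are passed in, so an instruction never waits on its own destination.

```python
from typing import NamedTuple


class Active(NamedTuple):
    op: str
    dest: str  # register this issued, uncompleted instruction will write


def operands_available(srcs: tuple[str, ...], earlier_active: list[Active]) -> bool:
    """RD check: no source has a pending write by an earlier instruction."""
    pending = {a.dest for a in earlier_active}
    return all(reg not in pending for reg in srcs)


earlier = [Active("DIVD", "F0")]
print(operands_available(("F0", "F8"), earlier))   # False: DIVD will write F0
print(operands_available(("F8", "F14"), earlier))  # True
```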

Write result (WB)

The scoreboard checks for WAR hazards and stalls the completing instruction if necessary.

In our earlier example, SUBD would be stalled in WB until ADDD reads its operands.
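A sketch of this write-back check using the R_j/R_k flags of the functional unit status table. It assumes, as in the textbook scoreboard, that a unit's R flags are reset to No once it has read its operands, so Yes means "ready but not yet read". The two add units `Add1` and `Add2` are a hypothetical configuration chosen so that ADDD and SUBD can be in flight together; `FUStatus` and `write_must_stall` are also hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    busy: bool
    fi: Optional[str]  # destination register
    fj: Optional[str]  # first source register
    fk: Optional[str]  # second source register
    rj: bool           # True: Fj is ready but has not been read yet
    rk: bool           # True: Fk is ready but has not been read yet


def write_must_stall(dest: str, units: dict[str, FUStatus]) -> bool:
    """WAR check: stall if any busy unit still needs the old value of dest."""
    for fu in units.values():
        if not fu.busy:
            continue
        if (fu.fj == dest and fu.rj) or (fu.fk == dest and fu.rk):
            return True
    return False


units = {
    # ADDD F10,F0,F8: waiting for F0 (rj False); F8 ready but unread (rk True).
    "Add1": FUStatus(True, "F10", "F0", "F8", rj=False, rk=True),
    # SUBD F8,F8,F14: operands already read, so both flags were reset.
    "Add2": FUStatus(True, "F8", "F8", "F14", rj=False, rk=False),
}
print(write_must_stall("F8", units))  # True: ADDD has not read F8 yet
units["Add1"].rk = False               # ADDD reads its operands; flag reset
print(write_must_stall("F8", units))  # False: SUBD may now write F8
```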

In general, writeback is stalled if:

A preceding instruction has not yet read its operands, and
One of those operands is the same register as the destination of the completing instruction.

The DLX pipeline now has six stages: IF IS RD EX MEM WB.

Note that forwarding is not used here.

This is not a large penalty, because a result is written back as soon as it is available (unless a WAR hazard stalls it).
Instructions that do not need the MEM stage skip it.

Components of the system:
Instruction status

This component keeps track of the current stage of each instruction.

There is one entry for each instruction that has passed the IF stage but has not yet completed.

Functional unit status

This component holds the status of each functional unit.

"Busy" indicates whether or not the unit is busy.
"Op" indicates the operation being performed (some functional units can do more than one operation)
F_i, F_j, and F_k indicate the instruction's destination register (F_i) and source registers (F_j and F_k).
Q_j and Q_k indicate the functional units that will produce source registers F_j and F_k.
R_j and R_k indicate whether F_j and F_k are ready and not yet read.

The R_j and R_k fields are what the scoreboard uses to detect WAR hazards.

Register result status

For each register, this holds the ID of the functional unit that will eventually write it.

If the register is not the destination of an issued instruction, the field will indicate no functional unit.
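To show how the two tables interact, here is a sketch of the bookkeeping done at issue. It assumes the register result status is a dictionary from register name to unit name, where an absent register means no unit will write it; `FUStatus`, `issue`, and the unit names are hypothetical. Q_j/Q_k are copied from the register result status, and R_j/R_k are Yes exactly when no unit is pending.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None  # unit producing Fj (None: value already in registers)
    qk: Optional[str] = None  # unit producing Fk
    rj: bool = False          # Fj ready and not yet read
    rk: bool = False          # Fk ready and not yet read


def issue(unit: str, fu: FUStatus, op: str, dest: str, src_j: str, src_k: str,
          result_status: dict[str, str]) -> None:
    """Fill in one unit's status fields at issue time."""
    fu.busy, fu.op = True, op
    fu.fi, fu.fj, fu.fk = dest, src_j, src_k
    # Look up producers BEFORE recording our own destination, so an instruction
    # such as SUBD F8,F8,F14 does not end up waiting on itself.
    fu.qj = result_status.get(src_j)
    fu.qk = result_status.get(src_k)
    fu.rj = fu.qj is None
    fu.rk = fu.qk is None
    result_status[dest] = unit


result_status = {"F0": "Divide"}  # a DIVD writing F0 is already in flight
add = FUStatus()
issue("Add", add, "ADDD", "F10", "F0", "F8", result_status)
print(add.qj, add.rj, add.qk, add.rk)  # Divide False None True
print(result_status)                   # {'F0': 'Divide', 'F10': 'Add'}
```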

Consider the code:
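As an assumption, the sketch below uses the six-instruction sequence commonly used for this scoreboard walkthrough in Hennessy and Patterson (two loads followed by MULTD, SUBD, DIVD, and ADDD) and lists its register dependences pairwise; `Instr` and `hazards` are hypothetical names. The pairwise scan ignores intervening writers, which does not matter for this sequence.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: tuple[str, ...]


code = [
    Instr("LD F6,34(R2)", "F6", ("R2",)),
    Instr("LD F2,45(R3)", "F2", ("R3",)),
    Instr("MULTD F0,F2,F4", "F0", ("F2", "F4")),
    Instr("SUBD F8,F6,F2", "F8", ("F6", "F2")),
    Instr("DIVD F10,F0,F6", "F10", ("F0", "F6")),
    Instr("ADDD F6,F8,F2", "F6", ("F8", "F2")),
]


def hazards(code: list[Instr]) -> dict[str, list[tuple[int, int]]]:
    """Return (earlier, later) index pairs for each hazard type."""
    found: dict[str, list[tuple[int, int]]] = {"RAW": [], "WAR": [], "WAW": []}
    for i, earlier in enumerate(code):
        for j in range(i + 1, len(code)):
            later = code[j]
            if earlier.dest in later.srcs:
                found["RAW"].append((i, j))
            if later.dest in earlier.srcs:
                found["WAR"].append((i, j))
            if later.dest == earlier.dest:
                found["WAW"].append((i, j))
    return found


for kind, pairs in hazards(code).items():
    for i, j in pairs:
        print(f"{kind}: {code[i].text} -> {code[j].text}")
```

In this sequence, the WAR hazard between DIVD (which reads F6) and ADDD (which writes F6) is the one the scoreboard must catch at write-back.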

Let's look at snapshots of the three components of the scoreboard during execution.

[Figure: execution snapshots #1, #2, and #3 of the instruction status, functional unit status, and register result status tables]

Handling hazards, and distinguishing between RAW and WAR hazards:
RAW

We can detect a RAW hazard by checking whether a source register is listed in the Register Result Status table.

If it is, we have a RAW hazard.

If a pending instruction is waiting for a value from the current instruction, the corresponding R_j/R_k field of the pending instruction is set to No.

WAR

Before writing the value, we check that no pending instruction still needs the previous value of the register we're about to overwrite.

If some pending instruction has this register as a source and the corresponding R_j or R_k is Yes (the old value is available but has not yet been read), the current instruction must stall (WAR).

This is how we distinguish a RAW hazard (R = No: the pending instruction is still waiting for the value) from a WAR hazard (R = Yes: the old value has not been read yet).

Limitations:
ILP

If we can't find independent instructions to execute, scoreboarding (or any dynamic scheduling scheme for that matter) helps very little.

Size of the "issued" queue

This determines how far ahead the CPU can look for instructions to execute in parallel.

This set of candidate instructions is called the window.

For now, we assume that a window cannot span a branch.

In other words, the window includes only instructions within a single basic block.

We'll show how the window can be extended beyond the branch later.
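A rough sketch of how a listing divides into basic blocks, assuming a block ends after any DLX branch or jump. It ignores branch targets (a complete version would also start a new block at every label), and the opcode set and `basic_blocks` are illustrative assumptions.

```python
# DLX control-transfer opcodes (assumed list for this sketch).
BRANCH_OPS = {"BEQZ", "BNEZ", "J", "JR", "JAL", "JALR", "BFPT", "BFPF"}


def basic_blocks(instrs: list[str]) -> list[list[str]]:
    """Split a straight-line listing after each branch or jump."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for ins in instrs:
        current.append(ins)
        if ins.split()[0] in BRANCH_OPS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


listing = ["LD F0,0(R1)", "ADDD F4,F0,F2", "SD 0(R1),F4", "SUBI R1,R1,#8",
           "BNEZ R1,Loop", "ADDD F6,F6,F4"]
for block in basic_blocks(listing):
    print(block)
```

With the window confined to one basic block, the final ADDD cannot be considered while the loop body is being scheduled.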

Number, types, and speed of the functional units

This determines how often structural hazards cause stalls.

The presence of antidependences and output dependences

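Antidependences cause WAR stalls and output dependences cause WAW stalls.

RAW hazards (true data dependences) are a problem for any technique.

But WAR and WAW hazards arise only from the reuse of register names, so they can be eliminated by methods other than scoreboarding, such as register renaming.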