- So far, we have seen that data hazards that prevent instruction issue were hidden by:
  - Forwarding.
  - Compiler scheduling that separated dependent instructions.
- The latter is referred to as static scheduling.
- Dynamic scheduling is also possible:
  - The CPU rearranges the instructions (while preserving dependences) to reduce stalls.
- Dynamic scheduling has several advantages over static scheduling:
  - It handles dependences that are unknown at compile time (e.g., those involving a memory reference).
  - It allows code compiled with one pipeline in mind to run efficiently on a different pipeline.
 
- The basic problem with the pipelining techniques used so far is that they all use in-order instruction issue.
- A stall of one instruction stalls all instructions behind it.
- It is possible that the instructions that follow could issue. For example:
- Basic out-of-order execution:
  - There is no reason why the CPU can't execute the SUBD before the ADDD.
  - Doing so will reduce the penalty caused by the data stall.
- However, this will force out-of-order completion, which causes problems handling exceptions.
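To make the reordering argument concrete, here is a minimal Python sketch that classifies the hazards between pairs of instructions. The three-instruction sequence is an assumed stand-in chosen to match the SUBD/ADDD discussion, and `Instr` and `hazards` are illustrative names rather than anything defined by DLX.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dest: str
    src1: str
    src2: str

def hazards(earlier: Instr, later: Instr) -> set[str]:
    """Return the hazard types that `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in (later.src1, later.src2):
        found.add("RAW")  # later reads what earlier writes
    if later.dest in (earlier.src1, earlier.src2):
        found.add("WAR")  # later overwrites what earlier reads
    if later.dest == earlier.dest:
        found.add("WAW")  # both write the same register
    return found

# Assumed example sequence (hypothetical, for illustration only).
divd = Instr("DIVD", "F0", "F2", "F4")
addd = Instr("ADDD", "F10", "F0", "F8")
subd = Instr("SUBD", "F8", "F8", "F14")

print(hazards(divd, addd))  # ADDD must wait for DIVD's result in F0
print(hazards(divd, subd))  # empty set: SUBD does not depend on DIVD
print(hazards(addd, subd))  # SUBD overwrites F8, which ADDD reads
```

Under these assumptions SUBD has no dependence on the long-running DIVD, so it need not wait behind the stalled ADDD. Its only conflict is the WAR hazard on F8 with ADDD.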
 
- To support out-of-order execution, we split the ID stage into two halves:
  - Issue
    - Decode instructions and check for structural hazards.
  - Read operands
    - Wait until there are no data hazards, and then read the operands.
    - Note that we can still use forwarding to remove data hazards.
- A design of this type may use an instruction queue to hold instructions that have been fetched but are waiting to be executed.
- An instruction is considered to be in execution whenever it is in an EX stage.
- Multiple instructions can be in execution at any given time.
 
- Scoreboarding issues instructions in order (in-order issue).
- However, instructions can bypass other waiting instructions in the "read operands" phase.
- The technique is named after the scoreboard of the CDC 6600, which was the first machine to use one.
- Remember WAR hazards?
  - They did not exist in our DLX FP and integer pipelines, but out-of-order execution makes them possible.
 
 
 
- Goals:
  - The goal of this technique (and other dynamic scheduling methods) is to maintain an execution rate of one instruction per cycle.
  - This can be done by executing each instruction as soon as possible.
  - To accomplish this, the CPU must possess either multiple functional units or pipelined functional units (or both).
  - These two options are equivalent for the purposes of pipeline control.
  - Let's assume multiple functional units.
 
- We'll focus the analysis of scoreboarding on operations involving the FP units.
  - The integer units encounter hazards very rarely. (The integer DLX pipeline stalls only when waiting for a value after a load.)
 
- DLX implementation:
  - Every instruction goes through the scoreboard.
  - The scoreboard determines when an instruction can read its operands and write its results.
  - Therefore, all hazard detection and resolution are centralized.
 
 
- Pipeline stages of interest: the ID stage is replaced with two stages:
  - Issue (IS)
    - An instruction is issued if:
      - The functional unit is available and
      - No other active instruction has the same destination register.
    - This avoids WAW hazards and structural hazards.
    - When issue stalls, the buffer between IF and IS fills.
      - A one-entry buffer fills quickly!
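The issue condition above can be sketched as a single predicate. This is a minimal illustration, not the actual DLX control logic: `can_issue`, `busy`, and `pending_writer` (a map from register to the unit that will write it) are hypothetical names, and the machine state is made up.

```python
def can_issue(unit: str, dest: str,
              busy: dict[str, bool],
              pending_writer: dict[str, str]) -> bool:
    """Return True if an instruction needing `unit` and writing `dest` may issue."""
    structural_hazard = busy[unit]        # the functional unit is occupied
    waw_hazard = dest in pending_writer   # an active instruction already writes dest
    return not structural_hazard and not waw_hazard

# Hypothetical state: the divider is busy producing F0.
busy = {"Add": False, "Divide": True}
pending_writer = {"F0": "Divide"}

print(can_issue("Add", "F8", busy, pending_writer))     # True
print(can_issue("Divide", "F6", busy, pending_writer))  # False: structural hazard
print(can_issue("Add", "F0", busy, pending_writer))     # False: WAW hazard
```

Because issue is in order, a `False` result here also holds up every instruction behind the one being checked.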
 
- Pipeline stages of interest:
  - Read Operands (RD)
    - The read operation is delayed until the operands are available.
    - An operand is available when no previously issued but uncompleted instruction has it as its destination.
    - This resolves RAW hazards dynamically.
  - Execution (EX)
    - The functional unit notifies the scoreboard on completion so that the unit can be reused.
    - This step may take multiple cycles.
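The Read Operands rule above depends on program order: only earlier, uncompleted instructions can hold an operand back. A minimal sketch, assuming a list of issued instructions kept in issue order; `InFlight` and `can_read_operands` are hypothetical names and the instruction sequence is an assumed example.

```python
from dataclasses import dataclass

@dataclass
class InFlight:
    name: str
    dest: str
    sources: tuple[str, ...]
    completed: bool = False

def can_read_operands(index: int, issued: list[InFlight]) -> bool:
    """True if no earlier, uncompleted instruction writes one of our sources."""
    me = issued[index]
    return not any(
        not earlier.completed and earlier.dest in me.sources
        for earlier in issued[:index]
    )

# Assumed sequence, listed in issue order.
issued = [
    InFlight("DIVD", "F0", ("F2", "F4")),
    InFlight("ADDD", "F10", ("F0", "F8")),
    InFlight("SUBD", "F8", ("F8", "F14")),
]

print(can_read_operands(1, issued))  # False: ADDD waits for DIVD to produce F0
print(can_read_operands(2, issued))  # True: SUBD can bypass the waiting ADDD
issued[0].completed = True
print(can_read_operands(1, issued))  # True: F0 is now available
```

Note that SUBD writing F8 does not delay ADDD's read of F8, because SUBD comes later in program order. That case is a WAR hazard and is handled at write-back instead.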
 
- Pipeline stages of interest:
  - Write result (WB)
    - The scoreboard checks for WAR hazards and stalls the completing instruction if necessary.
    - In our earlier example, SUBD would be stalled in WB until ADDD reads its operands.
    - In general, write-back is stalled if:
      - A preceding instruction has not read its operands and
      - One of those operands is the same register as the destination of the completing instruction.
- The DLX pipeline is now six stages long: IF IS RD EX MEM WB.
  - Note that forwarding is not used here.
  - But this is not a large penalty, since write-back occurs as soon as the result is available.
  - Instructions that do NOT need the MEM phase don't execute it.
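The write-back stall rule for WAR hazards can be sketched in the same program-order style. This is an illustration under assumptions: `Pending` and `must_stall_writeback` are hypothetical names, and the register numbers in the ADDD/SUBD pair are assumed.

```python
from dataclasses import dataclass

@dataclass
class Pending:
    name: str
    dest: str
    sources: tuple[str, ...]
    operands_read: bool = False

def must_stall_writeback(index: int, issued: list[Pending]) -> bool:
    """True if an earlier instruction still needs to read our destination register."""
    me = issued[index]
    return any(
        not earlier.operands_read and me.dest in earlier.sources
        for earlier in issued[:index]
    )

# Assumed pair: ADDD reads F8, and the later SUBD wants to overwrite F8.
issued = [
    Pending("ADDD", "F10", ("F0", "F8")),
    Pending("SUBD", "F8", ("F8", "F14")),
]

print(must_stall_writeback(1, issued))  # True: ADDD has not read F8 yet
issued[0].operands_read = True
print(must_stall_writeback(1, issued))  # False: SUBD may now write F8
```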
 
- Components of the system:
  - Instruction status
    - This component keeps track of the current stage of each instruction.
    - There is one entry for each instruction that has passed the IF stage but has not yet completed.
  - Functional unit status
    - This component holds the status of each functional unit.
      - "Busy" indicates whether or not the unit is busy.
      - "Op" indicates the operation being performed (some functional units can do more than one operation).
      - Fi is the instruction's destination register; Fj and Fk are its source registers.
      - Qj and Qk indicate the functional units producing the source registers Fj and Fk.
      - Rj and Rk indicate whether or not the values in Fj and Fk are ready.
    - It is the Rj and Rk fields that are used to avoid WAR hazards.
 
- Components:
  - Register result status
    - This holds the ID of the functional unit that will eventually write a register.
    - If the register is not the destination of an issued instruction, the field will indicate no functional unit.
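The functional unit status and register result status tables described above can be modeled with a small record and a dictionary. The sketch below also shows one plausible way the fields might be filled in at issue time, following the common textbook formulation of scoreboard bookkeeping. The class name, the `issue` helper, the unit names, and the example instructions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FunctionalUnit:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    qj: Optional[str] = None  # unit producing Fj (None if no unit)
    qk: Optional[str] = None  # unit producing Fk (None if no unit)
    rj: bool = False          # is the value in Fj ready?
    rk: bool = False          # is the value in Fk ready?

def issue(units: dict[str, FunctionalUnit], result_status: dict[str, str],
          unit_name: str, op: str, dest: str, src1: str, src2: str) -> None:
    """Record an issued instruction in both tables (hazard checks omitted)."""
    fu = units[unit_name]
    fu.busy, fu.op = True, op
    fu.fi, fu.fj, fu.fk = dest, src1, src2
    # Look up producers before registering our own destination.
    fu.qj = result_status.get(src1)
    fu.qk = result_status.get(src2)
    fu.rj = fu.qj is None
    fu.rk = fu.qk is None
    result_status[dest] = unit_name

units = {"Divide": FunctionalUnit(), "Add": FunctionalUnit()}
result_status: dict[str, str] = {}  # register -> unit that will write it

issue(units, result_status, "Divide", "DIVD", "F0", "F2", "F4")
issue(units, result_status, "Add", "ADDD", "F10", "F0", "F8")

print(units["Add"].qj, units["Add"].rj, units["Add"].rk)  # Divide False True
print(result_status)  # {'F0': 'Divide', 'F10': 'Add'}
```

A register that is absent from `result_status` plays the role of "no functional unit" in the register result status table.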
 
 
 
 
- Let's look at snapshots of the three components of the scoreboard during execution.
 
- Handling hazards, and distinguishing between RAW and WAR hazards:
  - RAW
    - We can detect RAW hazards by checking whether a source register is listed in the Register Result Status table.
      - If it is, we have a RAW hazard.
    - If the pending instruction is waiting to receive a value from the current instruction, then one of the pending instruction's Rj/Rk fields is set to No.
  - WAR
    - Before writing the value, we check to make sure that no pending instruction is using a previous value for the register we're about to overwrite.
    - If the value some pending instruction needs is already available in that register but has not yet been read, then Rj or Rk is set to Yes and the current instruction must stall (WAR).
  - This is how we distinguish between a RAW and a WAR hazard.
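The Rj/Rk test above can be written as a small classifier: given the register a completing instruction is about to write and the fields of one pending instruction, it reports which relationship holds. This is a sketch under the assumption that Rj/Rk mean "value ready but not yet read"; `relation_to_writer` is a hypothetical name and the register values are made up.

```python
def relation_to_writer(dest: str, fj: str, rj: bool, fk: str, rk: bool) -> str:
    """Classify a pending instruction relative to a unit about to write `dest`."""
    # Collect the ready flags of the source fields that name `dest`.
    ready_flags = [ready for reg, ready in ((fj, rj), (fk, rk)) if reg == dest]
    if not ready_flags:
        return "none"  # the pending instruction does not use dest
    # Ready = Yes: an old value is still to be read, so the writer must stall.
    # Ready = No: the pending instruction is waiting for this very write.
    return "WAR" if any(ready_flags) else "RAW"

# Pending ADDD F10,F0,F8 with Rj = No (F0 not produced yet) and Rk = Yes (F8 ready).
print(relation_to_writer("F0", "F0", False, "F8", True))  # RAW
print(relation_to_writer("F8", "F0", False, "F8", True))  # WAR
print(relation_to_writer("F6", "F0", False, "F8", True))  # none
```

Only the "WAR" outcome delays the write; in the "RAW" case the write is exactly what the pending instruction is waiting for.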
 
- Limitations:
  - ILP (instruction-level parallelism)
    - If we can't find independent instructions to execute, scoreboarding (or any dynamic scheduling scheme, for that matter) helps very little.
  - Size of the "issued" queue
    - This determines how far ahead the CPU can look for instructions to execute in parallel. The set of instructions it examines is called the window.
    - For now, we assume that a window cannot span a branch.
      - In other words, the window includes instructions only within a single basic block.
    - We'll show how the window can be extended beyond the branch later.
 
- Limitations:
  - Number, types, and speed of the functional units
    - This determines how often a structural hazard results in a stall.
  - The presence of antidependences and output dependences
    - WAR and WAW hazards limit the scoreboard more than RAW hazards.
      - RAW hazards are problems for any technique.
      - But WAR and WAW hazards can be handled by techniques other than scoreboarding.