-
Tomasulo's approach:
-
A technique to allow execution to proceed in the presence of hazards.
-
This was first introduced in the IBM 360/91.
-
There it was applied only to floating-point operations (including FP loads and stores).
-
We have already seen that the compiler can rename registers (statically) to avoid WAW and WAR hazards.
-
Tomasulo's scheme performs this function dynamically.
-
It buffers operands of instructions waiting to issue, fetching them as soon as they are available, avoiding the register file.
-
The register specifiers of instructions are renamed to reservation station numbers as they are issued, eliminating WAW and WAR hazards.
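The renaming idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the notes: every instruction gets a fresh tag (`RS1`, `RS2`, ...) that is never recycled, and the function name `rename` and the tuple layout `(dest, src1, src2)` are hypothetical.

```python
def rename(instrs):
    """Rename source registers to the tag of the instruction that produces them.

    instrs: list of (dest, src1, src2) register names in program order.
    Returns a list of (tag, src1, src2). A source that is still a register
    name means the value is read from the register file at issue time.
    """
    producer = {}  # register name -> tag of the latest instruction writing it
    renamed = []
    for n, (dest, s1, s2) in enumerate(instrs, start=1):
        tag = f"RS{n}"  # assumption: one fresh tag per instruction
        renamed.append((tag, producer.get(s1, s1), producer.get(s2, s2)))
        producer[dest] = tag  # later readers of dest now wait on this tag
    return renamed


program = [
    ("F0", "F2", "F4"),   # MULTD F0, F2, F4
    ("F10", "F0", "F6"),  # DIVD  F10, F0, F6  (reads the old F6)
    ("F6", "F8", "F2"),   # ADDD  F6, F8, F2   (WAR with the DIVD on F6)
    ("F12", "F6", "F0"),  # reads the new F6 and F0
]
for entry in rename(program):
    print(entry)
```

After renaming, the DIVD no longer names a register that the ADDD writes, so the ADDD may finish first without disturbing it; only the true (RAW) dependences remain as tags.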
-
Differences between scoreboarding and Tomasulo's approach:
-
Register renaming
-
Register renaming is used to eliminate WAR and WAW hazards.
-
In contrast, scoreboarding must stall until WAR and WAW hazards clear.
-
Distributed control
-
Hazard detection and execution control are distributed to each functional unit.
-
In scoreboarding, by contrast, this control is centralized.
-
Common Data Bus
-
Is used to forward results directly to the functional units without going through the register file.
-
Reservation stations:
-
The reservation stations are the heart of Tomasulo's approach.
-
They are located at each functional unit and determine when an instruction can begin execution.
-
Operation steps:
-
Issue
-
Take an instruction from the FP operation queue.
-
If there's a station available for it, send the instruction to the station.
-
Otherwise, stall for a structural hazard.
-
Also, this step checks to see if the source operands will be produced by a current instruction.
-
If so, renaming is performed: the desired register is checked to see whether it is being written by an instruction already at a reservation station, and if it is, the operand is tagged with that station's number.
-
If the value is not being generated by a functional unit, it is fetched from the register file.
-
If the operation is a load or a store, it can issue if there is an available load or store buffer.
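The issue step above can be sketched as follows. This is a minimal model, not the 360/91 logic: it assumes a single pool of stations (no matching of stations to functional units), stations numbered from 1 so that 0 can mean "no producer", and dictionary-based state. The names `issue`, `stations`, `reg_status` and `reg_file` are illustrative.

```python
def issue(instr, stations, reg_status, reg_file):
    """Try to issue instr = (op, dest, src1, src2).

    Returns the station number used, or None on a structural stall.
    reg_status maps a register to the station that will write it (0 or
    missing means the register file already holds the current value).
    """
    op, dest, src1, src2 = instr
    for num, rs in stations.items():
        if not rs["busy"]:
            break
    else:
        return None  # no free station: structural hazard, stall

    rs.update(busy=True, op=op)
    for v, q, src in (("Vj", "Qj", src1), ("Vk", "Qk", src2)):
        tag = reg_status.get(src, 0)
        if tag:   # value still being produced: record the producer's number
            rs[q], rs[v] = tag, None
        else:     # value available: copy it out of the register file now
            rs[q], rs[v] = 0, reg_file[src]
    reg_status[dest] = num  # rename the destination to this station
    return num


stations = {1: {"busy": False}, 2: {"busy": False}}
reg_file = {"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 6.0, "F8": 8.0}
reg_status = {}
issue(("MULTD", "F0", "F2", "F4"), stations, reg_status, reg_file)
issue(("ADDD", "F6", "F0", "F8"), stations, reg_status, reg_file)
print(stations[2])  # the ADDD waits on station 1 for its first operand
```

Sources are looked up before the destination is renamed, so an instruction such as `ADDD F0, F0, F2` still reads the previous producer of F0 rather than itself.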
-
Execute
-
If at least one operand is missing, monitor the CDB until it is generated.
-
When a needed operand is put out onto the CDB, it is placed into the appropriate reservation station.
-
When both operands are ready, the operation is executed.
-
RAW hazards are handled here.
-
Write result
-
When the result is ready, write it on the CDB, and from there into the register file and any waiting reservation stations.
-
Obviously, only one value can be written on the CDB in any single cycle.
-
Also, indicate that the reservation station is no longer busy.
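The write-result step, together with the CDB monitoring done by waiting stations, can be sketched as a single broadcast function. This is an illustrative model under assumptions not in the notes: one call stands for one CDB cycle, state is held in dictionaries, and the names `write_result`, `stations`, `reg_status` and `reg_file` are hypothetical.

```python
def write_result(tag, value, stations, reg_status, reg_file):
    """Broadcast (tag, value) on the CDB; one call models one cycle.

    Returns the numbers of the stations that now hold both operands.
    """
    # Every busy station waiting on this tag captures the value.
    for rs in stations.values():
        if rs["busy"] and rs["Qj"] == tag:
            rs["Vj"], rs["Qj"] = value, 0
        if rs["busy"] and rs["Qk"] == tag:
            rs["Vk"], rs["Qk"] = value, 0
    # A register is updated only if this station is still its latest writer.
    for reg, q in reg_status.items():
        if q == tag:
            reg_file[reg] = value
            reg_status[reg] = 0
    stations[tag]["busy"] = False  # the producing station is free again
    return [n for n, rs in stations.items()
            if rs["busy"] and rs["Qj"] == 0 and rs["Qk"] == 0]


stations = {
    1: dict(busy=True, op="MULTD", Vj=2.0, Vk=4.0, Qj=0, Qk=0),
    2: dict(busy=True, op="ADDD", Vj=1.0, Vk=None, Qj=0, Qk=1),
    3: dict(busy=True, op="SUBD", Vj=3.0, Vk=None, Qj=0, Qk=1),
}
reg_status = {"F0": 1, "F6": 2}
reg_file = {"F0": 0.0, "F6": 6.0}
print(write_result(1, 8.0, stations, reg_status, reg_file))
```

In this example both waiting stations receive their second operand from the same broadcast, so both become ready in the same cycle.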
-
The reservation station fields:
-
Operation (Op)
-
The operation to be performed.
-
Operand sources (Qj, Qk)
-
The reservation stations that will produce the values for the two operands.
-
A 0 in either slot means the source operand is already in Vj or Vk, or that the slot is not needed.
-
Operand values (Vj, Vk)
-
The values for the two operands.
-
They are valid iff the corresponding Q is 0.
-
Busy
-
Indicates the reservation station and the accompanying functional unit are busy.
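The fields listed above map naturally onto a small record type. The sketch below is one possible encoding, not a description of the hardware: the class name, the `name` field and the `ready()` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str                   # e.g. "Add1" (label only, an assumption)
    busy: bool = False          # Busy
    op: Optional[str] = None    # Op: the operation to perform
    vj: Optional[float] = None  # Vj: first operand value
    vk: Optional[float] = None  # Vk: second operand value
    qj: int = 0                 # Qj: producing station, 0 = Vj valid or unused
    qk: int = 0                 # Qk: producing station, 0 = Vk valid or unused

    def ready(self) -> bool:
        """An occupied station may execute once neither operand is pending."""
        return self.busy and self.qj == 0 and self.qk == 0


rs = ReservationStation("Add1", busy=True, op="SUBD", vj=1.5, qk=2)
print(rs.ready())        # still waiting on station 2 for the second operand
rs.vk, rs.qk = 4.0, 0    # the value arrives
print(rs.ready())
```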
-
Other components:
-
Each register in the register file, and each store buffer, has an additional field: Qi.
-
This is the number of the reservation station that contains the operation that will eventually write the register/buffer.
-
If no operation is pending, this value is 0 (blank).
-
Consider the same code sequence:
-
The information in the instruction status table is actually distributed in the hardware.
-
Two major advantages:
-
The distribution of the hazard detection logic.
-
If multiple instructions are waiting on the second of two operands, the instructions can be released simultaneously by a broadcast on the CDB.
-
The elimination of WAW and WAR hazards is possible because:
-
Register renaming is performed using the reservation stations.
-
Operands are stored into the reservation stations as soon as they are available.
-
In the example, the WAR hazard was eliminated because the reservation station held the value of F6 for the DIVD instruction.
-
If the LD F6, 34(R2) instruction had not completed before the DIVD issued, the WAR hazard (and the possible WAW hazard) would still be eliminated, since Qk would point to the Load1 buffer for the value of F6.
-
Loop unrolling is performed dynamically!
-
With only 4 FP registers, WAW and WAR hazards would severely limit loop unrolling, even by the compiler.
-
The virtual registers provided by the reservation stations make it possible to execute multiple iterations of some loops simultaneously (see text for an example).
-
Memory disambiguation:
-
Since the store functional unit keeps a memory address as well as a value, it's possible to do disambiguation.
-
When a memory operation is issued, check to see if that location is already involved in an operation.
-
Therefore, LDs and SDs from different iterations of the loop can be executed non-sequentially.
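The address check can be sketched as below. This is a simplified illustration with assumptions beyond the notes: addresses conflict only when exactly equal (no partial overlap), a load is checked against pending stores, and a store is checked against both pending loads and pending stores. The function names and the buffer layout are hypothetical.

```python
def load_may_proceed(addr, store_buffers):
    """A load may run out of order unless a pending store targets addr."""
    return not any(sb["busy"] and sb["addr"] == addr for sb in store_buffers)


def store_may_proceed(addr, store_buffers, load_buffers):
    """A store must wait for any pending load or store to the same addr."""
    pending = [b for b in store_buffers + load_buffers if b["busy"]]
    return all(b["addr"] != addr for b in pending)


store_buffers = [{"busy": True, "addr": 1000}, {"busy": False, "addr": 2000}]
load_buffers = [{"busy": True, "addr": 3000}]

print(load_may_proceed(1000, store_buffers))   # a store to 1000 is pending
print(load_may_proceed(2000, store_buffers))   # that buffer is not in use
print(store_may_proceed(3000, store_buffers, load_buffers))
```

Memory operations to different addresses pass this check immediately, which is what lets loads and stores from different loop iterations overlap.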
-