-
Section 4.5 describes techniques such as loop unrolling, software pipelining and trace scheduling that the compiler can use to uncover ILP.
-
This works as long as the behavior of branches is fairly predictable.
-
We'll now cover methods of increasing the ILP in programs.
-
These methods include conditional execution and speculative execution.
-
Conditional instructions
-
A conditional instruction refers to a condition that is evaluated as part of the instruction's execution.
-
Rather than use a branch to skip a single instruction, the CPU always executes the instruction but writes the result only if the condition is met.
-
Eliminating the branch gives two benefits:
-
The branch is not executed, reducing the instruction count by 1.
-
The branch delay is avoided.
-
A conditional instruction changes a control dependence into a data dependence.
-
This is beneficial because, in an integer pipeline, data dependences rarely cause stalls while control hazards do cause stalls.
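As a rough illustration of this conversion, the sketch below models the statement `if (a == 0) s = t` in two ways: with a branch that skips a move, and with a single conditional move. The functions, the mnemonics in the comments (`BNEZ`, `MOV`, `CMOVZ`) and the register names are assumptions made for this example, not a definition of any particular instruction set.

```python
def run_with_branch(regs):
    """if (a == 0) s = t, using a branch that skips the move.

    Returns the number of instructions executed.
    """
    executed = 1                  # BNEZ a, skip   (creates a control dependence)
    if regs["a"] == 0:
        regs["s"] = regs["t"]     # MOV s, t       (executed only if not skipped)
        executed += 1
    return executed


def run_with_cmov(regs):
    """The same statement as one conditional move: CMOVZ s, t, a."""
    # The instruction always executes; only the register write is conditional,
    # so the result now depends on the data value held in a.
    if regs["a"] == 0:
        regs["s"] = regs["t"]
    return 1
```

Both versions leave the registers in the same state; the conditional-move version simply has no branch to predict or stall on.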
-
Other benefits:
-
Conditional instructions help a lot with superscalar machines because such machines suffer even more from branch stalls.
-
This is true because conditional instructions can be scheduled as normal instructions.
-
Branches, on the other hand, often cannot be scheduled this way because they may cause a change in the instruction stream.
-
This allows more slots in a superscalar machine to be filled.
-
Conditional instructions are of even greater benefit in this respect on a VLIW machine.
-
Exceptions:
-
We must ensure a speculated instruction does not introduce an exception.
-
The instruction must have NO effect if the condition is not satisfied.
-
If R10 contains zero, then it is likely that the LW instruction will cause a protection violation if allowed to execute.
-
In DLX, memory accesses are not started until MEM.
-
Therefore, it is easy to evaluate the condition early (i.e., during EX) and suppress the memory access before it starts.
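The ordering argument above can be sketched as a toy model: the condition is checked before the memory access is attempted, so an annulled load never touches memory and cannot fault. The function names, the dictionary-based memory, and the operand order of the hypothetical conditional load are all assumptions for illustration.

```python
class ProtectionViolation(Exception):
    """Raised by the toy memory model for an unmapped address."""


def load_word(memory, address):
    # Toy memory: any address that is not a key counts as unmapped.
    if address not in memory:
        raise ProtectionViolation(address)
    return memory[address]


def conditional_load(regs, memory, dest, base, offset, cond):
    """Hypothetical conditional load: dest <- M[base + offset] if regs[cond] != 0.

    Returns True if the load was performed, False if it was annulled.
    """
    # "EX": the condition is evaluated before any memory access starts.
    if regs[cond] == 0:
        return False              # annulled: no access, no exception, no write
    # "MEM": reached only when the condition holds.
    regs[dest] = load_word(memory, regs[base] + offset)
    return True
```

With R10 equal to zero the load is annulled and the destination register is left untouched, whereas an unconditional load from the same address would raise the exception.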
-
Limits to conditional instructions:
-
Executing them takes time.
-
A conditional instruction always requires time, even if the instruction is annulled.
-
Moving an instruction across a branch is essentially speculating on the outcome of the branch.
-
This may slow down a program if an instruction is executed but turned into a no-op, since a useful instruction could have executed in that slot instead.
-
They are always a win when the cycle that they occupy would have been idle anyway.
-
Trading a branch and a move for a conditional move is usually a win.
-
Longer sequences may not be.
-
Limits to conditional instructions (continued):
-
The condition must be evaluated early.
-
As noted above, the condition must be known before the processor's state is changed, and the earlier the better.
-
Conditional instructions are difficult for multiple conditions.
-
These instructions work well for avoiding single branches.
-
However, the task is more difficult for two or more branch options since it requires additional instructions to compute the logical combination of the conditions.
-
Conditional instructions may impose a speed penalty.
-
This can be in one of two forms:
-
The cycle time for the entire CPU can be increased, or
-
A conditional instruction may take more clock cycles to execute than a non-conditional instruction.
-
Conditional instructions are effective in eliminating control dependences for small if-then blocks, as we have seen.
-
However, a more significant performance gain can be attained by moving larger blocks of code across (before) branches (see trace scheduling in Section 4.5).
-
This can create problems in two areas:
-
Registers that should not be modified (because of the branch) are modified anyway.
-
Similar to conditional instructions, exceptions that should not occur are possible.
-
Note that resumable exceptions (page faults) are not a problem if they occur in speculative code.
-
They may cause performance to suffer a bit, but correct programs are not terminated.
-
Non-resumable (terminating) exceptions are a problem and must be handled.
-
Three schemes for supporting more ambitious speculation without introducing erroneous exception behavior have been investigated:
-
Ignore exceptions
-
The simplest method for speculation is for the CPU and OS to ignore non-resumable exceptions for speculative instructions.
-
Rather than terminate the program, they return an undefined value for the instruction causing the exception.
-
If the exception-generating instruction was not speculative, the program is in error but is allowed to continue!
-
But it will probably generate incorrect results.
-
If the exception-generating instruction was speculative, the speculative result will not be used and the program will run properly.
-
Either way, a correct program is not terminated improperly.
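A minimal sketch of the "ignore exceptions" idea, assuming a toy dictionary memory and a made-up `UNDEFINED` placeholder value: the load is hoisted above the test that guards it, and a bad address yields a garbage value instead of terminating the program. The function names and the guarded-read scenario are illustrative assumptions, not the example from the original slides.

```python
UNDEFINED = 0  # stand-in for "some undefined value" (an assumption of this sketch)


def speculative_load(memory, address):
    """Load that never terminates the program: a bad address yields UNDEFINED."""
    return memory.get(address, UNDEFINED)


def guarded_read(memory, ptr):
    """Source form: memory[ptr] if ptr != 0 else -1, with the load hoisted."""
    tmp = speculative_load(memory, ptr)   # executed before the test on ptr
    if ptr != 0:
        return tmp    # speculation was useful; tmp holds the real value
    return -1         # tmp is garbage, but it is never used
```

For a correct program the garbage value is discarded on the path where the load should not have run, so the visible result is unchanged.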
-
Ignore exceptions: An example:
-
Three schemes (continued):
-
Poison bits
-
Each register has a "poison bit" attached to it.
-
If a speculative instruction causes an exception, the exception is handled by setting the poison bit of its destination register.
-
If another speculative instruction uses a poisoned register as a source operand, its destination register poison bit is also set.
-
If a non-speculative instruction uses a poisoned register, an exception is generated.
-
It may, however, write to a poisoned register.
-
If this occurs, the poison bit is cleared.
-
This method generates exceptions for incorrect programs (at about the right place).
-
The complication is that the OS must be able to save, restore, and reset the poison bits, which requires special instructions.
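The poison-bit rules listed above can be sketched as a small register-file model. The class and method names are hypothetical, Python exceptions stand in for hardware exceptions, and clearing the poison bit on a successful speculative write is an assumption of this sketch (the rules above only state it for non-speculative writes).

```python
class ProgramFault(Exception):
    """Raised when a non-speculative instruction reads a poisoned register."""


class PoisonRegisterFile:
    def __init__(self):
        self.value = {}       # register name -> value
        self.poison = set()   # names of registers whose poison bit is set

    def write(self, dest, compute, sources, speculative):
        """Execute dest <- compute(*sources) under the poison-bit rules."""
        if any(s in self.poison for s in sources):
            if not speculative:
                raise ProgramFault(f"poisoned source used to compute {dest}")
            self.poison.add(dest)          # poison propagates to the result
            return
        try:
            result = compute(*(self.value[s] for s in sources))
        except Exception:
            if not speculative:
                raise                      # normal instructions fault at once
            self.poison.add(dest)          # speculative: defer the exception
            return
        self.value[dest] = result
        self.poison.discard(dest)          # a successful write clears the bit
```

A speculative divide by zero, for instance, only poisons its destination; the fault surfaces later if a non-speculative instruction actually consumes that register.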
-
Three schemes (continued):
-
Speculative instructions with renaming (buffering results).
-
Note that we had to introduce register copies in the previous schemes.
-
This approach (called boosting) provides renaming and buffering in the hardware (similar to Tomasulo's algorithm).
-
A boosted instruction is executed speculatively based on a branch.
-
Its results are forwarded to and used by other boosted instructions.
-
When the branch is reached, if the prediction is correct, the results are committed to the register file; otherwise they are discarded.
-
Therefore, instructions that are control dependent on a branch can be executed before the branch.
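One way to picture boosting is as a shadow buffer in front of the register file. The sketch below is a simplified model under that assumption: a single outstanding branch, with class and method names invented for illustration. It is not a description of any specific hardware design.

```python
class BoostBuffer:
    def __init__(self, regs):
        self.regs = regs      # architectural register file (a dict)
        self.shadow = {}      # buffered results of boosted instructions

    def read(self, name):
        # Boosted instructions see earlier boosted results first (forwarding).
        return self.shadow.get(name, self.regs[name])

    def boosted_write(self, name, value):
        # The result is buffered; the register file is not modified yet.
        self.shadow[name] = value

    def resolve_branch(self, prediction_correct):
        if prediction_correct:
            self.regs.update(self.shadow)   # commit to the register file
        self.shadow.clear()                 # in either case the buffer empties
```

Until `resolve_branch` is called, the architectural registers keep their old values, so a mispredicted branch leaves no trace of the boosted work.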
-
Speculative instructions with renaming:
-
An example:
-
We now examine the combination of speculative execution and dynamic scheduling based on Tomasulo's algorithm.
-
We focus on floating-point operations but a similar structure can handle integer operations.
-
In order to support speculation, a change is necessary to Tomasulo's approach:
-
We must separate the process of completing execution and the bypassing of results among instructions from instruction commit (register file or memory update).
-
This allows other (speculative) instructions to execute, but no results are committed until we know the instruction is no longer speculative.
-
We will allow instructions to execute out of order but force them to commit in order, which helps with handling exceptions properly.
-
A set of hardware buffers (reorder buffers) will be used to hold the results of instructions that have finished execution but have not committed.
-
The reorder buffer provides additional virtual registers and is a source of operands for instructions.
-
An additional step is added to Tomasulo's algorithm, as follows:
-
Issue
-
Get a floating-point instruction, and issue it if there is a reservation station open and an empty slot in the reorder buffer.
-
Send the number of the reorder buffer assigned for the result to the reservation station so it can be used to tag the result.
-
Execute
-
Monitor the CDB while waiting for source registers to be ready.
-
When both operands are available, perform the operation.
-
Write result
-
Write the result on the CDB with the reorder buffer tag.
-
The result is stored into the reorder buffer as well as into any reservation stations waiting for the result.
-
Like the CDB, the reorder buffer can also supply source operands to instructions.
-
Commit
-
When the instruction reaches the head of the reorder buffer and its result is present in the buffer, update the register with the result (or write memory).
-
When an incorrectly predicted branch reaches the head of the reorder buffer, flush the buffer and restart execution at the correct successor of the branch.
-
If the branch was correctly predicted, do nothing.
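The issue, write-result and commit steps above can be sketched with a small reorder buffer model. It omits reservation stations and the CDB entirely, and the class, method and field names are assumptions chosen for illustration.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()   # oldest instruction at the left (the head)
        self.next_tag = 0

    def issue(self, dest=None, is_branch=False):
        """Allocate a slot; returns its tag, or None if the buffer is full."""
        if len(self.entries) >= self.size:
            return None          # no empty slot: the instruction cannot issue
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "value": None,
                             "done": False, "is_branch": is_branch,
                             "mispredicted": False})
        return tag

    def write_result(self, tag, value=None, mispredicted=False):
        """Record a finished result (possibly out of order) under its tag."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry.update(value=value, done=True, mispredicted=mispredicted)

    def commit(self, regs):
        """Commit at most one instruction from the head, in program order."""
        if not self.entries or not self.entries[0]["done"]:
            return None          # head has not finished: nothing may commit
        head = self.entries.popleft()
        if head["is_branch"]:
            if head["mispredicted"]:
                self.entries.clear()   # discard all speculative work
                return "flush"
            return "branch"            # correct prediction: nothing to do
        regs[head["dest"]] = head["value"]
        return head["dest"]
```

Results may arrive in any order through `write_result`, but registers change only in `commit`, which is what keeps the architectural state in program order.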
-
This scheme has several advantages over dynamic scheduling alone.
-
First, instructions can "finish" out of order as long as they are not committed.
-
This means that the CPU can keep precise interrupts even while executing out of order since changes are committed in order.
-
Second, this method allows the CPU to speculatively execute instructions past a branch (but before the branch is executed), subsequently cancelling them if the branch is mispredicted.
-
Exception handling
-
Exceptions in this model are handled just before the instruction is ready to commit.
-
At that time, all previous instructions have committed and all later instructions have not committed.
-
Thus, the CPU can take a precise exception even if execution occurs out of order.
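The commit-time handling described above can be illustrated with a short function that walks completed instructions in program order and stops at the first one carrying a recorded exception. The list-of-dicts representation and the field names are assumptions of this sketch.

```python
def commit_in_order(entries, regs):
    """Commit entries in program order until one has a recorded exception.

    entries: list of dicts with keys 'dest', 'value' and 'exception'
             (None when the instruction completed normally).
    Returns (index, exception) for the faulting instruction, or (None, None).
    """
    for index, entry in enumerate(entries):
        if entry["exception"] is not None:
            # Every earlier instruction has committed; no later one has.
            return index, entry["exception"]
        regs[entry["dest"]] = entry["value"]
    return None, None
```

Because the exception is raised only when its instruction reaches the commit point, later instructions that already finished executing leave no visible effect.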
-
Speculation in multiple-issue CPUs:
-
The techniques that work in single-issue CPUs work in multiple-issue CPUs as well.
-
In fact, they may be more useful in such processors because of the longer delays and the greater need for speculation to fill empty slots.
-
Lower CPI is not always faster
-
If the lower CPI comes at the expense of a longer clock cycle, it may slow the processor down.
-
This trade-off arises almost invariably, since lowering CPI in hardware means implementing more sophisticated techniques, which tend to increase clock cycle time.
-
The inclination to focus on CPI rather than on clock cycle time arises because:
-
Simulation tools to evaluate the impact of enhancements that affect CPI are more readily available than tools to evaluate the impact on clock cycle time.
-
This is true largely because an accurate analysis of the impact on clock rate is not possible until the design is well underway.
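To make the CPI versus clock cycle trade-off concrete, the following sketch applies the usual relation CPU time = instruction count × CPI × clock cycle time. The instruction count, CPI values and cycle times are made-up numbers chosen only to illustrate the point.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical baseline design: CPI of 1.0 with a 2 ns clock cycle.
baseline = cpu_time_ns(1_000_000, cpi=1.0, cycle_time_ns=2.0)

# Hypothetical enhanced design: CPI drops to 0.75, but the added hardware
# stretches the clock cycle to 3 ns.
enhanced = cpu_time_ns(1_000_000, cpi=0.75, cycle_time_ns=3.0)
```

With these assumed numbers the enhanced design takes 2.25 ms against 2.0 ms for the baseline, so the 25% lower CPI is more than offset by the 50% longer cycle.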
-
Improve all parts of a multiple-issue CPU, not just one
-
As with uniprocessors, improving one aspect of a CPU does not help unless it was the bottleneck from the beginning.
-
For example, improving FP latency for a multiple-issue CPU does not help much unless something is done about branching.
-
Speculative execution is great but is of limited benefit unless there are additional registers to use (either implicitly or under compiler control).