## Quantitative Computer Design

• Principles that are useful in design and analysis of computers:
• Make the common case fast!
• If a design trade-off is necessary, favor the frequent case (which is often simpler) over the infrequent case.
• For example, given that overflow in addition is infrequent, favor optimizing the case when no overflow occurs.

• Objective:
• Determine the frequent case.
• Determine how much improvement in performance is possible by making it faster.

• Amdahl's law can be used to quantify the latter given that we have information concerning the former.

## Quantitative Computer Design

• Amdahl's Law:
• The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

• Amdahl's law defines the speedup obtained by using a particular feature:
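• In standard form:

$$
\text{Speedup} = \frac{\text{Performance with enhancement}}{\text{Performance without enhancement}} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}}
$$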

• Two factors:
• Fraction_enhanced: the fraction of the compute time on the original machine that can be converted to take advantage of the enhancement.
• Always <= 1.
• Speedup_enhanced: the improvement gained by the enhanced execution mode:
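• In standard form, it is the ratio of the times for the portion of the work that can use the enhancement, and is always > 1:

$$
\text{Speedup}_{\text{enhanced}} = \frac{\text{Time for enhanced portion in original mode}}{\text{Time for enhanced portion in enhanced mode}}
$$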

## Quantitative Computer Design

• Amdahl's Law (cont):
• Execution time using original machine with enhancement:
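• In standard form:

$$
\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]
$$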

• Speedup overall using Amdahl's Law:
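• Dividing the old execution time by the new one gives:

$$
\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
$$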

## Quantitative Computer Design

• Amdahl's Law (cont): An example:
• Consider an operation that takes 20 ns with the enhancement and 100 ns without it. Assume the enhancement can be used only 30% of the time.

• What is the overall speedup?
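• A worked solution, assuming Speedup_enhanced = 100 ns / 20 ns = 5 and Fraction_enhanced = 0.3:

$$
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.3) + \dfrac{0.3}{5}} = \frac{1}{0.7 + 0.06} = \frac{1}{0.76} \approx 1.32
$$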

## Quantitative Computer Design

• Amdahl's Law expresses the law of diminishing returns. For example:

• Assume the first improvement costs us \$1,000.

• Assume we are thinking about spending \$100,000 to speed up the 30% by a factor of 500.

• Is this a worthy investment? Will we get anything close to a proportional increase in overall performance?

• NO! Even with an infinite speedup of the 30%, the best we can do is 1/0.7 ≈ 1.43!
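• The arithmetic, with Fraction_enhanced = 0.3 and Speedup_enhanced = 500:

$$
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.3) + \dfrac{0.3}{500}} = \frac{1}{0.7006} \approx 1.43
$$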

## Quantitative Computer Design

• CPU Performance Equation:
• It is often difficult to directly measure the improvement in execution time gained from a new enhancement.

• A second method that decomposes the CPU execution time into three components makes this task simpler.

• CPU Performance Equation:
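• In standard form, execution time is the product of the number of clock cycles a program needs and the duration of each cycle:

$$
\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}
$$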

• where, for example, Clock cycle time = 2 ns for a 500 MHz clock rate.

## Quantitative Computer Design

• CPU Performance Equation:

• An alternative to "number of clock cycles" is "number of instructions executed", or Instruction Count (IC).

• Given both the "number of clock cycles" and the IC of a program, the average Clocks Per Instruction (CPI) is given by:
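• That is:

$$
\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}
$$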

• Therefore, CPU performance is dependent on three characteristics:
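• In the standard three-factor form:

$$
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
$$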

• Note that CPU time is equally dependent on these, i.e. a 10% improvement in any one leads to a 10% improvement in CPU time.

## Quantitative Computer Design

• CPU Performance Equation:
• One difficulty: it is hard to change any one of these factors in isolation from the others:
• Clock cycle time: Hardware and Organization.
• CPI: Organization and Instruction set architecture.
• Instruction count: Instruction set architecture and Compiler technology.

• A variation of this equation breaks the clock cycles down by individual instruction, as sketched below.
• Here, IC_i represents the number of times instruction i is executed in a program and CPI_i represents the average number of clock cycles for instruction i.
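• In standard form:

$$
\text{CPU time} = \left(\sum_{i} \text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}
$$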

• Why isn't CPI_i a constant? (Hint: cache behavior.)

• Key advantage: it is often possible to measure the constituent parts of the CPU performance equation, unlike the components of Amdahl's equation.

## Fallacies and Pitfalls

• MIPS (millions of instructions per second) is NOT a valid alternative to execution time as a performance metric.
• The implication: the bigger the MIPS, the faster the machine.
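• The standard relationship below shows why this is misleading:

$$
\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}
$$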

• 3 problems with MIPS:
• MIPS is dependent on the instruction set. This makes it difficult to compare across platforms.
• MIPS varies between programs on the same computer.
• MIPS can vary inversely with performance!
• Classic example: the MIPS rating of a machine with floating-point hardware may be LOWER than that of a machine that emulates floating point with integer instructions, because each integer instruction takes fewer clock cycles to execute. But far more of them must be executed, so the emulating machine is actually slower despite its higher MIPS rating.

• What is important is how much work gets done.

• MIPS is typically used to measure PEAK performance, not real performance.

## Fallacies and Pitfalls

• Synthetic benchmarks (e.g., Whetstone and Dhrystone) are not necessarily good measures of real performance, for several reasons:
• These benchmarks contain code sequences that are not typically found in real code. Therefore,
• Compilers can perform optimizations on this code artificially inflating performance.
• They don't reflect the behavior of real programs.
• Synthetic benchmarks often fit into cache; real code doesn't always do this.

• Peak performance != observed performance.
• Peak performance is performance that the machine is "guaranteed not to exceed".
• A machine is rarely able to run at peak performance for any extended period of time on real programs.
• Just because a processor can run at 300 MFLOPS doesn't mean it always does (pipeline hazards, memory accesses, and the range of CPIs can slow down the CPU).