## Quantitative Computer Design

• Principles that are useful in design and analysis of computers:
• Make the common case fast!
• If a design trade-off is necessary, favor the frequent case (which is often simpler) over the infrequent case.
• For example, given that overflow in addition is infrequent, favor optimizing the case when no overflow occurs.

• Objective:
• Determine the frequent case.
• Determine how much improvement in performance is possible by making it faster.

• Amdahl's law can be used to quantify the latter given that we have information concerning the former.

## Quantitative Computer Design

• Amdahl's Law:
• The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

• Amdahl's law defines the speedup obtained by using a particular feature:
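• In standard form:

$$
\text{Speedup} = \frac{\text{Performance with enhancement}}{\text{Performance without enhancement}} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}}
$$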

• Two factors:
• Fraction_enhanced: the fraction of the compute time on the original machine that can be converted to take advantage of the enhancement.
• Always <= 1.
• Speedup_enhanced: the improvement gained by the enhanced execution mode:
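• In standard form, it is the ratio of the times for the portion of the work that can use the enhancement, and is always > 1:

$$
\text{Speedup}_{\text{enhanced}} = \frac{\text{Time for enhanced portion in original mode}}{\text{Time for enhanced portion in enhanced mode}}
$$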

## Quantitative Computer Design

• Amdahl's Law (cont):
• Execution time using original machine with enhancement:
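• In standard form:

$$
\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]
$$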

• Speedup overall using Amdahl's Law:
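• Dividing the old execution time by the new one gives:

$$
\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
$$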

## Quantitative Computer Design

• Amdahl's Law (cont): An example:
• Consider an operation that takes 20 ns with the enhancement and 100 ns without it. Assume the enhancement can be used only 30% of the time.

• What is the overall speedup?
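• A worked solution, assuming Speedup_enhanced = 100 ns / 20 ns = 5 and Fraction_enhanced = 0.3:

$$
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.3) + \dfrac{0.3}{5}} = \frac{1}{0.7 + 0.06} = \frac{1}{0.76} \approx 1.32
$$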

## Quantitative Computer Design

• Amdahl's Law expresses the law of diminishing returns. For example:

• Assume the first improvement costs us \$1,000.

• Assume we are thinking about spending \$100,000 to speed up the 30% by a factor of 500.

• Is this a worthy investment? Will we get anything close to a proportional increase in overall performance?

• NO! Even with an infinite speedup of the 30%, the best we can do is 1/0.7 ≈ 1.43!
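• The arithmetic, with Fraction_enhanced = 0.3 and Speedup_enhanced = 500:

$$
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.3) + \dfrac{0.3}{500}} = \frac{1}{0.7006} \approx 1.43
$$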

## Quantitative Computer Design

• CPU Performance Equation:
• It is often difficult to directly measure the improvement in execution time gained from a new enhancement.

• A second method that decomposes the CPU execution time into three components makes this task simpler.

• CPU Performance Equation:
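• In standard form, execution time is the product of the number of clock cycles a program needs and the duration of each cycle:

$$
\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}
$$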

• where, for example, Clock cycle time = 2 ns for a 500 MHz clock rate.

## Quantitative Computer Design

• CPU Performance Equation:

• An alternative to "number of clock cycles" is "number of instructions executed", or Instruction Count (IC).

• Given both the "number of clock cycles" and the IC of a program, the average Clocks Per Instruction (CPI) is given by:
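• That is:

$$
\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}
$$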

• Therefore, CPU performance is dependent on three characteristics:
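• In the standard three-factor form:

$$
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
$$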

• Note that CPU time is equally dependent on these, i.e. a 10% improvement in any one leads to a 10% improvement in CPU time.

## Quantitative Computer Design

• CPU Performance Equation:
• One difficulty: it is hard to change any one of these factors in isolation from the others:
• Clock cycle time: Hardware and Organization.
• CPI: Organization and Instruction set architecture.
• Instruction count: Instruction set architecture and Compiler technology.

• A variation of this equation breaks the clock cycles down by individual instruction, as sketched below.
• Here, IC_i represents the number of times instruction i is executed in a program and CPI_i represents the average number of clock cycles for instruction i.
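• In standard form:

$$
\text{CPU time} = \left(\sum_{i} \text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}
$$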

• Why isn't CPI_i a constant? (Hint: cache behavior.)

• Key advantage: it is often possible to measure the constituent parts of the CPU performance equation, unlike the components of Amdahl's equation.

## Fallacies and Pitfalls

• MIPS (millions of instructions per second) is NOT a valid alternative to execution time as a performance metric.
• The implication: the bigger the MIPS, the faster the machine.
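• The standard relationship below shows why this is misleading:

$$
\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}
$$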

• 3 problems with MIPS:
• MIPS is dependent on the instruction set. This makes it difficult to compare across platforms.
• MIPS varies between programs on the same computer.
• MIPS can vary inversely with performance!
• Classic example: the MIPS rating of a machine with floating-point hardware may be LOWER than that of a machine that emulates floating point with integer instructions, because each integer instruction takes fewer clock cycles to execute. But far more of them must be executed, so the emulating machine is actually slower despite its higher MIPS rating.

• What is important is how much work gets done.

• MIPS is typically used to measure PEAK performance, not real performance.

## Fallacies and Pitfalls

• Synthetic benchmarks (e.g., Whetstone and Dhrystone) are not necessarily good measures of real performance, for several reasons:
• These benchmarks contain code sequences that are not typically found in real code. Therefore,
• Compilers can perform optimizations on this code artificially inflating performance.
• They don't reflect the behavior of real programs.
• Synthetic benchmarks often fit into cache; real code doesn't always do this.

• Peak performance != observed performance.
• Peak performance is performance that the machine is "guaranteed not to exceed".
• A machine is rarely able to run at peak performance for any extended period of time on real programs.
• Just because a processor can run at 300 MFLOPS doesn't mean it always does (pipeline hazards, memory accesses, and the range of CPIs can slow down the CPU).