## Quantitative Computer Design

• Principles that are useful in design and analysis of computers:
• Make the common case fast!
• If a design trade-off is necessary, favor the frequent case (which is often simpler) over the infrequent case.
• For example, given that overflow in addition is infrequent, favor optimizing the case when no overflow occurs.

• Objective:
• Determine the frequent case.
• Determine how much improvement in performance is possible by making it faster.

• Amdahl's law can be used to quantify the latter given that we have information concerning the former.

## Quantitative Computer Design

• Amdahl's Law:
• The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

• Amdahl's law defines the speedup obtained by using a particular feature:
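• In standard form:

$$
\text{Speedup} = \frac{\text{Performance with enhancement}}{\text{Performance without enhancement}} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}}
$$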

• Two factors:
• Fraction_enhanced: the fraction of the compute time on the original machine that can be converted to take advantage of the enhancement.
• Always <= 1.
• Speedup_enhanced: the improvement gained by the enhanced execution mode:
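• In standard form, it is the ratio of the times for the portion of the work that can use the enhancement, and is always > 1:

$$
\text{Speedup}_{\text{enhanced}} = \frac{\text{Time for enhanced portion in original mode}}{\text{Time for enhanced portion in enhanced mode}}
$$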

## Quantitative Computer Design

• Amdahl's Law (cont):
• Execution time using original machine with enhancement:
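• In standard form:

$$
\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]
$$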

• Speedup overall using Amdahl's Law:
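• Dividing the old execution time by the new one gives:

$$
\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
$$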

## Quantitative Computer Design

• Amdahl's Law (cont): An example:
• Consider an operation that takes 20 ns with the enhancement and 100 ns without it. Assume the enhancement can be used only 30% of the time.

• What is the overall speedup?
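• A worked solution, assuming Speedup_enhanced = 100 ns / 20 ns = 5 and Fraction_enhanced = 0.3:

$$
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.3) + \dfrac{0.3}{5}} = \frac{1}{0.7 + 0.06} = \frac{1}{0.76} \approx 1.32
$$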

## Quantitative Computer Design

• Amdahl's Law expresses the law of diminishing returns. For example:

• Assume the first improvement costs us \$1,000.

• Assume we are thinking about spending \$100,000 to speed up the 30% by a factor of 500.

• Is this a worthy investment? Will we get anything close to a proportional increase in overall performance?

• NO! Even with an infinite speedup of the 30%, the best we can do is 1/0.7 ≈ 1.43!
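• The arithmetic, with Fraction_enhanced = 0.3 and Speedup_enhanced = 500:

$$
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.3) + \dfrac{0.3}{500}} = \frac{1}{0.7006} \approx 1.43
$$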

## Quantitative Computer Design

• CPU Performance Equation:
• It is often difficult to directly measure the improvement in execution time gained from a new enhancement.

• A second method that decomposes the CPU execution time into three components makes this task simpler.

• CPU Performance Equation:
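• In standard form, execution time is the product of the number of clock cycles a program needs and the duration of each cycle:

$$
\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}
$$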

• where, for example, Clock cycle time = 2 ns for a 500 MHz clock rate.

## Quantitative Computer Design

• CPU Performance Equation:

• An alternative to "number of clock cycles" is "number of instructions executed", or Instruction Count (IC).

• Given both the "number of clock cycles" and the IC of a program, the average Clocks Per Instruction (CPI) is given by:
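• That is:

$$
\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}
$$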

• Therefore, CPU performance is dependent on three characteristics:
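• In the standard three-factor form:

$$
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
$$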

• Note that CPU time is equally dependent on these, i.e. a 10% improvement in any one leads to a 10% improvement in CPU time.

## Quantitative Computer Design

• CPU Performance Equation:
• One difficulty: it is hard to change any one of these factors in isolation from the others:
• Clock cycle time: Hardware and Organization.
• CPI: Organization and Instruction set architecture.
• Instruction count: Instruction set architecture and Compiler technology.

• A variation of this equation breaks the clock cycles down by individual instruction, as sketched below.
• Here, IC_i represents the number of times instruction i is executed in a program and CPI_i represents the average number of clock cycles for instruction i.
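• In standard form:

$$
\text{CPU time} = \left(\sum_{i} \text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}
$$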

• Why isn't CPI_i a constant? (Hint: cache behavior.)

• Key advantage: it is often possible to measure the constituent parts of the CPU performance equation, unlike the components of Amdahl's equation.

## Fallacies and Pitfalls

• MIPS (millions of instructions per second) is NOT a valid alternative to execution time as a performance metric.
• The implication: the bigger the MIPS, the faster the machine.
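• The standard relationship below shows why this is misleading:

$$
\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}
$$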

• 3 problems with MIPS:
• MIPS is dependent on the instruction set. This makes it difficult to compare across platforms.
• MIPS varies between programs on the same computer.
• MIPS can vary inversely with performance!
• Classic example: the MIPS rating of a machine with floating-point hardware may be LOWER than that of a machine that emulates floating point with integer instructions, because each integer instruction takes fewer clock cycles to execute. But far more of them must be executed, so the emulating machine is actually slower despite its higher MIPS rating.

• What is important is how much work gets done.

• MIPS is typically used to measure PEAK performance, not real performance.

## Fallacies and Pitfalls

• Synthetic benchmarks (e.g., Whetstone and Dhrystone) are not necessarily good measures of real performance, for several reasons:
• These benchmarks contain code sequences that are not typically found in real code. Therefore,
• Compilers can perform optimizations on this code artificially inflating performance.
• They don't reflect the behavior of real programs.
• Synthetic benchmarks often fit into cache; real code doesn't always do this.

• Peak performance != observed performance.
• Peak performance is performance that the machine is "guaranteed not to exceed".
• A machine is rarely able to run at peak performance for any extended period of time on real programs.
• Just because a processor can run at 300 MFLOPS doesn't mean it always does (pipeline hazards, memory accesses, and the range of CPIs can slow down the CPU).