- How do we measure the relative speed of one computer to another?
- The user's metric is execution time or response time.
- A system administrator's metric is throughput.
- We will measure performance using ratios, i.e. "Computer X is n times faster than Y" is computed as:

  n = Execution time of Y / Execution time of X

- Since performance is inversely proportional to execution time, this is equivalent to:

  n = Performance of X / Performance of Y
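As a quick illustration (the numbers and variable names below are made up), the speedup and the performance ratio can be computed directly from measured execution times:

```python
# Hypothetical measured execution times (seconds) for the same program and input.
exec_time_x = 10.0   # computer X
exec_time_y = 15.0   # computer Y

# "X is n times faster than Y": the ratio of Y's time to X's time.
n = exec_time_y / exec_time_x

# Equivalently, the ratio of performances, with performance = 1 / execution time.
perf_x = 1.0 / exec_time_x
perf_y = 1.0 / exec_time_y

print(f"X is {n:.2f} times faster than Y")           # 1.50
print(f"performance ratio: {perf_x / perf_y:.2f}")   # 1.50
```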
- Key: Execution time of real programs is the only consistent and reliable measure of performance.
- Execution time can be defined in different ways (a measurement sketch follows this list):
  - Wall-clock time: latency to complete a task (includes disk accesses, memory accesses, I/O and OS overhead).
  - CPU time: time the CPU spends computing (excludes time waiting for I/O and time spent executing other programs).
  - User versus system CPU time: time spent in user code versus time spent in system (OS) code.
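A minimal sketch of how these clocks can be read on a typical system, here in Python with a made-up workload: time.perf_counter() gives wall-clock time, time.process_time() gives the process's CPU time, and os.times() reports the user/system split.

```python
import os
import time

def busy_work():
    # Placeholder CPU-bound workload; any real program would stand in here.
    return sum(i * i for i in range(10**6))

wall_start = time.perf_counter()    # wall-clock (elapsed) time
cpu_start = time.process_time()     # user + system CPU time of this process

busy_work()

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

t = os.times()                      # cumulative user/system CPU time since process start
print(f"wall-clock: {wall_elapsed:.3f} s")
print(f"CPU time:   {cpu_elapsed:.3f} s")
print(f"user CPU:   {t.user:.3f} s, system CPU: {t.system:.3f} s")
```

On an unloaded system the wall-clock and CPU numbers are close for a CPU-bound workload; a program doing heavy I/O would show a much larger gap.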
- We define two terms:
  - System performance: elapsed time on an unloaded system.
  - CPU performance: user CPU time on an unloaded system.
- The focus of our analysis is on CPU performance.
- Ideal performance evaluation: a random sample of users running their programs and OS commands.
- Alternative: benchmarks. Several types are available, listed in decreasing order of accuracy:
  - Real programs: e.g. C compilers, TeX, and Spice.
  - Kernels:
    - Small key pieces of real programs, e.g. the Livermore Loops and Linpack.
    - Used to isolate the performance of individual features and help explain the behavior of real programs.
  - Toy benchmarks:
    - Small programs (10-100 lines) that produce a known result, e.g. Quicksort (a sketch of such a benchmark follows this list).
  - Synthetic benchmarks:
    - Programs that try to "exercise" the system in the same way as the average real program, e.g. Whetstone and Dhrystone.
    - Similar to kernels, but they are NOT real programs!
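A toy benchmark in the spirit described above might look like the following sketch (the input size and seed are arbitrary choices); the known result lets the benchmark check its own answer.

```python
import random
import time

def quicksort(a):
    # Simple, non-in-place quicksort: clarity over speed, as in a toy benchmark.
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    return (quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort([x for x in a if x > pivot]))

random.seed(42)                                   # fixed seed for reproducibility
data = [random.randint(0, 10**6) for _ in range(50_000)]

start = time.perf_counter()
result = quicksort(data)
elapsed = time.perf_counter() - start

assert result == sorted(data)                     # the known result checks the run
print(f"quicksort of {len(data)} elements: {elapsed:.3f} s")
```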
- Benchmark suites:
  - A set of one or more of the above that measures the system's performance across a range of loads.
  - Advantage: the weakness of any one benchmark is lessened by the presence of the other benchmarks.
  - Example: SPEC92, which includes kernels, program fragments, and applications (e.g. Spice).
- Running benchmarks:
  - Key factor: reproducibility by other experimenters.
  - Details, details, and more details! List all assumptions and conditions of your experiments, e.g. program input, version of the program, version of the compiler, optimization level, OS version, main memory size, disk types, etc.
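One lightweight way to capture such conditions is to record them alongside the results; the field names below are only an illustration of the kind of information to write down.

```python
import json
import platform
import sys

# A (non-exhaustive) record of the conditions under which a benchmark was run.
conditions = {
    "benchmark": "quicksort-toy",             # program name and version
    "input": "50,000 random integers, seed 42",
    "runtime": sys.version,                   # stands in for compiler version and flags
    "os": platform.platform(),
    "machine": platform.machine(),
    "processor": platform.processor(),
}

print(json.dumps(conditions, indent=2))
```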
- Problem: you are trying to reduce a collection of numbers to a single number that indicates performance, which is non-trivial in some cases.
- Solution: use execution times (or throughputs), e.g. "machine B is 5.5 times faster than machine A on programs P1 and P2."
  - These track total execution time, the final measure of performance.
- The average execution time of several programs can be measured using the arithmetic mean:

  Arithmetic mean = (Time_1 + Time_2 + ... + Time_n) / n
- For rates, the corresponding average is the harmonic mean: we invert the rates, average them together, and invert the result, since rate = 1/time.

  Harmonic mean = n / (1/Rate_1 + 1/Rate_2 + ... + 1/Rate_n)
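A small worked example with made-up times shows that the two means are consistent with each other: the harmonic mean of the rates is the reciprocal of the arithmetic mean of the times.

```python
# Hypothetical execution times (seconds) of three programs on one machine.
times = [2.0, 4.0, 8.0]

# Arithmetic mean of the execution times.
arith_mean_time = sum(times) / len(times)

# The same measurements expressed as rates (runs per second).
rates = [1.0 / t for t in times]

# Harmonic mean of the rates: invert, average, invert again.
harm_mean_rate = len(rates) / sum(1.0 / r for r in rates)

print(f"arithmetic mean time: {arith_mean_time:.3f} s")    # 4.667 s
print(f"harmonic mean rate:   {harm_mean_rate:.4f} /s")    # 0.2143 /s = 1 / 4.667 s
```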
- Weighted execution time:
  - Used for situations in which the programs in the workload are NOT run an equal number of times.
  - Simply assign a weight w_i to each program to capture its relative frequency in the workload.
  - Performance is then obtained as the weighted arithmetic mean:

    Weighted arithmetic mean = w_1*Time_1 + w_2*Time_2 + ... + w_n*Time_n   (where the weights sum to 1)

  - And, for rates, as the weighted harmonic mean:

    Weighted harmonic mean = 1 / (w_1/Rate_1 + w_2/Rate_2 + ... + w_n/Rate_n)
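A sketch of both weighted means, using a hypothetical workload in which P1 runs nine times as often as P2; again the two results are reciprocals of each other.

```python
# Hypothetical workload: program P1 runs far more often than P2.
times   = [1.0, 10.0]     # execution times in seconds
weights = [0.9, 0.1]      # relative frequencies; must sum to 1

# Weighted arithmetic mean of the execution times.
wam_time = sum(w * t for w, t in zip(weights, times))

# Weighted harmonic mean of the corresponding rates.
rates = [1.0 / t for t in times]
whm_rate = 1.0 / sum(w / r for w, r in zip(weights, rates))

print(f"weighted arithmetic mean time: {wam_time:.2f} s")    # 1.90 s
print(f"weighted harmonic mean rate:   {whm_rate:.4f} /s")   # 0.5263 /s = 1 / 1.90 s
```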
- A second approach to an unequal mixture of programs in the workload:
  - Normalize execution time to a reference machine (e.g. the VAX-11/780).
  - Take the average of the normalized execution times.
  - The execution time ratio is the execution time normalized to the reference machine.
  - Disadvantage: this metric violates our fundamental principle of performance measurement: it does NOT predict execution time! See the example in the text (a numeric sketch also follows below).
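To see the pitfall concretely, here is a sketch with made-up numbers in place of the example from the text: averaging normalized execution times can rank a machine ahead of one that finishes the same workload far sooner.

```python
# Hypothetical execution times (seconds) on a reference machine and on machines A and B.
ref = {"P1": 10.0, "P2": 100.0}
a   = {"P1":  1.0, "P2": 200.0}
b   = {"P1": 20.0, "P2":  20.0}

def mean_normalized_time(machine):
    # Average of the execution times normalized to the reference machine.
    ratios = [machine[p] / ref[p] for p in ref]
    return sum(ratios) / len(ratios)

print(f"A: total {sum(a.values()):.0f} s, mean normalized time {mean_normalized_time(a):.2f}")
print(f"B: total {sum(b.values()):.0f} s, mean normalized time {mean_normalized_time(b):.2f}")
# A: total 201 s, mean normalized time 1.05; B: total 40 s, mean normalized time 1.10.
# The normalized metric prefers A even though B finishes the workload about 5x sooner.
```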
- Ideal solution: measure a real workload and weight the programs according to their frequency of execution.