- How do we measure the relative speed of one computer to another?
- The user's metric is execution time or response time.
- A system administrator's metric is throughput.
- We will measure performance using ratios, i.e. "Computer X is n times faster than Y" is computed as:

  n = Execution time of Y / Execution time of X

- Since performance is inversely proportional to execution time, this is equivalent to:

  n = Performance of X / Performance of Y
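As a quick illustration (the numbers and variable names below are made up), the speedup and the performance ratio can be computed directly from measured execution times:

```python
# Hypothetical measured execution times (seconds) for the same program and input.
exec_time_x = 10.0   # computer X
exec_time_y = 15.0   # computer Y

# "X is n times faster than Y": the ratio of Y's time to X's time.
n = exec_time_y / exec_time_x

# Equivalently, the ratio of performances, with performance = 1 / execution time.
perf_x = 1.0 / exec_time_x
perf_y = 1.0 / exec_time_y

print(f"X is {n:.2f} times faster than Y")           # 1.50
print(f"performance ratio: {perf_x / perf_y:.2f}")   # 1.50
```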
- Key: Execution time of real programs is the only consistent and reliable measure of performance.
- Execution time can be defined in different ways (a measurement sketch follows this list):
  - Wall-clock time: latency to complete a task (includes disk accesses, memory accesses, I/O and OS overhead).
  - CPU time: time the CPU spends computing (excludes time waiting for I/O and time spent executing other programs).
  - User versus system CPU time: time spent in user code versus time spent in system (OS) code.
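A minimal sketch of how these clocks can be read on a typical system, here in Python with a made-up workload: time.perf_counter() gives wall-clock time, time.process_time() gives the process's CPU time, and os.times() reports the user/system split.

```python
import os
import time

def busy_work():
    # Placeholder CPU-bound workload; any real program would stand in here.
    return sum(i * i for i in range(10**6))

wall_start = time.perf_counter()    # wall-clock (elapsed) time
cpu_start = time.process_time()     # user + system CPU time of this process

busy_work()

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

t = os.times()                      # cumulative user/system CPU time since process start
print(f"wall-clock: {wall_elapsed:.3f} s")
print(f"CPU time:   {cpu_elapsed:.3f} s")
print(f"user CPU:   {t.user:.3f} s, system CPU: {t.system:.3f} s")
```

On an unloaded system the wall-clock and CPU numbers are close for a CPU-bound workload; a program doing heavy I/O would show a much larger gap.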
- We define two terms:
  - System performance: elapsed time on an unloaded system.
  - CPU performance: user CPU time on an unloaded system.
- The focus of our analysis is on CPU performance.
- Ideal performance evaluation: a random sample of users running their programs and OS commands.
- Alternative: benchmarks. Several types are available, listed in decreasing order of accuracy:
  - Real programs: e.g. C compilers, TeX, and Spice.
  - Kernels:
    - Small key pieces of real programs, e.g. the Livermore Loops and Linpack.
    - Used to isolate the performance of individual features and help explain the behavior of real programs.
  - Toy benchmarks:
    - Small programs (10-100 lines) that produce a known result, e.g. Quicksort (a sketch of such a benchmark follows this list).
  - Synthetic benchmarks:
    - Programs that try to "exercise" the system in the same way as the average real program, e.g. Whetstone and Dhrystone.
    - Similar to kernels, but they are NOT real programs!
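A toy benchmark in the spirit described above might look like the following sketch (the input size and seed are arbitrary choices); the known result lets the benchmark check its own answer.

```python
import random
import time

def quicksort(a):
    # Simple, non-in-place quicksort: clarity over speed, as in a toy benchmark.
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    return (quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort([x for x in a if x > pivot]))

random.seed(42)                                   # fixed seed for reproducibility
data = [random.randint(0, 10**6) for _ in range(50_000)]

start = time.perf_counter()
result = quicksort(data)
elapsed = time.perf_counter() - start

assert result == sorted(data)                     # the known result checks the run
print(f"quicksort of {len(data)} elements: {elapsed:.3f} s")
```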
- Benchmark suites:
  - A set of one or more of the above that measures the system's performance across a range of loads.
  - Advantage: the weakness of any one benchmark is lessened by the presence of the other benchmarks.
  - Example: SPEC92, which includes kernels, program fragments, and applications (e.g. Spice).
- Running benchmarks:
  - Key factor: reproducibility by other experimenters.
  - Details, details, and more details! List all assumptions and conditions of your experiments, e.g. program input, version of the program, version of the compiler, optimization level, OS version, main memory size, disk types, etc.
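One lightweight way to capture such conditions is to record them alongside the results; the field names below are only an illustration of the kind of information to write down.

```python
import json
import platform
import sys

# A (non-exhaustive) record of the conditions under which a benchmark was run.
conditions = {
    "benchmark": "quicksort-toy",             # program name and version
    "input": "50,000 random integers, seed 42",
    "runtime": sys.version,                   # stands in for compiler version and flags
    "os": platform.platform(),
    "machine": platform.machine(),
    "processor": platform.processor(),
}

print(json.dumps(conditions, indent=2))
```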
- Problem: you are trying to reduce a collection of numbers to a single number that indicates performance, which is non-trivial in some cases.
- Solution: use execution times (or throughputs), e.g. "machine B is 5.5 times faster than machine A on programs P1 and P2."
  - These track total execution time, the final measure of performance.
- The average execution time of several programs can be measured using the arithmetic mean:

  Arithmetic mean = (Time_1 + Time_2 + ... + Time_n) / n
- For rates, the corresponding average is the harmonic mean: we invert the rates, average them together, and invert the result, since rate = 1/time.

  Harmonic mean = n / (1/Rate_1 + 1/Rate_2 + ... + 1/Rate_n)
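A small worked example with made-up times shows that the two means are consistent with each other: the harmonic mean of the rates is the reciprocal of the arithmetic mean of the times.

```python
# Hypothetical execution times (seconds) of three programs on one machine.
times = [2.0, 4.0, 8.0]

# Arithmetic mean of the execution times.
arith_mean_time = sum(times) / len(times)

# The same measurements expressed as rates (runs per second).
rates = [1.0 / t for t in times]

# Harmonic mean of the rates: invert, average, invert again.
harm_mean_rate = len(rates) / sum(1.0 / r for r in rates)

print(f"arithmetic mean time: {arith_mean_time:.3f} s")    # 4.667 s
print(f"harmonic mean rate:   {harm_mean_rate:.4f} /s")    # 0.2143 /s = 1 / 4.667 s
```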
- Weighted execution time:
  - Used for situations in which the programs in the workload are NOT run an equal number of times.
  - Simply assign a weight w_i to each program to capture its relative frequency in the workload.
  - Performance is then obtained as the weighted arithmetic mean:

    Weighted arithmetic mean = w_1*Time_1 + w_2*Time_2 + ... + w_n*Time_n   (where the weights sum to 1)

  - And, for rates, as the weighted harmonic mean:

    Weighted harmonic mean = 1 / (w_1/Rate_1 + w_2/Rate_2 + ... + w_n/Rate_n)
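A sketch of both weighted means, using a hypothetical workload in which P1 runs nine times as often as P2; again the two results are reciprocals of each other.

```python
# Hypothetical workload: program P1 runs far more often than P2.
times   = [1.0, 10.0]     # execution times in seconds
weights = [0.9, 0.1]      # relative frequencies; must sum to 1

# Weighted arithmetic mean of the execution times.
wam_time = sum(w * t for w, t in zip(weights, times))

# Weighted harmonic mean of the corresponding rates.
rates = [1.0 / t for t in times]
whm_rate = 1.0 / sum(w / r for w, r in zip(weights, rates))

print(f"weighted arithmetic mean time: {wam_time:.2f} s")    # 1.90 s
print(f"weighted harmonic mean rate:   {whm_rate:.4f} /s")   # 0.5263 /s = 1 / 1.90 s
```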
- A second approach to an unequal mixture of programs in the workload:
  - Normalize execution time to a reference machine (e.g. the VAX-11/780).
  - Take the average of the normalized execution times.
  - The execution time ratio is the execution time normalized to the reference machine.
  - Disadvantage: this metric violates our fundamental principle of performance measurement: it does NOT predict execution time! See the example in the text (a numeric sketch also follows below).
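To see the pitfall concretely, here is a sketch with made-up numbers in place of the example from the text: averaging normalized execution times can rank a machine ahead of one that finishes the same workload far sooner.

```python
# Hypothetical execution times (seconds) on a reference machine and on machines A and B.
ref = {"P1": 10.0, "P2": 100.0}
a   = {"P1":  1.0, "P2": 200.0}
b   = {"P1": 20.0, "P2":  20.0}

def mean_normalized_time(machine):
    # Average of the execution times normalized to the reference machine.
    ratios = [machine[p] / ref[p] for p in ref]
    return sum(ratios) / len(ratios)

print(f"A: total {sum(a.values()):.0f} s, mean normalized time {mean_normalized_time(a):.2f}")
print(f"B: total {sum(b.values()):.0f} s, mean normalized time {mean_normalized_time(b):.2f}")
# A: total 201 s, mean normalized time 1.05; B: total 40 s, mean normalized time 1.10.
# The normalized metric prefers A even though B finishes the workload about 5x sooner.
```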
- Ideal solution: measure a real workload and weight the programs according to their frequency of execution.