-
Now that we've seen how to estimate the performance of an I/O system,
-
and how to actually measure that performance,
-
we are ready to talk about how to build one.
-
The objective is to find a design that is expandable and that meets goals for cost and variety of devices while avoiding bottlenecks to I/O performance.
-
In designing an I/O system, analyze performance, cost, and capacity using various I/O connection schemes and different numbers of I/O devices of each type.
-
Here are the steps to follow in designing an I/O system:
-
List the types of I/O devices and buses, and their costs.
-
List the physical requirements of each device.
-
These include volume, power, connectors, bus slots, etc.
-
This won't be a problem for paper examples, but it certainly will be for real systems.
-
Figure out the CPU resource demands for each I/O device.
-
Clock cycles to initiate, support the operation of, and complete requests.
-
CPU clock cycles stalled while I/O devices access memory.
-
Clock cycles to recover from an I/O activity, such as a cache flush.
-
List memory and I/O bus resource demands for each device.
-
Bandwidth of main memory and the I/O bus can often be a bottleneck, particularly when many devices are connected to a single bus.
-
Compute performance for various configurations of devices, buses, etc.
-
This can really only be done by building the system and measuring it.
-
If this is not feasible (which is usually the case), the next best thing is a detailed simulation.
-
Queuing models can be used to get a rough estimate of performance.
-
Remember, performance can be measured as:
-
Megabytes per second.
-
I/Os per second.
-
Which measure matters depends on the needs of the applications.
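-
The resource-demand and performance steps above can be sketched as a simple bottleneck calculation: the system can go no faster than its most limited resource. This is only a rough sketch. The function names, parameters, and the sample numbers are illustrative assumptions, not figures from the text, and the M/M/1 formula is just one simple queuing model that could be used for the rough estimate mentioned above.

```python
def max_io_rate(cpu_instr_per_sec, instr_per_io,
                bus_bytes_per_sec, bytes_per_io,
                disk_count, disk_service_time_s):
    """Return (bottleneck, I/Os per second) for a simple I/O system.

    Each resource is treated independently; the slowest one sets the limit.
    """
    limits = {
        "cpu": cpu_instr_per_sec / instr_per_io,    # CPU demand per I/O
        "bus": bus_bytes_per_sec / bytes_per_io,    # bus demand per I/O
        "disks": disk_count / disk_service_time_s,  # all disks busy in parallel
    }
    bottleneck = min(limits, key=limits.get)
    return bottleneck, limits[bottleneck]


def mm1_response_time(arrival_rate_per_s, service_time_s):
    """Mean response time of an M/M/1 queue (an assumed, simplified model)."""
    utilization = arrival_rate_per_s * service_time_s
    if utilization >= 1.0:
        raise ValueError("device is saturated; the queue grows without bound")
    return service_time_s / (1.0 - utilization)


# Hypothetical configuration: 500 MIPS CPU, 100,000 instructions per I/O,
# 100 MB/sec bus, 16 KB per I/O, 20 disks at 10 ms per request.
name, rate = max_io_rate(500e6, 100_000, 100e6, 16 * 1024, 20, 0.010)
print(name, round(rate))
```

With these made-up numbers the disks are the limit, so adding CPU or bus capacity would not help; adding disks would, until another resource becomes the bottleneck.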
-
The goals for the design should also be clear:
-
Is it a design to maximize performance at any cost?
-
Is it the cheapest system that will satisfy minimum requirements?
-
Is it the best price/performance?
-
Look over the examples in the text!
-
Many modern computer systems now use removable media to store their data.
-
Advantages:
-
Inexpensive
-
Removable media cost a lot less because you only pay once for the machinery to read, write, and transport the medium.
-
Since removable media use similar technology to non-removable media, the media costs are similar but the mechanism cost is much lower.
-
Low power
-
Disks use power to rotate.
-
A major advantage of removable media is that they consume no power while they sit outside a drive.
-
Unerasability
-
Some removable media (WORM) cannot be erased, even if the system requests it.
-
Components:
-
Tape robots
-
These systems typically hold thousands of tape cartridges and can load any cartridge in under 20 seconds.
-
IBM 3490 cartridges (the most common today) hold 9 GB of data per cartridge and transfer at 9 MB/sec.
-
Optical disk jukeboxes
-
These are usually used for two purposes:
-
Small, randomly accessed data.
-
Data that should not ever be overwritten (even accidentally).
-
The cost per GB is higher than for tape, but seek time is much better.
-
How it works:
-
Moving data from disk to tertiary
-
Data is moved from disk to tertiary storage when the disk gets full.
-
This is called file migration or simply migration.
-
Migrating data is done to free up disk space.
-
Files are picked for migration according to several factors:
-
How big they are.
-
When they were last used.
-
Cost to retrieve the file.
-
Other factors (possibly file type and/or user).
-
The device to which the file is migrated can also depend on these factors.
-
Small files might go to optical disk while large files are sent to a tape robot.
-
Migration can also occur from one tertiary storage device to another.
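-
The selection factors listed above can be sketched as a scoring function. This is one possible policy, not the method of any particular system: the score (size times idle time, discounted by retrieval cost), the `FileInfo` fields, and the small-file threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    idle_seconds: float      # time since the file was last used
    retrieval_cost_s: float  # estimated time to bring it back to disk


def migration_score(f):
    """Higher score = better candidate: big, long-idle, cheap to retrieve."""
    return f.size_bytes * f.idle_seconds / (1.0 + f.retrieval_cost_s)


def pick_for_migration(files, bytes_to_free):
    """Pick the best candidates until enough disk space would be freed."""
    chosen, freed = [], 0
    for f in sorted(files, key=migration_score, reverse=True):
        if freed >= bytes_to_free:
            break
        chosen.append(f)
        freed += f.size_bytes
    return chosen


def target_device(f, small_file_limit=1_000_000):
    """Small files go to optical disk, large files to the tape robot.

    The 1 MB threshold is a hypothetical value.
    """
    return "optical" if f.size_bytes < small_file_limit else "tape"
```

A real policy might also weigh file type or owner, as the list above notes; those factors would simply become extra terms in the score.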
-
Moving data from tertiary to disk
-
Data is moved from tertiary to disk when it is needed.
-
It may also be prefetched if the system believes that the file might be used soon.
-
File migration issues
-
Tertiary storage is very much an ad hoc art these days.
-
System designers build their systems based on what others have done because there is relatively little concrete research on what works and what does not.
-
When should a file be migrated?
-
How should the system choose the files to move from disk to tape?
-
This is very important because a user notices even a single miss.
-
It may take close to a minute to retrieve a file from tape!
-
Migrating just one file that should not have been moved can adversely impact a user's session.
-
When should a file be deleted from disk?
-
Just because a file has been migrated to tape does not mean it should be deleted from disk.
-
When should its space be reclaimed?
-
When should a file be moved from tape to disk?
-
For demand fetches, this is obvious.
-
However, there is also prefetching and clustering that might help improve performance.
-
What kinds of devices and layouts work best for various kinds of files?
-
Again, clustering is important, as is transfer time versus time to first byte.
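-
The trade-off between time to first byte and transfer time can be illustrated with the tape-robot figures quoted earlier (a load time of up to 20 seconds and a 9 MB/sec transfer rate). This is a simplified sketch: it assumes the worst-case load time and ignores the time to wind the tape to the file, which is one reason real retrievals can approach a minute.

```python
TAPE_LOAD_S = 20.0     # worst-case cartridge load time quoted for tape robots
TAPE_MB_PER_S = 9.0    # transfer rate quoted for the cartridges


def tape_retrieval_time(size_mb, load_s=TAPE_LOAD_S, mb_per_s=TAPE_MB_PER_S):
    """Return (time to first byte, transfer time) in seconds.

    Tape positioning time is deliberately left out of this sketch.
    """
    return load_s, size_mb / mb_per_s


for size_mb in (1, 90, 9000):
    first_byte, transfer = tape_retrieval_time(size_mb)
    share = first_byte / (first_byte + transfer)
    print(f"{size_mb} MB: {first_byte + transfer:.1f} s total, "
          f"{share:.0%} spent waiting for the first byte")
```

For small files nearly all of the delay is time to first byte, which is why clustering related files on one cartridge, or sending small files to a device with faster access, can matter more than raw transfer rate.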
-
Media cost is not the same as actual cost.
-
Just because a disk costs $0.20/MB does NOT mean you can pay $200,000 and get a terabyte of disk.
-
There are lots of other costs associated with I/O systems.
-
For example, disks need controllers, I/O buses, system buses, power supplies, and mounting hardware.
-
This supporting structure becomes much more expensive as the number of disks supported grows.
-
The same is true for removable media.
-
It costs only $1,000 to buy a terabyte of magnetic tape.
-
But that does not include tape readers, a robot, software, and all the other pieces necessary to build a full system.
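-
The gap between media cost and system cost can be made concrete with a little arithmetic. Only the $0.20/MB figure comes from the text above; the disk capacity, disks per controller, controller price, and per-disk overhead below are hypothetical values chosen purely to show the shape of the calculation.

```python
import math


def media_cost(capacity_mb, dollars_per_mb):
    """Cost of the raw media alone."""
    return capacity_mb * dollars_per_mb


def disk_system_cost(capacity_mb, dollars_per_mb, disk_capacity_mb,
                     disks_per_controller, controller_cost, per_disk_overhead):
    """Media cost plus supporting hardware (all overhead figures hypothetical)."""
    disks = math.ceil(capacity_mb / disk_capacity_mb)
    controllers = math.ceil(disks / disks_per_controller)
    total = (media_cost(capacity_mb, dollars_per_mb)
             + controllers * controller_cost    # controllers and buses
             + disks * per_disk_overhead)       # power, mounting, cabling
    return {"disks": disks, "controllers": controllers, "total": total}


ONE_TB_MB = 1_000_000
print(media_cost(ONE_TB_MB, 0.20))
print(disk_system_cost(ONE_TB_MB, 0.20, 2_000, 7, 1_500, 100))
```

Under these assumed overheads the supporting hardware adds a large fraction on top of the media cost, and the number of controllers grows with the number of disks.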
-
Disk seek time is not linear.
-
A disk head must accelerate to maximum velocity, travel across tracks, decelerate, and settle.
-
Most of the head's time is spent accelerating and decelerating, so seek time does not grow in proportion to the distance traveled.
-
For disks with more than 200 cylinders, Chen and Lee [1995] modeled the seek time as a function of seek distance as:
-
The curve represented by this model is shown on the previous slide in red.
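-
The equation itself did not survive on this slide. As a hedged sketch, the code below uses the form in which this model is commonly quoted: a square-root term plus a linear term plus a constant, with coefficients fitted from a disk's minimum, average, and maximum seek times. Treat the coefficient formulas and the sample disk parameters as assumptions to be checked against the text.

```python
import math


def seek_coefficients(t_min, t_avg, t_max, cylinders):
    """Coefficients a, b, c of the assumed model (times in any one unit)."""
    a = (-10 * t_min + 15 * t_avg - 5 * t_max) / (3 * math.sqrt(cylinders))
    b = (7 * t_min - 15 * t_avg + 8 * t_max) / (3 * cylinders)
    c = t_min
    return a, b, c


def seek_time(distance, t_min, t_avg, t_max, cylinders):
    """Seek time for a move of `distance` cylinders (0 if the head stays put)."""
    if distance < 1:
        return 0.0
    a, b, c = seek_coefficients(t_min, t_avg, t_max, cylinders)
    return a * math.sqrt(distance - 1) + b * (distance - 1) + c


# Hypothetical disk: 2 ms minimum, 10 ms average, 20 ms maximum, 2000 cylinders.
for d in (1, 500, 1001, 2001):
    print(d, round(seek_time(d, 2.0, 10.0, 20.0, 2000), 2))
```

The square-root term dominates for short seeks, where the head is mostly accelerating and decelerating, which is what makes the curve non-linear.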
-
I/O will become more important with time.
-
As our society stores more and more information on computer media, the ability to get to that information will become ever more important.
-
The NASA Mission to Planet Earth will capture more than a terabyte of data per day from satellites.
-
How can we find a needle in that haystack?
-
Similarly, future libraries may dispense with physical books and instead keep information online.
-
This makes it easier to distribute the information, but getting data to and from storage will be a bottleneck unless progress is made.