Digital Device Components

A simple processor illustrates many of the basic components used in any digital system:

Datapath: The core -- all other components are support units that either store the results of the datapath or determine what happens in the next cycle.

Memory:

A broad range of memory classes exists, distinguished by the way data is accessed:

Read-Only vs. Read-Write
Sequential vs. Random access
Single-ported vs. Multi-ported access

Or by their data retention characteristics:

Dynamic vs. Static

Stay tuned for a more extensive treatment of memories.

Control:

An FSM (sequential circuit) implemented using random logic, PLAs or memories.

Interconnect and Input-Output:

Parasitic resistance, capacitance and inductance affect performance of wires both on and off the chip.
Growing die size increases the length of the on-chip interconnect, increasing the value of the parasitics.

Datapath elements include adders, multipliers, shifters, BFUs, etc.

The speed of these elements often dominates the overall system performance so optimization techniques are important.

However, as we will see, the task is non-trivial since there are multiple equivalent logic and circuit topologies to choose from, each with advantages and disadvantages in terms of speed, power and area.

Also, optimization focused on only one design level, e.g., transistor sizing, leads to inferior designs.

Datapath Operators: Addition/Subtraction

Let's start with addition, since it is a very common datapath element and often a speed-limiting one.

Optimizations can be applied at the logic or circuit level.
Logic-level optimizations try to rearrange the Boolean equations to produce a faster or smaller circuit, e.g. the carry look-ahead adder.
Circuit-level optimizations manipulate transistor sizes and circuit topology to optimize speed.

Let's start with some basic definitions before considering optimizations:

$G = A \cdot B$ (generate)

Occurs when $C_o$ is generated internally within the adder, independent of $C_i$.

$P = A + B$ (propagate)

Indicates that $C_i$ is propagated (passed) to $C_o$.

$P' = A \oplus B$ (propagate)

Used in some adders for the P term since it can be reused to generate the sum term.

$D = \overline{A} \cdot \overline{B}$ (delete)

Ensures that the carry is deleted at $C_o$, i.e. $C_o = 0$ independent of $C_i$.

The Boolean expressions for $S$ and $C_o$ are:

$S = A\,\overline{B}\,\overline{C_i} + \overline{A}\,B\,\overline{C_i} + \overline{A}\,\overline{B}\,C_i + A\,B\,C_i = A \oplus B \oplus C_i$
$C_o = A B + A C_i + B C_i$

But $S$ and $C_o$ can be written in terms of $G$ and $P'$:

$C_o(G, P') = G + P' C_i$ ($P$ can be used in place of $P'$ in this expression).
$S(G, P') = P' \oplus C_i$

Note that $G$ and $P'$ are independent of $C_i$.

(Also, $C_o$ and $S$ can be expressed in terms of delete, $D$.)
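
The generate/propagate/delete definitions and the two forms of the carry equation can be checked exhaustively over the eight input combinations. The sketch below is only an illustration; the function name `full_adder_signals` is hypothetical and not part of the original material.

```python
from itertools import product

def full_adder_signals(a, b, ci):
    """Return (s, co, g, p, p_xor, d) for one full-adder bit; all values are 0 or 1."""
    g = a & b                  # generate: carry out regardless of ci
    p = a | b                  # propagate, OR form (P)
    p_xor = a ^ b              # propagate, XOR form (P')
    d = (1 - a) & (1 - b)      # delete: carry out is 0 regardless of ci
    s = a ^ b ^ ci
    co = (a & b) | (a & ci) | (b & ci)
    return s, co, g, p, p_xor, d

for a, b, ci in product((0, 1), repeat=3):
    s, co, g, p, p_xor, d = full_adder_signals(a, b, ci)
    assert co == g | (p_xor & ci)       # Co = G + P'.Ci
    assert co == g | (p & ci)           # P may replace P' in the carry
    assert s == p_xor ^ ci              # S = P' XOR Ci
    assert 2 * co + s == a + b + ci     # arithmetic meaning of the bit
    if d:
        assert co == 0                  # delete forces Co low
```

Note that the OR form only works in the carry expression; the sum needs the XOR form.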

Ripple-carry adder:

The critical path (worst case delay over all possible inputs) is a ripple from lsb to msb.

The delay in this case is proportional to the number of bits, N, in the input words:

$t_{adder} = (N - 1)\,t_{carry} + t_{sum}$

where $t_{carry}$ and $t_{sum}$ are the propagation delays from $C_i$ to $C_o$ and from $C_i$ to $S$, respectively.

One possible worst-case bit pattern, in which the carry ripples from lsb to msb, is (written msb first):

A: 00000001; B: 01111111

Convince yourself that this is true.
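
One way to convince yourself is a small simulation. The sketch below uses a simplified static delay model that is an assumption of this example, not part of the original material: every stage contributes one `t_carry` to the carry path, and a generate or delete stage restarts the path because its carry output does not depend on the incoming carry. The function name `ripple_add` is hypothetical.

```python
def ripple_add(a, b, n_bits=8, t_carry=1.0, t_sum=1.0):
    """Ripple-carry add; returns (sum, carry_out, worst_sum_delay).

    worst_sum_delay is the latest arrival time of any sum bit under the
    simplified model described in the lead-in.
    """
    carry = 0
    depth = 0        # carry stages traversed by the carry entering this bit
    total = 0
    worst = 0.0
    for i in range(n_bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        worst = max(worst, depth * t_carry + t_sum)
        if ai ^ bi:              # propagate: the incoming carry passes through
            depth += 1
        else:                    # generate or delete: the path restarts here
            depth = 1
        carry = (ai & bi) | ((ai ^ bi) & carry)
    return total, carry, worst

total, carry, worst = ripple_add(0b00000001, 0b01111111)
assert (total, carry) == (0b10000000, 0)
assert worst == (8 - 1) * 1.0 + 1.0      # (N - 1) t_carry + t_sum
```

Bit 0 generates a carry, bits 1 to 6 propagate it, and the msb sum has to wait for it, which gives the (N - 1) carry delays plus one sum delay.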

Note that when optimizing this structure, it is far more important to optimize $t_{carry}$ than $t_{sum}$.

The inverting property of a full adder can be used to achieve this goal:

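Inverting all inputs of a full adder inverts both of its outputs:

$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$
$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$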
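
The inverting property can be confirmed by enumerating all inputs. This is a minimal sketch; the helper names `full_adder` and `inv` are hypothetical.

```python
from itertools import product

def full_adder(a, b, ci):
    """Return (sum, carry_out) of a one-bit full adder."""
    s = a ^ b ^ ci
    co = (a & b) | (a & ci) | (b & ci)
    return s, co

def inv(x):
    """Logical complement of a single bit."""
    return 1 - x

for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    s_inv, co_inv = full_adder(inv(a), inv(b), inv(ci))
    assert s_inv == inv(s)      # inverted inputs give the inverted sum
    assert co_inv == inv(co)    # and the inverted carry
```

Because the property holds for both outputs, a stage fed with complemented operands and a complemented carry produces a complemented carry, so alternating stages can work on inverted signals without an inverter in the carry path.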

One possible (un-optimized) implementation:

$C_o$ is reused in the $S$ term as:

$S = A B C_i + (A + B + C_i)\,\overline{C_o}$

Even with design tricks, such as placing the transistors driven by the critical-path signal $C_i$ closest to the output and using a symmetrical design, this implementation is slow.

The load capacitance on $C_o$ in the previous version consists of 2 diffusion capacitances (inverter) and 6 gate capacitances (next bit).

This version increases the load on $C_o$ to 4 diffusion capacitances and 2 internal (sum) gate capacitances, plus the 6 gate capacitances of the next bit.

Serial addition can be used if area is a concern:

In this case, you want equal Sum and Carry delays in order to minimize clock cycle time.

Bit-level pipelining can be used to break the dependency between addition time and the number of bits by inserting FAs between each register bit.
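
A behavioral sketch of serial addition, assuming the usual arrangement of a single full adder plus a carry flip-flop that is reset to 0 and that processes one bit pair per clock, lsb first. The names `serial_add`, `to_bits` and `from_bits` are hypothetical.

```python
def to_bits(x, n):
    """Split x into n bits, lsb first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Rebuild an integer from bits given lsb first."""
    return sum(bit << i for i, bit in enumerate(bits))

def serial_add(a_bits, b_bits):
    """Add two equal-length bit streams, one bit per clock cycle."""
    carry = 0                          # models the carry flip-flop after reset
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)   # stored for the next cycle
    return sum_bits, carry

s_bits, c_out = serial_add(to_bits(200, 8), to_bits(100, 8))
assert from_bits(s_bits) + (c_out << 8) == 300
```

Each loop iteration corresponds to one clock cycle, and both the sum bit and the stored carry must settle within it, which is why equal Sum and Carry delays minimize the cycle time.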

Transmission-gate Adder:

The total transistor count is 26. It can be reduced to 24 by using an inverter to generate the XNOR (see Weste and Eshraghian for an 18-transistor implementation).

Note: Sum and Carry delay times are approximately equal.

Dynamic Adder Design: np-CMOS adder

Dynamic Adder Design: Manchester Carry-Chain adder.

A chain of pass transistors is used to implement the carry chain.

Precharge: All intermediate nodes, e.g. $C_{o,0}$, are charged to $V_{DD}$.
Evaluate: Node $C_{o,k}$ is discharged if there is an incoming carry $C_{i,0}$ and the previous propagate signals, $P_0$ to $P_{k-1}$, are high.

Only 4 diffusion capacitances are present per node, but the distributed RC nature of the chain results in a delay that is quadratic in the number of bits.

Buffers and/or transistor sizing can be used to improve performance.

Consider the worst case delay of the carry chain:

Using the Elmore delay, the delay of the RC network is:

$t_p = 0.69\,\bigl(C_1 R_1 + C_2 (R_1 + R_2) + C_3 (R_1 + R_2 + R_3) + C_4 (R_1 + R_2 + R_3 + R_4) + C_5 (R_1 + R_2 + R_3 + R_4 + R_5) + C_6 (R_1 + R_2 + R_3 + R_4 + R_5 + R_6)\bigr)$

Since $R_1$ appears 6 times in the expression, it makes sense to minimize its contribution.

Note that reducing $R$ by a factor $k$ at each stage increases the capacitance by the same factor $k$ and increases area.

A k-factor of 1.5 reduces delay by 40% and increases area by 3.5X.
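
The effect of progressive sizing can be explored numerically. The sketch below rests on an assumed model that is not spelled out in the original material: a 6-stage chain in which each stage is k times wider than the stage after it, with the resistance of a stage inversely proportional to its width and both the node capacitance and the area proportional to it. The names `elmore_delay` and `sized_chain` are hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """0.69 * sum over i of C_i * (R_1 + ... + R_i) for an RC chain."""
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r              # resistance between the source and node i
        total += c * upstream_r
    return 0.69 * total

def sized_chain(n, k, r_unit=1.0, c_unit=1.0):
    """Assumed model: stage i (1-based) is k**(n - i) times wider than the last stage."""
    widths = [k ** (n - i) for i in range(1, n + 1)]
    rs = [r_unit / w for w in widths]    # wider transistor, lower resistance
    cs = [c_unit * w for w in widths]    # wider transistor, more capacitance
    return rs, cs, sum(widths)           # total width stands in for area

n = 6
r_u, c_u, area_u = sized_chain(n, 1.0)   # uniform sizing
r_s, c_s, area_s = sized_chain(n, 1.5)   # progressive sizing, k = 1.5
delay_u = elmore_delay(r_u, c_u)
delay_s = elmore_delay(r_s, c_s)
print(f"delay reduction: {1 - delay_s / delay_u:.1%}")
print(f"area increase:   {area_s / area_u:.2f}x")
```

Under these assumptions the script reports a delay reduction of roughly 40% and an area increase of about 3.5X for k = 1.5, in line with the figures quoted above. With uniform sizing the bracketed sum is N(N + 1)/2 unit RC products, which is the quadratic growth mentioned earlier.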

Carry-Bypass adder:

Assume $A_k$ and $B_k$ (for k = 0...3) are set such that all $P_k$ (propagate) are high.

In this case, an incoming carry $C_{i,0} = 1$ propagates along the complete chain and $C_{o,3} = 1$.

In other words:

if ($P_0 P_1 P_2 P_3 = 1$) then $C_{o,3} = C_{i,0}$, else either a DELETE or a GENERATE occurred.
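
The bypass condition can be modeled behaviorally for one 4-bit block. This sketch assumes the XOR form of propagate (P'), since with the OR form a generate inside the block could coincide with all propagate signals being high. The function name `bypass_block` is hypothetical.

```python
def bypass_block(a, b, carry_in, width=4):
    """One carry-bypass block: returns (sum, carry_out, bypassed)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    block_propagate = (a ^ b) == mask    # P0.P1.P2.P3 using the XOR form
    # Ripple path through the block.
    carry = carry_in
    total = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | ((ai ^ bi) & carry)
    # Bypass multiplexer: when every bit propagates, forward carry_in directly.
    carry_out = carry_in if block_propagate else carry
    return total, carry_out, block_propagate

# The bypass never changes the result, only how quickly carry_out is available.
for a in range(16):
    for b in range(16):
        for ci in (0, 1):
            total, co, _ = bypass_block(a, b, ci)
            assert total + (co << 4) == a + b + ci
```

In hardware the benefit is timing only: the multiplexer lets the carry skip the four ripple stages whenever the block propagates.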

Linear Carry-Select adder:

One way around waiting for the incoming carry is to compute the result of both possible values in advance and let the incoming carry select the correct result.

A Square-Root Carry-Select Adder (delay = $O(\sqrt{N})$) is constructed by increasing the number of input bits in each block from lsb to msb.
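
A behavioral sketch of carry-select addition with configurable block widths. The function names and the block widths used below are illustrative assumptions; in particular, the widths 2, 3, 4, 5, 6 are just one example of blocks that grow from lsb to msb.

```python
def add_block(a, b, carry_in, width):
    """Reference result for one block: (sum, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, block_widths):
    """Add using carry-select blocks listed from lsb to msb."""
    result, carry, shift = 0, 0, 0
    for width in block_widths:
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        # Both candidates are computed before the incoming carry is known.
        sum0, cout0 = add_block(a_blk, b_blk, 0, width)
        sum1, cout1 = add_block(a_blk, b_blk, 1, width)
        # The incoming carry only drives the multiplexers.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << shift
        shift += width
    return result, carry

# Linear arrangement: equal block widths (16 bits total).
r, c = carry_select_add(12345, 54321, [4, 4, 4, 4])
assert r + (c << 16) == 12345 + 54321

# Growing block widths from lsb to msb (20 bits total).
r, c = carry_select_add(0xFFFFF, 1, [2, 3, 4, 5, 6])
assert (r, c) == (0, 1)
```

Growing the blocks lets each longer block finish its two candidate sums at about the time the select signal from the previous block arrives.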

Carry look-ahead adder (avoiding the ripple altogether):

Compute the carries to each stage in parallel.

Note that the low-order terms, e.g., $P_0$ and $G_0$, appear in the expression for every bit, making the fanout load large.
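
Applying $C_o = G + P\,C_i$ recursively to a 4-bit block gives the look-ahead expressions explicitly. The indexing ($C_{o,k}$ for the carry out of bit $k$, $C_{i,0}$ for the external carry in) follows the notation used for the other adders in these notes:

```latex
\begin{aligned}
C_{o,0} &= G_0 + P_0 C_{i,0} \\
C_{o,1} &= G_1 + P_1 G_0 + P_1 P_0 C_{i,0} \\
C_{o,2} &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0} \\
C_{o,3} &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{i,0}
\end{aligned}
```

Every carry depends only on the operand bits and $C_{i,0}$, so all four can be evaluated in parallel. The widest product term grows by one input per bit, and $P_0$ and $G_0$ appear in all four equations.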

Carry look-ahead adder:

One possible implementation without using simple logic gates.

The size and fan-in of the gates limit the look-ahead block size to about four bits.

The Logarithmic look-ahead adder: $O(\log_2 N)$ delay:

The number of logic levels is proportional to $\log_2 N$, fan-in is limited and the layout is compact (jigsaw puzzle) (see Rabaey for details).
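
To illustrate the logarithmic depth, the sketch below combines (generate, propagate) pairs with a parallel-prefix network in the Kogge-Stone style. This arrangement is chosen only as an example and may differ from the structure the notes refer to; the function name `prefix_add` is hypothetical.

```python
def prefix_add(a, b, n_bits=8, carry_in=0):
    """Add with a parallel-prefix carry network; returns (sum, carry_out, levels)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n_bits)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n_bits)]
    G, P = g[:], p[:]
    # Fold the external carry into bit 0 so that G[i] becomes the carry out of bit i.
    G[0] = g[0] | (p[0] & carry_in)
    levels = 0
    span = 1
    while span < n_bits:
        new_G, new_P = G[:], P[:]
        for i in range(span, n_bits):
            # Combine two adjacent groups: (G, P) o (G', P') = (G + P.G', P.P')
            new_G[i] = G[i] | (P[i] & G[i - span])
            new_P[i] = P[i] & P[i - span]
        G, P = new_G, new_P
        span *= 2                      # each level doubles the group size
        levels += 1
    carries = [carry_in] + G[:-1]      # carry into each bit position
    total = sum((p[i] ^ carries[i]) << i for i in range(n_bits))
    return total, G[-1], levels

total, carry_out, levels = prefix_add(200, 100)
assert total + (carry_out << 8) == 300
assert levels == 3                     # log2(8) levels for 8 bits
```

Each level uses only two-input group combinations, which is how the fan-in stays limited while the number of levels grows as $\log_2 N$.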

Datapath Operators: Comparison

Magnitude Comparators:

May be built from an adder, complementer (XOR gates) and a zero detect unit.

Think about the modifications necessary to make it a signed comparator (Hint: A couple of XOR gates).
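
A behavioral sketch of the unsigned case, assuming the subtraction A - B is formed as A + NOT(B) + 1 so that the adder's carry out indicates A >= B and the zero detect indicates equality. The function name `compare_unsigned` is hypothetical.

```python
def compare_unsigned(a, b, n_bits=8):
    """Compare two unsigned n-bit words with an adder, a complementer and a zero detect."""
    mask = (1 << n_bits) - 1
    b_complement = (b ^ mask) & mask        # complementer: XOR each bit of B with 1
    raw = (a & mask) + b_complement + 1     # A - B computed as A + NOT(B) + 1
    carry_out = raw >> n_bits               # 1 when A >= B
    difference = raw & mask
    is_equal = difference == 0              # zero detect
    return {
        "eq": is_equal,
        "gt": bool(carry_out) and not is_equal,
        "lt": not carry_out,
    }

assert compare_unsigned(5, 3) == {"eq": False, "gt": True, "lt": False}
assert compare_unsigned(3, 5) == {"eq": False, "gt": False, "lt": True}
assert compare_unsigned(7, 7) == {"eq": True, "gt": False, "lt": False}
```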

Datapath Operators: Binary Counters

Asynchronous: Based on the Toggle register.

Not a good choice for performance and testability (with no reset).

Synchronous counter.

Replace AND gate with an adder for up/down counting capability.
Weste and Eshraghian also show a version that can be initialized.
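
A behavioral sketch of a synchronous up counter, assuming the common structure in which bit i toggles on a clock edge when all lower-order bits are 1, with that condition formed by a chain of AND gates. The function name `counter_next` is hypothetical.

```python
def counter_next(state, n_bits=4):
    """Next state of a synchronous up counter built from toggle stages and an AND chain."""
    next_state = 0
    enable = 1                                 # bit 0 toggles on every clock
    for i in range(n_bits):
        bit = (state >> i) & 1
        next_state |= (bit ^ enable) << i      # toggle this bit when enabled
        enable &= bit                          # AND chain: all lower bits are 1
    return next_state

# All bits are updated from the same current state, i.e. on the same clock edge.
assert [counter_next(s) for s in range(16)] == [(s + 1) % 16 for s in range(16)]
```

The AND chain computes the carry of "state + 1", which is why swapping it for an adder generalizes the structure to up/down counting.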

Datapath Operators: Multiplication

Multiplication can be broken down into two steps:
Computation of partial products.
Accumulation of the shifted partial products.

Multipliers may be classified by the format in which data words are accessed:
Serial
Serial/parallel
Parallel

The parallel form computes the partial products in parallel.
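
The two steps can be sketched for unsigned operands as follows. The function names are hypothetical, and the sketch models only the arithmetic, not any particular array structure.

```python
def partial_products(x, y, n_bits=4):
    """Partial product i is X AND-ed with bit i of Y, shifted left by i positions."""
    return [(x if (y >> i) & 1 else 0) << i for i in range(n_bits)]

def multiply_unsigned(x, y, n_bits=4):
    """Accumulate the shifted partial products."""
    return sum(partial_products(x, y, n_bits))

assert partial_products(0b1011, 0b0101) == [11, 0, 44, 0]
assert multiply_unsigned(0b1011, 0b0101) == 55
```

In a parallel multiplier all entries of the list exist at once as rows of AND gates, and the accumulation is performed by an array of adders.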

Parallel Unsigned Multiplication:

Parallel Multiplication:

Parallel Signed Multiplication:

Serial Unsigned Multiplication:

Serial/Parallel Unsigned Multiplier shown in Weste and Eshraghian.

Datapath Operators: Shifters

Right/Left 1-bit shifter:

Barrel shifter: