- 
 
So far we have looked at full complementary logic structures and the ratioed CMOS inverter.
 
 
- 
 
Alternative CMOS logic configurations are also possible when designs are constrained by:
- 
 
 High-speed requirements
- 
 
 Low-power dissipation
- 
 
 Area or density
 
 
- 
 
Full-complementary CMOS 
will always
 function correctly even in the presence of 
noise
 and with 
low power-supply voltages
. "Safeness" of function.
 
 
- 
 
In contrast, alternative CMOS logic configurations can produce incorrect functional behavior as a result of:
- 
 
 Insufficient power supplies or power supply noise.
- 
 
 Noise on gate inputs.
- 
 
 Incorrect ratios in ratioed logic.
- 
 
 Charge sharing or incorrect clocking in dynamic gates.
 
- 
 
Both
 functional
 and 
temporal
 (timing) constraints must be met to ensure correct operation of an integrated logic gate.
 
 
- 
 
When optimizing for speed, there are many more options from which to choose.
 
 
- 
 
Our previous analysis determined that rise/fall time could be approximated by:
 
 
- 
 
The number and size of transistors in series (or parallel) in the pull-down or pull-up path affects 
 . .
- 
 
C
L
 is affected by the size of the transistors in the gate (self-loading), the routing capacitance and the number and size of the driven transistors.
 
 
- 
 
Note this approximation does not consider rise/fall time of the input signal.
 
- 
 
In many designs, many logic paths do not require any special consideration.
 
 
- 
 
Critical paths
: Paths that require attention to the timing details.
 
 
- 
 
Automated design tools such as timing analyzers can automatically identify the slowest paths.
 
 
- 
 
Timing analysis can be performed at the:
- 
 
 Architecture level.
- 
 
 RTL/logic gate level.
- 
 
 Circuit level.
- 
 
 Layout level.
 
 
- 
 
Timing optimization has greatest impact if performed at the 
architecture 
level. This requires knowledge of:
- 
 
 How many gate delays fit into a clock cycle.
- 
 
 How fast addition occurs.
- 
 
 How fast memories access.
 
- 
 
Timing optimization can also be performed at the 
RTL/logic
 level where design is focused on:
- 
 
 Pipelining.
- 
 
The types of gates (INVERTER, NAND/AND, Complex gates, PLAs, etc.)
- 
 
 The fan-in and fan-out of the gates.
 
 
- 
 
Circuit level optimization:
- 
 
 Sizing transistors.
- 
 
 Using alternative forms of CMOS logic.
 
 
- 
 
Layout level optimization:
- 
 
 Critical paths are routed first to keep their interconnect distance small.
- 
 
 Loading capacitance and interconnect resistance considered more extensively, e.g., drain merging, choice of routing layer, etc.
 
- 
 
Fan-in: The number of inputs on a gate.
 
 
- 
 
Fan-out: Total number of gate inputs driven by a gate output.
- 
 
Note: This value is usually expressed in some default gate size such as the number of minimum sized inverter gates.
 
- 
 
How does fan-in affect the speed of the gate?
 
 
- 
 
Previously, we determined that connecting two identical transistors in series will approximately double the rise or fall time of the gate.
 
 
- 
 
Let's consider the worst case rise time for an 
m
-input NAND gate (one p-transistor turns on).
 
- 
 
Can be reformulated in terms of C
g
 as:
- 
 
When q(k) = k, routing cap adds as much load cap as there is gate cap.
- 
 
Might want to increase size of driving transistors to reduce effect of routing cap. (e.g. standard cells)
- 
 
When q(k) = 0.1 -> 0.2k (circuit is dominated by self-loading), no advantage to increasing size of driving transistors. (e.g. custom layout).
 
- 
 
Does this mean that the n-transistors need to be m times wider than the p-transistors?
 
 
- 
 
In general, NANDs are a better choice than NOR gates. Why?
 
 
- 
 
Also, the best speed performance is obtained by using gates where the number of inputs ranges between 2 and 5.
 
- 
 
A classic CMOS trade-off:
 
- 
 
General Guidelines:
- 
 
 Use NAND structures where possible.
- 
 
 Place inverters or small fan-in NAND gates at high fan-out nodes.
- 
 
 Avoid NOR structures in high-speed circuits, particularly with a fan-in greater than four and where fan-out is large.
- 
 
 Use a fan-out below 5-10.
- 
 
 Use minimum-sized gates on high fan-out nodes to minimize load.
- 
 
 Keep rising and falling edges sharp.
- 
 
 When designing with power and area as constraints, remember that large fan-in complementary gates will always work given enough time.
 
- 
 
Guidelines to improve delay:
- 
 
 Size transistors when 
routing
 and 
gate
 capacitance contributes significantly to total load capacitance.
- 
 
 Sizing transistors along series paths can improve delay. In this case, 
increase
 the size of transistors proportional to their distance from the output node.
- 
 
 Order transistors so that the transistor 
closest
 to the 
output
 is the transistor that receives its input signal last.
- 
 
 Manipulate logic expressions to replace large fan-in gates with equivalent circuits composed of gates with smaller fan-ins.
- 
 
 Insert buffers (i.e. an inverter) on the output of large 
fan-in
 gates in order to reduce the output load capacitance.
- 
 
 Use other styles of logic (to be discussed).
 
- 
 
Inverter layout alternatives:
 
- 
 
NOTE: A true donut "inverter" is NOT optimal:
 
- 
 
All complementary gates may be designed using a single row of n-transistors above or below a single row of p-transistors, aligned at common gate connections.
- 
 
"Stacked layout" (on right): signals applied to multiple n- and p-transistors. 
- 
 
Works well for cascaded gates.
 
- 
 
"Line of diffusion" rule: 
- 
 
Transistors form a line of diffusion intersected by poly.
 
 
- 
 
Diffusion will be unbroken if identically labeled Euler paths can be found for the p and n trees:
 
- 
 
CMOS Standard Cell Design: 
- 
 
 The cells are characterized by some geometric regularity such as a fixed cell height.
- 
 
 A library of common gates such as NAND, NOR, XOR, INV, etc. that can be used by automatic place and route tools. 
 
- 
 
Programmable Logic Array (PLA):
- 
 
 Used when implementing complex control logic in CMOS.
- 
 
 Start with a canonical format called 
two-level
 
sum-of-products
 representation.
 
- 
 
PLA: Converting to NOR-NOR format.
 
- 
 
PLA: NANDs may also be used. Which do you think is faster?
- 
 
Moreover, we want an area efficient implementation since these functions can be large.
- 
 
Instead of building the p-tree of the NOR, let's use pseudo-nMOS.
 
- 
 
PLA: Part of a 7 segment display.
 
- 
 
Gate Array:
- 
 
 Consists of a fixed image of under layers (typically well, diffusion and poly).
- 
 
 Wiring layers used to program the array (typically contact, metal1, via and metal2).
 
- 
 
Sea-of-Gates Layout: Generalization of the Gate Array.
- 
 
 Continuous rows of diffusion are run across the entire chip.
- 
 
 Poly is tied to V
DD
 and V
SS
 to isolate logic gates from each other.
 
- 
 
Both the control signal and its complement have to be routed.
- 
 
 It is important to equalize delays along these control lines.
 
- 
 
Consider the following two possible layouts.
 
- 
 
Signal arrival time scenarios.