CMOS Logic Structures

Full complementary static CMOS gates may be undesirable because:
Their area overhead may be too large.
Their speed may be too slow.
The function may not be feasible as a full complementary structure (e.g. PLA).

Smaller, faster gates can be implemented at the cost of:
Increased design time.
Increased operational complexity.
Decreased operational margin.

Full complementary gates can be designed as ratioless circuits:
A fixed ratio in size between pull-up and pull-down structures is not required for proper operation.

This is unlike some of the structures we will consider now.

Pseudo-nMOS logic

The gain ratio of the n-driver transistors to the p-transistor load (beta_driver/beta_load) is important to ensure correct operation.
It is set by ratioing the n and p transistor sizes.

Dynamic CMOS Logic

Pull-up time improved by virtue of the active switch (p-transistor can be much larger).
Pull-down time increased due to the ground switch.

What is wrong with cascading these structures?
(Hint: Consider the delay in the discharge of the left-most n-logic block at the start of the evaluate phase).

CMOS Domino Logic:

These structures can be cascaded.

In a cascaded set of logic blocks, each stage evaluates and causes the next stage to evaluate (in the same way that a line of dominoes falls).
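
The difference between directly cascaded dynamic gates and domino gates can be illustrated with a toy discrete-time model. This is a minimal sketch under stated assumptions, not a circuit simulation: every stage is a single-input dynamic gate, a stage needs one time step to discharge, a discharged node cannot recover during evaluate, and the function name `evaluate_chain` is hypothetical.

```python
def evaluate_chain(x, stages, domino, steps=10):
    """Toy model of the evaluate phase for a chain of 1-input dynamic stages.

    x      : logic value applied to the first stage (0 or 1)
    domino : if True, a static inverter sits between each dynamic node
             and the next stage (domino logic); if False, the dynamic
             node drives the next stage directly.
    Returns the final stage outputs after `steps` time steps.
    """
    nodes = [1] * stages  # every dynamic node starts precharged high
    for _ in range(steps):
        prev = list(nodes)  # each stage reacts to the previous time step
        for i in range(stages):
            if i == 0:
                gate_in = x
            elif domino:
                gate_in = 1 - prev[i - 1]  # inverter output is low after precharge
            else:
                gate_in = prev[i - 1]      # precharged node looks like a logic 1
            if gate_in == 1:
                nodes[i] = 0               # discharge is one-way during evaluate
    # Domino stage outputs are taken after the inverter.
    return [1 - n for n in nodes] if domino else nodes


# Direct cascade with x = 1: stage 1 should output 0 and stage 2 should
# output 1, but stage 2 sees the precharged 1 and discharges by mistake.
direct = evaluate_chain(1, stages=2, domino=False)

# Domino with x = 1: each stage is a non-inverting buffer and the 1 ripples through.
dom = evaluate_chain(1, stages=2, domino=True)
```

In this model the direct cascade ends with both nodes discharged, so the second stage holds the wrong value, while the domino chain settles to the expected outputs because every stage input is low at the start of evaluate.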

Pass-Transistor Logic:

Other forms of CMOS logic include:
BiCMOS Logic.
Clocked CMOS Logic (C²MOS).
NP Domino Logic (Zipper CMOS).
Cascade Voltage Switch Logic (CVSL).
Source Follower Pull-up Logic (SFPL).
(See Weste and Eshraghian for details.)

Where should one use what gate?
Complementary: Best option for most cases. Safe, fast, no DC power.
Pseudo-nMOS: Large fan-in NOR gates, e.g. PLAs, ROMs. DC power.
Transmission gate: Speed advantage, good for complex Boolean functions.
CMOS domino logic: Low-power, high speed. Requires simulation!

Clocked Systems

The majority of VLSI systems are finite state machines and pipelined machines:

Clock Strategy

One of the most important decisions made at the start of a design is the selection of a clocking strategy.

It affects:
How many transistors are used per storage element.
How many clock signals need to be routed throughout the chip.

Topics:
Latch, Master-Slave Flip-flop and Edge-Triggered Flip-flop designs.
Setup and Hold time and clock race conditions.
CMOS Static and Dynamic Flip-flops.
Single phase clocking, clock skew/slew.
Two-phase clocking techniques.
Clock generation techniques.

Latches and Flip-flops

The ambiguity of having a non-allowed mode caused by trigger pulses going active simultaneously can be avoided by adding two feedback lines:

Note that if both J and K are high when the clock pulses, the output is complemented.

However, complementing the output enables the other input, so the FF oscillates for as long as the clock stays active.

This places a stringent constraint on the clock pulse width (e.g. it must be shorter than the propagation delay through the FF).

Synchronous circuit:

Changes in the output logic states of all FFs in a design are synchronized with the clock signal, phi.

Note that:
The T FF (toggle FF) is a special case of the JK with J and K tied together.
The D FF (delay FF) is a special case of the JK with J and K driven by complementary values of the D input (J = D, K = NOT D).

Here the D FF generates a delayed version of the input signal synchronized with the clock.
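
The relationship between the JK, T and D flip-flops can be checked with the standard JK characteristic equation, Q_next = J·(NOT Q) + (NOT K)·Q. The sketch below is behavioral only (one call is one clock event), and the function names are hypothetical.

```python
def jk_next(q, j, k):
    """Next state of a JK flip-flop: Q_next = J*(not Q) + (not K)*Q."""
    return int((j and not q) or ((not k) and q))


def t_next(q, t):
    """T FF: J and K tied together."""
    return jk_next(q, t, t)


def d_next(q, d):
    """D FF: J = D and K = NOT D."""
    return jk_next(q, d, 1 - d)


# Enumerate all cases: the D FF copies its input regardless of the old
# state, and the T FF toggles only when T = 1.
for q in (0, 1):
    for v in (0, 1):
        print(f"q={q} in={v} -> D: {d_next(q, v)}  T: {t_next(q, v)}")
```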

These FFs are also called latches.

An FF is a latch if the gate is transparent while the clock is high (low).
Any changes in the input are reflected in the output after a nominal delay.

The transparent nature can cause race problems:

Master-Slave Flip-flops

Master-Slave Set/Clear Asynchronous FFs

Edge-triggered FFs

Problem with master-slave approach:

The circuit is sensitive to changes in the input signals as long as phi is high.
If the inputs do not remain constant while the clock is high, the master follows every change on D, which, for example, consumes power.

The fix is to allow the state of the FF to change only at the rising (falling) edge of the clock.
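
The contrast between a level-sensitive (transparent) element and an edge-triggered one can be seen on sampled waveforms. This is a minimal behavioral sketch with zero delays assumed; the function names and the sample waveforms are made up for illustration.

```python
def level_sensitive(clk, d, q0=0):
    """Transparent while clk is high: the output follows d on every sample."""
    q, out = q0, []
    for c, x in zip(clk, d):
        if c:
            q = x
        out.append(q)
    return out


def rising_edge_triggered(clk, d, q0=0):
    """State changes only on a 0 -> 1 transition of clk."""
    q, prev_clk, out = q0, 0, []
    for c, x in zip(clk, d):
        if c and not prev_clk:
            q = x
        prev_clk = c
        out.append(q)
    return out


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 0, 1, 0, 1, 0, 0]  # d glitches while clk is high

print(level_sensitive(clk, d))        # follows the glitch on d
print(rising_edge_triggered(clk, d))  # samples d only at the two rising edges
```

With these inputs the level-sensitive element reproduces the glitch on d during the first clock-high interval, while the edge-triggered element ignores it.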

The modification applied to the JK FF is shown below.

Note that the inputs must be stable for some time before the clock goes low.

This is also true for the master-slave D FF, but the constraints are different.

Let's first define some terms.

Flip-flop Timing Definitions

Timing diagram showing the terms used to define the proper operation of a Flip-flop.

T_c: Clock cycle time.
T_s: Setup time: The amount of time before the clock edge that the D input has to be stable.
T_h: Hold time: Data has to be held for this period while the clock travels to the point of storage.
T_q: Clock-to-Q delay: Delay from the positive clock edge to the new value of Q.

Setup/Hold Time Violations

Depending on the design, one or both of T_s and T_h may have to be non-zero.

For example, the master-slave D FF is likely to require a longer setup time than the edge-triggered D FF.

An edge-triggered FF prevents the "master" from following the D input, so the FF's internal delay does not affect setup time.

The hold time interval starts with the beginning of the clock transition.

Clock skew and slew and other design details of the FF affect the hold time.

Toggle Flip-Flop with Asynchronous Clear:

System Timing

Two possible strategies to implement clocked systems:

Latches are a more economical implementation strategy but are transparent on half of the clock cycle, and cannot be used in feedback systems.
Also, the following constraint must be met for latches:

T_d < T_c/2 - T_q - T_s
where T_d is the worst-case propagation delay, T_c is the clock cycle time, T_q is the clock-to-Q time of latch A and T_s is the setup time for latch B.
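
The latch constraint can be turned into a quick budget check. The sketch below simply rearranges T_d < T_c/2 - T_q - T_s; the function names and the example numbers (in nanoseconds) are assumptions for illustration.

```python
def max_logic_delay(t_c, t_q, t_s):
    """Upper bound on the logic delay T_d allowed between latch A and latch B."""
    return t_c / 2 - t_q - t_s


def min_clock_period(t_d, t_q, t_s):
    """Same constraint solved for T_c: the period must exceed this value."""
    return 2 * (t_d + t_q + t_s)


def meets_latch_constraint(t_d, t_c, t_q, t_s):
    """True when T_d < T_c/2 - T_q - T_s holds."""
    return t_d < max_logic_delay(t_c, t_q, t_s)


# Hypothetical numbers: 10 ns cycle, 1 ns clock-to-Q, 0.5 ns setup.
budget = max_logic_delay(t_c=10.0, t_q=1.0, t_s=0.5)   # 3.5 ns of logic
ok = meets_latch_constraint(t_d=3.0, t_c=10.0, t_q=1.0, t_s=0.5)
```

Only half of the cycle is available because the latch is transparent for the other half.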

Clock Race Conditions

A clock race occurs when the data input to the register does not obey the setup and hold-time constraints.

Delays in the clock line to Reg B (hold-time violation).

New data stored instead of previous data:

Delays in the combinational logic that are larger than the clock cycle time (setup violation).

Data arrives late at Reg B, old data retained instead of latching new data.

As you can see, designers have to walk a temporal tightrope: they have to minimize clock skew while considering both worst-case and best-case delays through the combinational logic.
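
Both race conditions can be expressed as slack checks on a register-to-register path. The inequalities below are the commonly used textbook form and are an assumption here, since the slides do not state them: setup requires T_q + T_logic,max + T_s <= T_c + skew, and hold requires T_q + T_logic,min >= T_h + skew, with positive skew meaning the clock reaches Reg B later than Reg A. The function name and numbers (in nanoseconds) are hypothetical.

```python
def path_slacks(t_c, t_q, t_s, t_h, t_logic_min, t_logic_max, skew=0.0):
    """Return (setup_slack, hold_slack) for a Reg A -> logic -> Reg B path.

    A negative setup slack means data arrives too late (old data retained).
    A negative hold slack means new data races through (previous data lost).
    """
    setup_slack = (t_c + skew) - (t_q + t_logic_max + t_s)
    hold_slack = (t_q + t_logic_min) - (t_h + skew)
    return setup_slack, hold_slack


# Nominal case: both slacks are positive.
nominal = path_slacks(t_c=10.0, t_q=1.0, t_s=1.0, t_h=0.5,
                      t_logic_min=0.5, t_logic_max=7.0)

# Delayed clock at Reg B: the hold slack goes negative.
skewed = path_slacks(t_c=10.0, t_q=1.0, t_s=1.0, t_h=0.5,
                     t_logic_min=0.5, t_logic_max=7.0, skew=2.0)

# Logic slower than the cycle time: the setup slack goes negative.
slow = path_slacks(t_c=10.0, t_q=1.0, t_s=1.0, t_h=0.5,
                   t_logic_min=0.5, t_logic_max=12.0)
```

Note that skew helps the setup check and hurts the hold check, which is why both the best-case and the worst-case logic delays have to be considered.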

CMOS Static Flip-Flops

Full complementary version of the master-slave FF requires 38 transistors!

CMOS Dynamic Flip-Flops

Positive feedback is not the only means to implement a memory function.

A capacitor can act as a memory element as well.

In this case, a periodic refresh is required (in the millisecond range) due to leakage (hence the word dynamic).
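
A rough feel for the refresh interval comes from the charge balance t = C·ΔV / I_leak, assuming a constant leakage current. The sketch below is only an order-of-magnitude estimate; the function name and all three component values are assumptions chosen for illustration, not figures from the slides.

```python
def retention_time(c_node, delta_v, i_leak):
    """Time for a constant leakage current to change the node voltage by delta_v."""
    return c_node * delta_v / i_leak


# Hypothetical values: 10 fF storage node, 1 V tolerable droop, 1 pA leakage.
t_hold = retention_time(c_node=10e-15, delta_v=1.0, i_leak=1e-12)
print(f"retention time is about {t_hold * 1e3:.1f} ms")
```

With these assumed values the stored level survives for about 10 ms, consistent with a refresh period in the millisecond range.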

Consider the following "cheaper" (1/2 transmission gate) positive level-sensitive latch as a step toward deriving a dynamic FF:

A master-slave FF is created by cascading two of these latches and reversing the clocks.

The problem with this latch is that phi₁ and its complement (phi₁-bar) might overlap, which may cause two types of failures:
Node A can become undefined, as it is driven by both D and B when phi₁ and phi₁-bar are both high.
D can propagate through both the master and the slave if phi₁ and phi₁-bar are both high simultaneously for a long enough period.

Single Phase Clock Skew/Slew

Clock skew causes conflicts and transparency.

Clock slew (slow rise and fall times) can also cause transparency:

Clock skew is a dominant problem in current high performance designs.

CMOS Dynamic Two-Phase Flip-Flops

Pseudostatic FF: The fix is to use two non-overlapping clocks phi₁ and phi₂:

A large t_phi12 allows proper operation even in the presence of clock skew.

Note that node A floats (dynamic) during the time period t_phi12 but is driven during t_phi1 and t_phi2.

Hence, the name pseudostatic.
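
The role of the non-overlap interval t_phi12 can be illustrated with sampled waveforms. This is a minimal sketch assuming ideal square clocks with zero rise and fall times; the function names and the sample counts are hypothetical.

```python
def two_phase(high, gap, cycles=2):
    """Generate phi1 and phi2 with `high` samples per pulse and `gap` samples
    of non-overlap (t_phi12) between the end of one pulse and the next."""
    period = 2 * (high + gap)
    phi1, phi2 = [], []
    for t in range(cycles * period):
        p = t % period
        phi1.append(1 if p < high else 0)
        phi2.append(1 if high + gap <= p < 2 * high + gap else 0)
    return phi1, phi2


def delayed(wave, skew):
    """Delay a periodic waveform by `skew` samples (models clock skew)."""
    n = len(wave)
    return [wave[(t - skew) % n] for t in range(n)]


def overlaps(a, b):
    """True if both clocks are ever high in the same sample."""
    return any(x and y for x, y in zip(a, b))


phi1, phi2 = two_phase(high=4, gap=2)
safe = overlaps(delayed(phi1, 2), phi2)    # skew equal to the gap: still no overlap
unsafe = overlaps(delayed(phi1, 3), phi2)  # skew larger than the gap: clocks overlap
```

In this model the two phases stay non-overlapping as long as the skew does not exceed the gap, which is why a larger t_phi12 buys more skew tolerance at the cost of time in which node A floats.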

This version is simpler (6 transistors) and is often used in pipelined datapaths for microprocessors and signal processors.

Disadvantage: two non-overlapping clocks are required (four if transmission gates are used).

These implementations MUST be simulated at all process corners (under worst-case conditions).

Two-Phase Clocking

Clock skew/slew:

CMOS Dynamic Two-Phase Flip-Flops

C²MOS: A clever method which is insensitive to clock skew:

C²MOS is insensitive to overlap as long as the rise and fall times of the clock edges (clock slew) are sufficiently small:

C²MOS Flip-Flop

Races are just not possible since the overlaps activate either the pull-up or the pull-down networks but never both simultaneously.

The inverters force 0-1 and 1-0 propagation modes only.

However, if the rise and fall times of the clock are slow, there exists a time slot in which both n- and p-transistors are conducting simultaneously.

Correct operation requires the rise/fall times be smaller than about 5 times the propagation delay through the FF.

This is not hard to meet in practical designs, making C²MOS especially attractive in high-speed designs where avoiding clock overlap is hard.

Many other latch configurations, static and dynamic, are possible; see Weste and Eshraghian.

Single Phase Local Clock Generation

Clock skew is minimized but area cost is severe.

Single Phase Global Clock Generation

Transistors in the inverter and pass gate should be similar in size.

Keep them small and use buffers to drive the load.

Note: The routing load MUST also be balanced on each of the clk lines.

Two-phase Global Clock Generation

Multi-Phase Clocking

Four-phase clocking strategies discussed in Weste and Eshraghian.

Modern designs tend to minimize the number of clock phases used because of the difficulty of generating and distributing multiple clocks.

Single-phase schemes are used for complex, high-speed CMOS circuits.

Clock Distribution

Assume all the registers in a large CMOS design result in a capacitive load of 2000 pF. What is the peak current and average dynamic power?
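
The question cannot be answered from the load capacitance alone. The sketch below works it through using I = C·dV/dt and P = C·V²·f, with a 5 V supply, a 10 ns clock cycle and a 1 ns rise/fall time taken as assumptions (none of these values is given above). The function name is hypothetical, and the peak current is approximated by the average current during the edge.

```python
def clock_load_estimates(c_load, v_dd, t_cycle, t_rise):
    """Estimate the clock driver requirements for a lumped capacitive load.

    Returns (peak_current_in_amps, average_dynamic_power_in_watts).
    """
    peak_current = c_load * v_dd / t_rise     # I = C * dV/dt during the edge
    avg_power = c_load * v_dd ** 2 / t_cycle  # P = C * V^2 * f, with f = 1/t_cycle
    return peak_current, avg_power


# 2000 pF from the text; supply, cycle time and rise time are assumed values.
i_peak, p_avg = clock_load_estimates(c_load=2000e-12, v_dd=5.0,
                                     t_cycle=10e-9, t_rise=1e-9)
print(f"peak current ~ {i_peak:.1f} A, average dynamic power ~ {p_avg:.1f} W")
```

Under these assumptions the clock driver must source roughly 10 A at the edges and dissipates about 5 W, which is why clock distribution needs its own design strategy.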

Two techniques:
A single large buffer (cascaded inverters): Use when the design has a large number of diverse modules, e.g. a microprocessor.
A distributed clock tree: Use when the design is highly structured and repetitive, e.g. a datapath.