# REBEL and TDC: Two Embedded Test Structures for On-Chip Measurements of Within-Die Path Delay Variations

Charles Lamech, James Aarestad, Jim Plusquellic, Reza Rad ECE Dept., Univ. of New Mexico

clamech@, jaarestad@, jimp@, reza@, ece.unm.edu

Abstract -- As feature printability becomes more challenging in advanced technology nodes, measuring and characterizing process variation effects on delay and power is becoming increasingly important. In this paper, we present two embedded test structures (ETS) for carrying out path delay measurement in actual product designs. Of the two structures proposed here, one is designed to be incorporated into a customer's scan structures, augmenting selected functional units with the ability to perform accurate path delay measurements. We refer to this ETS as REBEL (regional delay behavior). It is designed to leverage the existing scan chain as a means of reducing area overhead and performance impact. For cases in which very high resolution of delay measurements is required, a second standalone structure is proposed which we refer to as TDC for time-to-digital converter. Beyond characterizing process variations, these ETSs can also be used for design debug, detection of hardware Trojans and small delay defects and as physical unclonable functions.

### 1 Introduction

The integration of embedded test structures on product chips to measure and analyze delay variations is becoming increasingly important for tracking quality and for improving correlations between hardware and models. With worsening variability as feature sizes continue to shrink, designers increasingly rely on accurate variation models to mitigate yield loss. Since even marginal increases in yield result in a significant improvement in profitability and product quality [1], a broad range of work is ongoing to characterize within-die and die-to-die process variations (PV), spurring the development of area-efficient structures and methods for validating variation models.

PVs are increasing in magnitude and are becoming increasingly sensitive to the design-context, which challenges conventional test structures to capture delay variations that occur in actual product macros. These trends are driven by, among other factors, increasing variability in process control, which worsens with scaling. In subwavelength processes (<193 nm), for example, photolithographic interference causes distortion in layout structures [2]. Reticle enhancement techniques (RET) help with printability issues, but are less effective for advanced technology nodes [1]. Both random and systematic within-die PVs are growing more severe with shrinking geometries and increasing die size [3-4].

PV challenges are most apparent in an across-field (within-die) context. The main sources are optical source limitations and layout-based systematic effects (pitch, line-width variability, and microscopic etch loading [5-7]). Unfortunately, traditional die-to-die level testing and measurement methods, e.g., scribe-line structures, are ineffective for context-sensitive, within-die characterization. More recent efforts to embed test structures, such as distributing a set of ring oscillators across the layout, are capable of capturing within-die variations, but are becoming less accurate as predictors of delay variations in actual product macros. Truly embedded test structures, such as those that measure delay [8][9][10] and power [11] characteristics of the macro itself, offer the best solution, but are difficult to integrate without adversely affecting the area overhead, yield, performance, I/O interface, test cost, etc. of the product design.

In this paper, we propose an embedded test structure (ETS), called REBEL, which is designed to measure regional path delays in macros while minimizing these types of adverse effects. We also describe a second ETS, called TDC, that provides higher measurement resolution but is more invasive. Both are designed to serve applications such as model-to-hardware correlation, detection of hardware Trojans [12], design debug processes, detection of small delay defects [13], and physical unclonable functions. Each of these areas requires accurate measurements of path delays and/or the ability to differentiate at high resolutions between delays of neighboring paths.

\*Kanak Agarwal, IBM Austin Research Laboratory, kba@us.ibm.com

The REBEL ETS leverages the scan chain architecture to measure delay variations. In particular, it uses a special configuration of *flush delay* mode that is available in LSSD-style scan chains<sup>1</sup>. We demonstrated in recent work the promise of capturing regional delay variations using a special launch-capture timing sequence applied while in flush delay mode [10]. We extend this technique here by allowing output signals from design macros to be inserted into the flush delay chain for path delay measurements. The TDC ETS architecture, on the other hand, consists of an inserted stand-alone structure that measures path delays using a special delay chain which incorporates current-starved inverters [14]. The arrival of transitions from two paths in a macro generates a pulse that 'shrinks', and eventually disappears, over the length of the chain. Although the area overhead of TDC is larger than that of REBEL, and TDC constrains the paths that can be timed, it provides high resolution measurements in the range of tens of picoseconds. In contrast, REBEL is more area efficient but provides less resolution, e.g., hundreds of picoseconds.

SPICE simulation experiments are carried out on an RC-transistor netlist of an AES SBOX standard cell layout to determine the timing resolution achievable using REBEL and TDC. The balance of this paper is organized as follows: Section 2 discusses related work. Section 3 describes the details of the proposed ETSs. In Section 4, we present the results of simulation experiments. We conclude in Section 5.

### 2 Background

Test structure design continues to be an active area of research [15-16]. Embedded ring oscillators (EROs) have been successfully used to characterize within-die performance variations [17]. Due to their simple design and I/O interface, they have been the preferred embedded test structure. For example, the authors of [18] propose a reconfigurable ring oscillator circuit for measuring the delay of individual gates and report a within-die delay variation of up to 26% on a 65 nm process node and a measurement accuracy of 1 ps. Several recent timing structures have been proposed which are fully analog in nature [19], and in [20], a within-die variation characterization system is proposed which uses an on-chip sampling oscilloscope.

Numerous time-to-digital converter (TDC) designs have been previously proposed, simulated, and in some cases, realized in silicon [16, 21]. The authors of [19] have proposed a structure to measure on-chip delays using a variety of measurement circuits. However, these devices were intended to serve as the basis for adaptive timing mechanisms to provide on-chip synchronous timing control.

<sup>1.</sup> Application of REBEL to designs which use MUX scan is discussed.

From the references and citations above, it is clear that the use of on-chip ETSs for path-delay timing is not new. Designs of differing type and complexity have been proposed to address a variety of applications. The main distinguishing characteristic of our proposed ETSs over those previously proposed is the degree to which they are embedded. Our primary focus is on integrating the measurement structure into the product macros themselves, and on doing so in a minimally invasive manner, i.e., we leverage existing resources such as the scan chain and integrate such that the product timing paths are minimally impacted.

### 3 Embedded Test Structure Design

Although both REBEL and TDC are designed to measure path delays, there are several significant differences that distinguish them in terms of their functionality and target applications:

- REBEL leverages the scan chain to capture a snap-shot of the voltage behavior of a signal over time. In other words, the data collected from REBEL is not invalidated if the signal-under-test transitions multiple times, i.e., if it glitches. The data from the TDC is invalidated in this case.
- REBEL timing resolution is coarser than that provided by TDC. In general, REBEL timing resolution is determined by the delay through each element of the scan chain. Although clock strobing can be used to improve this resolution to approx. 100 ps, TDC provides an even higher level of resolution by default and in single shot mode (no clock strobing required).
- REBEL overhead is very small because it leverages the existing scan chain components. TDC, although small in size, is a standalone structure that must be inserted into the design, possibly requiring adjustments to the placement of the product macros.

### 3.1 REBEL: Regional Delay Measurement Circuit

In this section, we describe the overall integration strategy and operation of the REBEL ETS. As indicated previously, REBEL is integrated into the scan chain directly, as shown in Fig. 1 for a clocked-LSSD-style scan architecture. Here, the regions labeled 'Product Macro' are functional units composed of combinational logic. Three scan chain segments are shown that serve to deliver input and capture output from these macros. The three blocks labeled Row Control Logic identify components of the REBEL ETS, and are described below. Beyond these three 'header' blocks, smaller blocks are also needed for local scan signal control for each of the scan FFs.

The basic idea is to generate a transition on the inputs to the macro using a standard *launch-off-capture* transition fault test. In this scenario, the scan chain is loaded with the initial pattern of the 2-pattern test and the system clk (CLK) is used to generate a transition in the core logic by capturing the output of a previous block, or by capturing the PI values, as shown in the figure. One or more transitions are propagated through the macro, as shown by the dotted line labeled **PUT** for 'path-under-test'. The PUT's transition emerges on an output of the macro, and drives the D input of a scan FF in the second row. Special control logic associated with the scan FF (to be described) allows the transition to propagate along the scan chain, as shown by the dotted line. CLK is then de-asserted to halt the propagation behavior along the scan chain, including any glitching that may have occurred. This digital snap-shot is then scanned out for analysis.

For designs that make use of LSSD-style scan, propagation along the scan chain is easy to implement. This is true because LSSD supports a flush-delay (FD) mode of operation. In FD mode, both the scan A clock (SCA) and scan B clock (SCB) are held high, effectively making both latches of the FF transparent, i.e., any transition generated on D propagates to Q after a $\Delta t$ that represents the delay through the FF. FD mode effectively makes the scan chain a combinational inverter chain.

Fig. 1. REBEL Integration Strategy

However, the configuration in Fig. 1 differs from the traditional definition of FD mode because only a portion of the scan chain is configured in FD mode. In particular, the scan FFs along the top row and those along the middle row to the point of insertion of the PUT operate in functional mode, and only those to the right (and below) of this point operate in FD mode. In order to realize this configuration, several changes are required to the logic implementing the scan operation.

One of the components to support this dual mode of operation is labeled Row Control Logic (RCL) on the left side of Fig. 1. These blocks, in combination with a scan chain encoding scheme and localized scan FF logic, enable the dual mode of operation and provide a mechanism to specify a PUT's output to direct into the scan chain. This is accomplished by configuring several state bits in the RCL, and by loading a specific pattern into the scan chain before the launch-capture (LC) timing sequence (REBEL test) is applied, as described below.

Each RCL block controls a 'row' of scan FFs, called row-FFs, in the following description. Fig. 2 shows a schematic diagram of the RCL. The top portion of the diagram controls local (row-specific) scan clock signals, labeled SCA\_L and SCB\_L (\_L for local) while the bottom portion contains two shift registers (Shift Reg) and mode select logic. A large portion of the RCL logic is dedicated to allow the row-FFs to operate in either of the traditional functional or scan modes of operation. The global (chip-wide) scan signals labeled 'global SCA' and 'global SCB' are used to specify one of the three possible global operational states. When both are low, functional mode is in effect with CLK controlling the launch-capture activity in the row-FFs. Non-overlapping assertion of these signals causes all scan FFs to act as a shift register, implementing scan mode. The timing mode used by REBEL is specified when both of these signals are asserted. This is illustrated by the '1's on global SCA and SCB in Fig. 2.

Fig. 2. REBEL Row Control Logic.

Note that the two shift registers in the RCL block are conditionally inserted into the scan chain during a scan operation and can therefore be configured prior to a REBEL test. The shift registers' scan clock inputs (SCA/SCB) are also gated to prevent them from entering FD mode, which would destroy the state information, when both global SCA and SCB signals are asserted. The state of the two shift registers defines the mode of operation for the row when REBEL mode is activated. Two control bits (as opposed to one) are needed to implement the simultaneous functional and FD modes discussed above because there are actually four possible conditions that need to be handled. The three rows of scan FFs in Fig. 1 illustrate three of the four conditions. For example, the scan FFs in the top row need to be in functional mode throughout the REBEL test. In contrast, the scan FFs in the bottom row need to be in FD mode to extend the propagation path of the PUT signal captured in the middle row. Last, the middle row contains scan FFs in both of these modes, i.e., the scan FFs to the left of the PUT insertion point are in functional mode while those to the right are in FD mode. The fourth condition is just a special case of this third condition where the insertion point is the left-most scan FF in the row. Table 1 identifies the bit configurations that handle these four conditions.

| Shift Reg | Functionality                                               |
|-----------|-------------------------------------------------------------|
| 00        | All scan FFs in row are in functional mode                  |
| 01        | All scan FFs in row are in FD mode                          |
| 11/10     | Left scan FFs in functional mode, right scan FFs in FD mode |

Table 1: Configuration states for Row Control Logic
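As an illustration only, Table 1 can be read as a small lookup from the two RCL shift-register bits to a row mode. The Python sketch below is not part of the REBEL hardware or tool flow; the names `RowMode` and `decode_rcl_state` are hypothetical, and the distinction drawn between '11' and '10' follows the description of the split-mode states given later in this section.

```python
from enum import Enum


class RowMode(Enum):
    """Row behaviors listed in Table 1 (names are illustrative)."""
    FUNCTIONAL = "all row-FFs in functional mode"
    FLUSH_DELAY = "all row-FFs in FD mode"
    SPLIT = "left row-FFs functional, right row-FFs in FD mode"


# Table 1, keyed by the two RCL shift-register bits written as a string.
RCL_STATES = {
    "00": RowMode.FUNCTIONAL,
    "01": RowMode.FLUSH_DELAY,
    "11": RowMode.SPLIT,  # insertion point is not the left-most scan FF
    "10": RowMode.SPLIT,  # special case: insertion point is the left-most scan FF
}


def decode_rcl_state(bits: str) -> RowMode:
    """Map a two-bit RCL state such as '01' to the row mode it selects."""
    if bits not in RCL_STATES:
        raise ValueError(f"expected a two-bit state such as '01', got {bits!r}")
    return RCL_STATES[bits]


# The three rows of Fig. 1, top to bottom: functional, split, flush delay.
fig1_rows = [decode_rcl_state(b) for b in ("00", "11", "01")]
```

Keeping '11' and '10' as separate keys that map to the same mode mirrors the table, which groups them in one row while the hardware still distinguishes the two cases.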

Before describing the annotations in Fig. 2, we turn to the configuration of the scan FFs. Fig. 3(a) shows a clocked LSSD FF (CLSSD), which consists of three latches. The two latches on the left implement the functional path, and are controlled by the system Clk. The center latch is dual port and serves both as the slave for the functional path and as the master in the LSSD pair. The right-most latch is the slave latch of the LSSD pair. The top pass-gate of the dual port latch is highlighted to indicate that it has been modified. In the following paragraphs, it will become apparent that during the REBEL test, both CLK and SCA will be asserted simultaneously during a portion of the test. This creates a potential shorting condition in the dual port latch, i.e., both the master of the functional path and the SI input paths are enabled. To prevent this from happening, we modified the single input pass gate connected to the master's output to include a second input. The second input prevents the master's output from driving the dual port latch when both CLK and SCA are asserted simultaneously.

Fig. 3. (a) Modified clocked-LSSD scan FF and (b) additional 'front-end' logic.

Fig. 3(b) shows the additional logic required to integrate REBEL into a design with CLSSD-style scan. The functional path's D-input is fanned out to a 2-to-1 MUX. This allows for the insertion of a macro's PUT into the scan chain during the REBEL test. The local scan signals (SCA\_L and SCB\_L) are gated by *mode select logic* shown along the bottom of the figure. The mode select logic incorporates the normal scan path (SOPrev to the SI input), as well as a propagating mode bit (ModePrev to ModeNext). The mode select logic is responsible for selecting the insertion point. This is accomplished by pre-loading the row-FFs with a pattern of all '1's followed by a '0' from left to right along the row-FFs. The '0' in this sequence causes the next scan FF to be configured in a special way, i.e., it allows the PUT output signal to drive the SI pin. The annotation and dotted line in the figure illustrate this case, and assume the scan FF on the left (not shown) is configured with a '0' bit. Given that the scan chain connects the SO output of each scan FF to the SOPrev of the next scan FF, this arrangement allows the scan chain encoding to specify the PUT insertion point. Moreover, the split mode of operation required for this row is implemented using a propagating mode bit (ModePrev and ModeNext), which is '1' for all scan FFs to the left of the insertion point and '0' to the right. The left-most scan FFs in the middle row of Fig. 1 are annotated with a bit configuration that enables the insertion of the PUT at the position shown.
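The scan-chain encoding of the insertion point can be sketched in a few lines of Python. This is a behavioral illustration under stated assumptions, not the gate-level mode select logic: the function name `configure_split_row` and the role labels are hypothetical, bits to the right of the '0' marker are treated as don't-care, and the left-most insertion case (which relies on the RCL state rather than a marker bit) is not modeled.

```python
from typing import List, Tuple


def configure_split_row(preload: List[int]) -> Tuple[int, List[str]]:
    """Return (insertion index, per-FF role) for a split-mode row.

    preload[i] is the bit scanned into the i-th row-FF, left to right.
    The row is loaded with '1's followed by a '0'; the scan FF that
    follows the first '0' becomes the PUT insertion point.
    """
    if 0 not in preload:
        raise ValueError("pattern has no '0' marker")
    insertion = preload.index(0) + 1
    if insertion >= len(preload):
        raise ValueError("the '0' marker is in the last row-FF")

    roles = []
    for i in range(len(preload)):
        if i < insertion:
            roles.append("functional")  # propagating mode bit is '1' here
        elif i == insertion:
            roles.append("insert")      # PUT output drives this FF's SI pin
        else:
            roles.append("flush")       # mode bit is '0': FD mode
    return insertion, roles


index, roles = configure_split_row([1, 1, 0, 1, 1, 1])
print(index)  # 3
```

Here the marker sits in the third row-FF, so the fourth row-FF (index 3) receives the PUT and every FF to its right extends the flush-delay path.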

The mode select logic also participates in controlling the local scan signals (SCA\_L and SCB\_L), and completes the implementation of the four conditions described above in reference to the RCL. The shift registers in Fig. 2 are annotated with four states (for the four conditions). The '00' state, which forces functional mode for the row-FFs (row 1 in Fig. 1), sets both SCA\_L and SCB\_L to '1'. Given that these signals connect to the inputs of the two NOR gates in instances of the scan FFs (as shown in Fig. 3(b)), and '1' is the dominant value for a NOR gate, this condition effectively disables FD mode for the entire row. In this case, the ModeNext and SONext output signals of the RCL, which connect to the left-most scan FF's ModePrev and SOPrev signals, are irrelevant.

The '01' state, as discussed earlier, forces the row-FFs into FD mode (row 3 in Fig. 1). This requires both of the SCA\_L and SCB\_L signals to be set to '0'. However, the annotation in Fig. 2 indicates the value of SCA\_L is '$\overline{Q}$', which is the inverted output value of the negative edge triggered FF (N-FF) in the RCL. In the implementation flow for a REBEL test, the initial value of the N-FF is set to '1' by virtue of strobing the SET\_B signal low prior to the REBEL test. The REBEL test is defined as a rising edge on CLK (which effectively launches a transition(s) into the macro-under-test), followed by a falling edge on CLK that acts to capture a snapshot of the PUT's behavior in the scan chain. The snapshot is realized by de-asserting the Q output of the N-FF, which occurs when CLK goes low<sup>1</sup>. This in turn causes the SCA\_L output signal from the RCL to transition from '0' (initial value) to '1'. From Fig. 3(b), the arrival of the '1' on SCA\_L signals of the scan FFs de-asserts the SCA input and turns off FD mode. This action captures the snapshot of the PUT's voltage behavior in the scan chain.

Fig. 4. REBEL support logic for MUX scan.

The ModeNext output signal of the RCL configured in the '01' state is '0'. The '0' propagates along the mode select logic of the row and forces all row-FFs to operate in scan mode, i.e., SO to SI to SO, and so on. This condition allows for the propagation of the PUT's signal along the scan chain. The SONext signal's value for state '01' is given as 'SI' to indicate that this signal is driven from the SI input of the RCL. Therefore, the scan chain by-passes (and preserves the contents of) the state elements in the RCL. The SI input in turn connects to the SO signal from the right-most scan FF of the previous row, effectively extending the scan path across rows (see Fig. 1).

Finally, the '11' state in the RCL configures a split mode of operation in the row-FFs and connects a specific PUT output into the scan chain (row 2 in Fig. 1). The mode select logic in the scan FFs works together with the RCL block to implement this split mode of operation. The behavior of the SCA\_L and SCB\_L outputs is identical to that described above for state '01'. The difference lies in the state of the ModeNext and SONext output signals in Fig. 2. As noted above, a string of '1's followed by a '0' is pre-loaded into the scan chain to specify the PUT insertion point. The '1' on the ModeNext output propagates along the mode select logic, described earlier in reference to Fig. 3(b), until a '0' is encountered in the scan FFs of the row. This causes the next scan FF to be configured as the insertion point. The remaining scan FFs in the row are configured in FD mode because the mode bit is inverted to a '0' after the insertion point. RCL state '10' behaves identically but allows the insertion point to be the left-most scan FF in the row.

We have designed the REBEL support logic such that it minimizes the impact on the functional behavior of the design. There are two components of REBEL that impact the functional operation. The first is the change of the CLSSD as shown in Fig. 3(a), and the second is the fanout of the D input to the 2-to-1 MUX as shown in Fig. 3(b). Each of these changes adds a small  $\Delta t$  to the functional path.

### 3.1.1 MUX Scan Implementation

Although we integrate and demonstrate REBEL in a CLSSD-style scan chain (which is the style used in the design flow of our chip), MUX scan is the industry standard. Integration of REBEL into MUX scan is easy and even less invasive than it is for CLSSD.

The overall operation of REBEL for MUX scan is very similar to that described for CLSSD. The main difference is that the launch and capture is accomplished using rising edges of CLK (as opposed to a rising and falling edge for CLSSD). Also, an additional primary input is required to specify FD mode. This global signal is routed to the RCL blocks (not shown). An RCL block for MUX scan is similar in function to the CLSSD version except that all logic in reference to SCA\_L and SCB\_L of Fig. 2 can be eliminated.

1. The use of CLK is key to improving the timing accuracy of the REBEL test.

The key objective in the MUX scan implementation is to implement an FD mode, i.e., a combinational path, using the latches within the MUX scan FFs. This can be achieved by adding a 'tap-point' to the master latch, called QMNext in Fig. 4, and routing this signal to a 2-to-1 MUX in the next scan FF of the scan path (labeled QMPrev in Fig. 4). The SE input in Fig. 4 refers to the globally routed scan enable signal (already required for MUX scan). SE is set to '1' when we are in scan mode, and '0' when in functional or FD (REBEL test) mode. The remaining logic gates are inserted to implement the four conditions described earlier.

For example, to configure a row in functional mode (row 1 in Fig. 1), the RCL block places a '0' on the FD\_L wire. To configure a row in FD mode (row 3 in Fig. 1), the RCL block sets FD\_L to '1' and ModePrev to '0'. For a split mode row (row 2 in Fig. 1), the same scan FF encoding method described for CLSSD is used. In addition, the RCL block forces a '1' onto FD\_L and sets ModePrev to a '1' for insertion points other than the left-most scan FF in the row or '0' otherwise. The annotation in Fig. 4 shows the values of the scan FF at the point of PUT insertion for a split mode row. The REBEL implementation using MUX scan is actually smaller in overhead and is less invasive to the functional path (only one capacitive load is added at the tap-point in the master latch) than it is for CLSSD.

### 3.1.2 PUT Delay Analysis Process

The Launch/Capture delay in REBEL is controlled by CLK, as described earlier, and therefore REBEL leverages the CLK tree for critical timing events.

A REBEL test is carried out as follows:

1. Configuration data is scanned in.
2. The global SCA and SCB signals are asserted.
3. CLK is asserted to launch a transition into the PUT.
4. CLK is de-asserted after a specific $\Delta t$, sufficiently long to allow the transitions on the PUT to propagate along the scan chain.
5. The global SCA/SCB signals are de-asserted, and the values in the scan chain are scanned out.

The delay in the combinational path is computed using Eq. 1.

$$T_{path} = T_{lc} - T_{sc} \qquad \text{(Eq. 1)}$$

where

- $T_{path}$ = delay in the combinational path
- $T_{lc}$ = launch/capture delay
- $T_{sc}$ = delay in the scan chain

The scan chain delay,  $T_{sc}$ , can be calculated from the number of scan cells that are set by the propagating edge(s), and the data obtained from a set of calibration tests (described in Section 4).
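To make the resolution implied by Eq. 1 concrete, the following Python sketch bounds $T_{path}$ from the number of scan cells set by the propagating edge. It is a simplified model, not the calibration procedure of Section 4: the function name `path_delay_bounds_ps` is hypothetical, every scan cell is assumed to have the same delay, and a cell is assumed to be 'set' only once the edge has passed completely through it. The 550 ps cell delay matches the approximate value reported in Section 4; the 6000 ps launch/capture delay is made up for the example.

```python
from typing import Tuple


def path_delay_bounds_ps(t_lc_ps: float, cells_set: int,
                         cell_delay_ps: float) -> Tuple[float, float]:
    """Bound T_path = T_lc - T_sc when only a cell count is known.

    If the edge set `cells_set` cells but not the next one, then
    cells_set * d <= T_sc < (cells_set + 1) * d, so T_path lies in the
    interval (low, high] returned here.
    """
    if cells_set < 0 or cell_delay_ps <= 0:
        raise ValueError("cells_set must be >= 0 and cell_delay_ps > 0")
    t_sc_min = cells_set * cell_delay_ps
    t_sc_max = (cells_set + 1) * cell_delay_ps
    return t_lc_ps - t_sc_max, t_lc_ps - t_sc_min


low, high = path_delay_bounds_ps(t_lc_ps=6000, cells_set=7, cell_delay_ps=550)
print(low, high)  # 1600 2150
```

The width of the interval equals one scan cell delay, which is why a single launch/capture measurement cannot resolve the PUT delay more finely than the delay through one scan FF.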

### 3.2 TDC: Time-To-Digital Converter Test Structure

A block diagram of the TDC is shown in Fig. 5. A set of outputs from the macro-under-test (not shown) are chosen and are fanned-out to connect to the PATH[0:7] inputs of the TDC. The TDC is designed to measure the delay difference between two of these path signals. The scan chain labeled with SI and SO along the top of the figure is used to configure the TDC in advance of a timing test. For example, the *path select* scan FFs (top left) are used to select two PATH signals. The selected paths drive SIG0 and SIG1, which are routed to the inputs of a pulse generator (PULSEGEN). The PULSEGEN produces a negative-going pulse, the width of which is equal to the time delay between the transitions on the two input signals. The addition of *polarity control* FFs (top right) permits time delay measurements on any mix of input signal polarities, i.e., rising and falling edges. The Glitch Detectors are used to determine if any glitches occurred during the test. As noted earlier, the TDC data is invalid if this occurs.

The output of the PULSEGEN circuit drives the input of the TDC Measurement Unit. The Measurement Unit contains a 120-element delay chain in our implementation. The odd and even elements of this chain are connected separately to analog voltage signals, CV1 and CV2, that govern the gate biases of n-channel, current-starved transistors, which are inserted in the transistor stacks of the inverters. By tuning the voltages on CV1 and CV2, the falling (leading) edge of the odd elements of the delay chain can be delayed, and/or the rising (trailing) edge of even elements can be delayed. For example, by lowering the voltage on CV2 with respect to CV1, the leading edge travels more slowly than the trailing edge, and the pulse shrinks as it propagates. Through proper tuning, the pulse can be made to 'disappear' at some point along the delay chain.

Even inverters in the delay chain also drive the SET input of a NOR set/reset (SR) latch, and the odd inverters drive the SET input of a NAND SR latch. This allows a propagating pulse to 'record' a '1' in these SR latches as it propagates down the delay chain. Therefore, the point at which the pulse disappears can be determined by transferring the values from these SR latches into a scan chain, shown along the bottom of Fig. 5, and then scanning them out for analysis. The data from the scan chain after a timing test is applied appears as a thermometer code, i.e., a string of '0's followed by a string of '1's. A separate calibration process (described below) permits the conversion of a digital thermometer code to a  $\Delta t$  that represents the pulse width of the pulse generated at the output of the PULSEGEN.
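The first step in analyzing the scanned-out data is to check that it is a well-formed thermometer code and to count the recorded '1's. A minimal Python sketch follows, using the bit ordering stated above ('0's followed by '1's); the function name `thermometer_count` is hypothetical, and rejecting malformed codes outright is an assumption made for the sketch rather than a description of the authors' analysis flow.

```python
from typing import Sequence


def thermometer_count(bits: Sequence[int]) -> int:
    """Return the number of '1's in a thermometer code.

    A valid code is a run of '0's followed by a run of '1's. Anything
    else (for example an isolated '1' among the '0's) raises ValueError.
    """
    ones = sum(bits)
    expected = [0] * (len(bits) - ones) + [1] * ones
    if list(bits) != expected:
        raise ValueError("scan data is not a valid thermometer code")
    return ones


print(thermometer_count([0, 0, 0, 1, 1]))  # 2
```

The count returned here is the digital value that the calibration process later converts to a $\Delta t$.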

### 4 Simulation Results

In order to validate our ETS structures, we carried out a set of simulation experiments on RC-transistor models of several layouts. Fig. 6(a) depicts the layout used in the REBEL simulations. CADENCE Encounter was used to synthesize the layout of an AES SBOX macro<sup>1</sup>, with the REBEL ETS embedded. The overhead of REBEL for this small macro is approx. 2 percent<sup>2</sup>. A full custom version of the TDC ETS layout is shown in Fig. 6(b). We carried out simulations on RC-transistor models extracted from this layout, without incorporating the AES SBOX macro. Instead, we used the waveforms generated from simulations of the layout in Fig. 6(a) as input to the TDC simulations.

In both cases, we created a set of five simulation models: one nominal (TT) and four process corner models, namely fast-NMOS/fast-PMOS (FF), slow-NMOS/slow-PMOS (SS), and the two mixed corners FS and SF. Also, a separate calibration process was carried out on each ETS to enable the translation of the digital scan data to actual $\Delta t$'s. The calibration processes also reduce the adverse effects of within-die and die-to-die variations that occur within the REBEL and TDC structures themselves.

1. SBOX is a component of the Advanced Encryption Standard (AES) engine.
2. We expect the overhead to be significantly smaller in larger designs.

Fig. 6. (a) REBEL layout (b) TDC layout.

### 4.1 Calibration

### 4.1.1 REBEL Calibration

The calibration process for REBEL is designed to enable the delay of a transition propagating along the scan chain to be eliminated from the timing measurement, as specified by  $T_{sc}$  in Eq. 1. Once eliminated, the delay of the PUT can be determined.

The calibration process involves placing the entire scan chain into flush delay mode and carrying out a sequence of launch/capture experiments. In each successive experiment, the capture timing is increased by a small $\Delta t$ and the digital values stored in the scan chain are analyzed. For some of the experiments, the edge is able to propagate to the next scan FF in the chain. The entire sequence of experiments yields results that indicate when this occurs for each scan FF in the chain at the level of timing resolution given by the $\Delta t$ step size. We restricted the $\Delta t$ step size to 100 ps in our experiments, although higher resolutions may be attainable in practice.
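The strobing procedure can be sketched as a small Python model. This is a behavioral stand-in under stated assumptions, not the SPICE setup used in our experiments: `flush_snapshot` replaces the actual launch/capture experiment with an idealized chain in which the edge reaches each scan FF after the cumulative delay of the FFs up to and including it, and both function names and the four-element delay list are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional


def flush_snapshot(ff_delays_ps: List[int], capture_ps: int) -> List[int]:
    """Idealized scan-chain contents when capture occurs at capture_ps."""
    arrival_ps = accumulate(ff_delays_ps)  # time at which each FF is set
    return [1 if t <= capture_ps else 0 for t in arrival_ps]


def calibrate(ff_delays_ps: List[int], step_ps: int = 100) -> List[Optional[int]]:
    """Sweep the capture time and record when each scan FF is first set.

    Returns one launch/capture delay per scan FF, quantized to step_ps.
    """
    total_ps = sum(ff_delays_ps)
    first_seen: List[Optional[int]] = [None] * len(ff_delays_ps)
    capture_ps = step_ps
    while capture_ps < total_ps + step_ps:
        for i, bit in enumerate(flush_snapshot(ff_delays_ps, capture_ps)):
            if bit and first_seen[i] is None:
                first_seen[i] = capture_ps
        capture_ps += step_ps
    return first_seen


curve = calibrate([550, 550, 550, 550], step_ps=100)
print(curve)  # [600, 1100, 1700, 2200]
```

The recorded values alternate between exact and rounded-up arrival times because a 550 ps cell delay is not a multiple of the 100 ps step, which illustrates how the step size sets the resolution of the calibration curve.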

Fig. 7. Calibration Data for AES SBOX macro in Different Process Corners.

The graph shown in Fig. 7 gives the results from applying calibration tests in this fashion to each of the five process models. The step-wise nature of the curves reflects the delay through each of the scan FFs, which is approx. 550 ps. These curves can be used to derive PUT delays in the macro. Assume, for example, the PUT is inserted into the scan chain at the 12th scan FF, and the scan chain values indicate that the edge propagated to the 19th scan FF. To compute the path delay, the LC delays for the 12th and 19th scan FFs are looked up on the x-axis using the appropriate curve in Fig. 7, and then subtracted. The $\Delta t$ computed is the delay through that portion of the scan chain. The LC delay to the 19th scan FF (T<sub>lc</sub>) and the scan chain delay (T<sub>sc</sub>) can be plugged into Eq. 1 to determine the delay of the PUT. Note that the resolution obtained for the PUT delay is limited to the typical propagation delay through a scan FF (approx. 550 ps). However, as discussed later in Section 4.2, the strobing technique described for calibration can also be used in the REBEL tests to improve the timing resolution to 100 ps or less.
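The 12th/19th scan FF example can be worked through numerically with a short Python sketch. All numbers are hypothetical: the calibration table assumes a uniform 550 ps per scan FF rather than values read from Fig. 7, and the 5000 ps launch/capture delay is invented. The sketch also assumes that $T_{lc}$ in Eq. 1 is the launch/capture delay applied in the REBEL test that produced the snapshot; the function name `put_delay_ps` is illustrative.

```python
from typing import Dict


def put_delay_ps(cal_lc_ps: Dict[int, int], insert_ff: int,
                 last_ff: int, t_lc_ps: int) -> int:
    """Apply Eq. 1 using a calibration lookup for the scan chain delay.

    cal_lc_ps[n] is the calibrated launch/capture delay at which the
    edge first reaches scan FF n with the whole chain in flush delay mode.
    """
    if last_ff < insert_ff:
        raise ValueError("edge cannot stop before the insertion point")
    t_sc_ps = cal_lc_ps[last_ff] - cal_lc_ps[insert_ff]  # delay from FF 12 to FF 19
    return t_lc_ps - t_sc_ps


# Hypothetical calibration curve: 550 ps per scan FF, FFs numbered from 1.
cal = {n: 550 * n for n in range(1, 25)}

print(put_delay_ps(cal, insert_ff=12, last_ff=19, t_lc_ps=5000))  # 1150
```

Because only the difference between two calibration entries is used, any offset common to both entries cancels out of the scan chain delay.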

### 4.1.2 TDC Calibration

As an illustration of the TDC operation, the top portion of Fig. 8 shows the waveforms produced at the output of the PULSEGEN (leftmost) and by several negative pulses (right) from inverters in the TDC. The lower portion of the figure shows the *inverted* outputs of several corresponding NAND latches. In this scenario,  $CV_1$  is set to 1.2 V and  $CV_2$  is set to 0.75 V, which causes the pulse to shrink and eventually disappear at or near inverter 100 (top right in the figure). Although not shown, all SR latches up to the latch associated with inverter 100 behave in a similar fashion.

The calibration process is simplified if the TDC maintains a constant rate of pulse shrinkage along the entire chain, and is independent of the pulse width. To evaluate this metric for our TDC implementation, we performed simulation experiments in which we injected a series of input pulses ranging from 100 ps to 1.3 ns, in 100 ps intervals, in 13 different experiments. The delays associated with the leading edge of each pulse are plotted along the y-axis in Fig. 9 for each inverter given on the x-axis. The delays from all 13 experiments are superimposed and labeled in the figure. The 100 ps pulse did not propagate through any inverters in the TDC, while the 200 ps pulse propagated through five inverters, as shown on the bottom left in the figure<sup>1</sup>. Each additional 100 ps allowed the pulse to propagate through an additional 9-12 elements of the delay chain.

The linearity of the superimposed curves suggests that the propagating pulse shrinks at a constant rate and its propagation delay is independent of the pulse width. We recognize that within-die process variations (not modeled in these simulations) will reduce this linearity to some degree. On the other hand, the compact custom layout of the TDC as shown in Fig. 6 is expected to mitigate within-die variation effects. In either case, calibration can be used to deal with both within-die and die-to-die process variations. The difference lies only in the number of calibration steps required, as described below.

The calibration process for TDC involves applying a set of pulses of known width to the inputs of the TDC. The pulses can be generated using the tester or an on-chip pulse generator. The results shown in Fig. 9 suggest that only two calibration pulses are needed to determine the linear mapping between delay and the value of the thermometer code if within-die variations are small<sup>2</sup>.
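A two-point linear calibration of this kind might look like the sketch below. It assumes the thermometer code is available as an integer count and that the response is linear. The two code values (20 and 120) are invented for illustration, as are the function names; the pulse widths are only example inputs.

```python
from typing import Tuple


def two_point_calibration(width1_ps: float, code1: int,
                          width2_ps: float, code2: int) -> Tuple[float, float]:
    """Fit width = slope * code + offset through two calibration points."""
    slope = (width2_ps - width1_ps) / (code2 - code1)  # ps per code step
    offset = width1_ps - slope * code1
    return slope, offset


def code_to_delay_ps(code: int, slope: float, offset: float) -> float:
    """Map a thermometer-code value back to a pulse width (delay) in ps."""
    return slope * code + offset


# Hypothetical readings: a 500 ps pulse gives code 20, a 5 ns pulse gives 120.
slope, offset = two_point_calibration(500.0, 20, 5000.0, 120)
estimate = code_to_delay_ps(70, slope, offset)
print(slope, offset, estimate)  # 45.0 -400.0 2750.0
```

If within-die variations make the response non-linear, the same idea extends to more calibration points with piecewise-linear interpolation between neighbors.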

Fig. 10 shows the calibration curves derived from the simulation experiments. The x-axis plots the width of the pulse applied to the inputs of the TDC against the corresponding value of the thermometer code. The five curves shown on the right of Fig. 10 are derived from the five process models and with $CV_1$ and $CV_2$ set to 1.2 V and 0.75 V. The curve shown on the left is derived from the Nominal (TT) model with $CV_2$ set to 1.0 V to illustrate that the slope of the curve depends on the set-points of the control voltages. Only two calibration pulses are applied in each case, one with a pulse width of 500 ps and a second with a 5 ns pulse width. These curves are used to map thermometer codes to actual delays for the experiments described in the next section.

1. With CV<sub>1</sub> and CV<sub>2</sub> set to 1.2 V and 1.0 V, resp., the minimum pulse width that can be measured is approx. 150 ps.

2. Conversely, a larger sequence will be needed to deal with non-linearities introduced if within-die variations are significant.

Fig. 8. Selected pulses from inverter chain from a 5 ns input pulse and control voltage CV<sub>2</sub> set to 0.75 V.

Fig. 9. Inverter leading edge delays in a sequence of pulse width experiments on TDC with CV<sub>2</sub> set to 1.0 V.

Fig. 10. Calibration curves that map between actual delay and the value of the thermometer code from the TDC.

Fig. 11. Increasing Measurement Accuracy Through Clock Strobing.

### 4.2 Simulation Results

### 4.2.1 REBEL Path Delay Analysis

To evaluate the effectiveness of REBEL for measuring path delays, we derived a set of test vectors that drive transitions through the AES SBOX macro and analyzed the path delays along seven different paths.

As indicated earlier, the timing resolution of REBEL will be limited to the step size delay of the scan FF (approx. 550 ps) unless a strobing technique is applied in the REBEL tests as well. The timing diagram given in Fig. 11 illustrates the concept of strobing as a means of improving timing resolution. The top row of scan FFs depicts the entire scan chain which is tested and characterized during the calibration process. The bottom section shows the AES SBOX and a portion of the same scan chain that connects to its outputs.

The vertical dotted lines on the right side of the diagram illustrate a series of high resolution (100 ps) strobing events, with those applied for calibration shown along the top and those for a REBEL test along the bottom. The goal during strobing is to determine the 100 ps interval at which the next scan FF is set by the propagating transition. This process effectively divides the entire $\Delta t$ through each scan FF into smaller pieces. The LC interval determined in this way, i.e., that just sets the next scan FF, is used in Eq. 1 to compute the PUT's delay.

In Fig. 11, assume a REBEL test is measuring a rising transition on a path through the SBOX macro, and the initial interval is given by $T_{lc}$. The scan chain results obtained for this test are shown as a sequence of six '1's and one '0'. A sequence of strobes 100 ps apart is now applied and during the last application, the results change to a sequence of seven '1's. This new LC interval, labeled $T_{lc\_new}$, is the target value used in the PUT delay calculation.
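The strobing search in this example can be sketched as a simple loop. Everything here is an assumption made for illustration: `run_test` is a hypothetical callback that applies one launch/capture test at the given LC interval and returns the scan chain bits, and `fake_rebel_test` is a toy model in which the seventh scan FF becomes set at 5430 ps.

```python
from typing import Callable, Optional


def strobe_lc(t_lc_start_ps: int,
              run_test: Callable[[int], str],
              step_ps: int = 100,
              max_steps: int = 6) -> Optional[int]:
    """Lengthen the LC interval in step_ps increments until the scan chain
    result differs from the baseline; return that interval (T_lc_new).

    max_steps defaults to 6 because one scan-FF delay (approx. 550 ps)
    is covered by six 100 ps strobes.
    """
    baseline = run_test(t_lc_start_ps)
    for k in range(1, max_steps + 1):
        t = t_lc_start_ps + k * step_ps
        if run_test(t) != baseline:
            return t
    return None  # no change observed within the strobing window


def fake_rebel_test(t_lc_ps: int) -> str:
    """Toy model: six FFs already set, the seventh sets at 5430 ps."""
    return "1111111" if t_lc_ps >= 5430 else "1111110"


t_lc_new = strobe_lc(5000, fake_rebel_test)
print(t_lc_new)  # 5500
```

The returned interval is the first strobe at or after the true arrival time, so the residual uncertainty is bounded by the strobe spacing.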

We simulated the application of seven transition tests applied to the AES SBOX. Table 2 identifies the test number in the first column, whether the signal emerging from the SBOX output is a rising or falling edge in the second column, the scan chain values in the third column, the final LC time interval in the fourth column (after strobing), the computed delay along the scan path in the fifth column and the computed delay of the PUT in the last column. Note that the first eight bits of scan chain are always '0' because these scan FFs are used to launch the transition into the SBOX. Fig. 12 shows the percentage of error (computed using Eq. 2) in the delays derived using REBEL, in comparison to the actual delays. The error ranges from -4% to 13%.

$$\frac{T_{computed} - T_{actual}}{T_{actual}} \times 100 \qquad \text{(Eq. 2)}$$

| Test | Edge | Thermometer code         | T<sub>lc_new</sub> (ns) | T<sub>sc</sub> (ns) | T<sub>path</sub> (ns) |
|------|------|--------------------------|-------------------------|---------------------|-----------------------|
| #1   | R    | 00000001101111111100000  | 5.6                     | 3.5                 | 2.1                   |
| #2   | R    | 000000001110111111100000 | 5.5                     | 2.9                 | 2.6                   |
| #3   | F    | 00000000000000001111111  | 5.5                     | 3.5                 | 2.0                   |
| #4   | F    | 0000000111100000001111   | 5.4                     | 2.9                 | 2.5                   |
| #5   | R    | 00000001011111110000000  | 5.1                     | 2.9                 | 2.2                   |
| #6   | F    | 00000000000000011111111  | 5.5                     | 3.1                 | 2.4                   |
| #7   | F    | 0000000111110000000111   | 5.1                     | 3.0                 | 2.1                   |

Table 2: Delay measured in different combinational paths in AES SBOX macro.

Fig. 12. Percentage error in delays using REBEL compared with the actual delays in AES SBOX macro.
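As a quick arithmetic check, the sketch below confirms that each row of Table 2 satisfies T<sub>path</sub> = T<sub>lc_new</sub> - T<sub>sc</sub> and implements Eq. 2 as a function. The table values are copied from Table 2. The "actual" delay used in the last line is a made-up number, since the per-test actual delays are not listed in the text.

```python
# (test number, T_lc_new, T_sc, T_path), all in ns, copied from Table 2.
TABLE2 = [
    (1, 5.6, 3.5, 2.1),
    (2, 5.5, 2.9, 2.6),
    (3, 5.5, 3.5, 2.0),
    (4, 5.4, 2.9, 2.5),
    (5, 5.1, 2.9, 2.2),
    (6, 5.5, 3.1, 2.4),
    (7, 5.1, 3.0, 2.1),
]


def rows_consistent(rows, tol_ns: float = 1e-6) -> bool:
    """True if every row satisfies T_path = T_lc_new - T_sc within tol_ns."""
    return all(abs((t_lc - t_sc) - t_path) < tol_ns
               for _, t_lc, t_sc, t_path in rows)


def percent_error(t_computed: float, t_actual: float) -> float:
    """Eq. 2: signed percentage error of the computed delay."""
    return (t_computed - t_actual) / t_actual * 100.0


print(rows_consistent(TABLE2))             # True
print(round(percent_error(2.1, 2.0), 3))   # 5.0 (2.0 ns is hypothetical)
```

The sign convention matters when reading Fig. 12: a positive value means REBEL overestimates the path delay.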

### 4.2.2 REBEL Defect Analysis

REBEL is also applicable to the problem of detecting small delay defects. To show this, we emulate delay defects by adding various amounts of capacitance to selected paths in the SBOX macro. Transition tests are derived that propagate transitions along the defective path, referred to as P1, and along a second defect-free path, called P2. REBEL is used to measure these path delays, and the results are used as input to a detection process that compares the relative magnitudes of the two path delays.

In addition to the delay defect simulations, a set of defect-free simulations is carried out on each of the five process models to determine the expected behavior of P1 versus P2. For the delay defect simulations, the additional capacitive loads introduced on path P1 are 10 fF, 25 fF and 50 fF. The path delays of P1 (x-axis) and P2 (y-axis) are plotted in Fig. 13. Regression-based prediction limits are derived using the defect-free data. Data points outside these prediction limits are considered positive detections. The delay defect data points in Fig. 13 corresponding to the 10 fF capacitive loads are not detected in any of the process models. However, half of the 25 fF and all of the 50 fF capacitive loads are detected.
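One common way to build such limits is an ordinary least-squares line of P2 on P1 with the standard prediction interval around it. The sketch below assumes that form; the text does not specify the exact regression procedure. The delay values are invented for illustration, and the multiplier 3.182 is the two-sided 95% Student t quantile for three degrees of freedom (five defect-free points minus two fitted parameters).

```python
import math
from typing import List, Tuple

Fit = Tuple[float, float, float, float, float, int]


def fit_line(xs: List[float], ys: List[float]) -> Fit:
    """Least-squares fit y = intercept + slope * x on defect-free data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    return slope, intercept, s, mx, sxx, n


def is_outlier(x: float, y: float, fit: Fit, t_mult: float) -> bool:
    """True if (x, y) falls outside the prediction limits of the fit."""
    slope, intercept, s, mx, sxx, n = fit
    half_width = t_mult * s * math.sqrt(1 + 1 / n + (x - mx) ** 2 / sxx)
    return abs(y - (intercept + slope * x)) > half_width


# Hypothetical defect-free P1/P2 delays (ns), one pair per process model.
p1 = [2.0, 2.2, 2.4, 2.6, 2.8]
p2 = [1.51, 1.64, 1.81, 1.94, 2.11]
fit = fit_line(p1, p2)

large_defect = is_outlier(2.50, 1.81, fit, t_mult=3.182)  # P1 slowed a lot
small_defect = is_outlier(2.42, 1.81, fit, t_mult=3.182)  # P1 slowed a little
print(large_defect, small_defect)  # True False
```

A defect on P1 shifts the point horizontally away from the defect-free trend, so small added loads can stay inside the limits, in line with the undetected 10 fF cases.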

### 4.3 TDC Path Delay Analysis

As indicated earlier, the waveforms produced on the outputs of the SBOX macro are used as inputs to TDC simulations as a means of evaluating the accuracy of the TDC. In the TDC experiments, we simulated an additional five transition tests, for a total of ten, on each of the five process models of the SBOX macro.  $CV_2$  was set to 0.75 V for these simulation experiments to expand the range of measurable delays that occur in SBOX. The calibration curves shown in Fig. 10 are used to derive estimates of the path delays.



Fig. 13. Analysis of emulated delay defects in the SBOX macro.

The results of the analysis are shown in Fig. 14, with the logic test number on the x-axis and process model on the y-axis. The z-axis plots the percentage error between the computed and actual values, as given by Eq. 2. The delays along the tested paths in the SBOX varied over the range of 3-4 ns. The errors range from approx. -2% to 5% in the bar graph, which illustrates the high resolution capability of the TDC ETS.

### 5 Conclusions

In this paper, we describe two embedded test structures for measuring on-chip path delays. REBEL has a small footprint with timing resolution in the range of 500 ps, while TDC has a larger footprint but improves timing resolution to approx. 10 ps. We simulate, in SPICE, models of REBEL and TDC that are extracted from 90-nm layouts. The timing resolutions of REBEL and TDC are analyzed using a set of AES SBOX models, also extracted from a layout, across five process corners. We describe a technique called clock strobing that improves the timing resolution provided by REBEL from 550 ps to 100 ps. The area overhead imposed by REBEL is 2% in the small SBOX macro but is likely to be less than 0.1% for larger product macros. The area of the TDC is approx. 8,200 µm<sup>2</sup>.

Fig. 14. Percentage error in delays using TDC compared with the actual delays in AES SBOX macro.

#### References

- [1] R. Raina, "What is DFM & DFY and Why Should I Care?" in Proc. *ITC*, 2006, pp. 1-9.
- [2] D. Burek, "True Design-for Manufacturability Critical to 65-nm Design Success," http://www.eetimes.com/showArticle.jhtml?articleID=202803596.
- [3] S. R. Nassif, "Modeling and Analysis of Manufacturing Variations," in Proc. *Conference on Custom Integrated Circuits*, 2001, pp. 223-228.
- [4] I. Ahsan et al., "RTA-Driven Intra-Die Variations in Stage Delay and Parametric Sensitivities for 65 nm Technology," in Proc. Symposium on VLSI Technology, 2006, pp. 170-171.
- [5] D.G. Chesebro et al., "Overview of Gate Linewidth Control in the Manufacture of CMOS Logic Chips", *IBM J. of Res. and Dev.*, Vol. 39, Jul. 1995, pp. 189-200.
- [6] J.-Y. Lai, N. Saka, J.-H. Chun, "Evolution of Copper-Oxide Damascene Structures in Chemical Mechanical Polishing", J. of Electrochem. Soc., 2002, pp. G31-G40.
- [7] C. Hedlund, H. Blom, S. Berg, "Microloading Effect in Reactive Ion Etching", J. of Vacuum Science and Tech., Vol. 12, 1994, pp. 1962-65.
- [8] S. Paul, S. Krishnamurthy, H. Mahmoodi, S. Bhunia, "Low-overhead Design Technique for Calibration of Maximum Frequency at Multiple Operating Points," in Proc. *ICCAD*, 2007, pp. 401-404.
- [9] W. Xiaoxiao, M. Tehranipoor, R. Datta, "Path-RO: A Novel On-Chip Critical Path Delay Measurement under Process Variations," in Proc. *ICCAD*, 2008, pp. 640-646.
- [10] J. Aarestad, C. Lamech, J. Plusquellic, D. Acharyya and K. Agarwal, "Characterizing Within-Die and Die-to-Die Delay Variations Introduced by Process Variations and SOI History Effect", in Proc. DAC, 2011.
- [11] D. Acharyya, K. Agarwal, J. Plusquellic, "Leveraging Existing Power Control Circuits and Power Delivery Architecture for Variability Measurement," in Proc. ITC, 2010, pp. 1-9.

- [12] L. Jie, J. Lach, "At-speed Delay Characterization for IC Authentication and Trojan Horse Detection", in Proc. HOST, 2008, pp. 8-14.
- [13] Y. Haihua, A.D. Singh, "Experiments in Detecting Delay Faults using Multiple Higher Frequency Clocks and Results from Neighboring Die," in Proc. *ITC*, 2003, pp. 105-111.
- [14] J. Kalisz, "Review of Methods for Time Interval Measurements with Picosecond Resolution", *Metrologia*, Vol. 41, 2004, pp. 17-32.
- [15] P. Dudek, S. Szczepanski, J.V. Hatfield, "A High-Resolution CMOS Time-to-Digital Converter utilizing a Vernier Delay Line", *IEEE Trans. Solid-State Circuits*, 2000, pp. 240-247.
- [16] C.C. Chen, P. Chen, C.S. Hwang, W. Chang, "A Precise Cyclic CMOS Time-to-Digital Converter with Low Thermal Sensitivity", *IEEE Trans. Nucl. Sci.*, 2005, pp. 834-838.
- [17] M. Bhushan, A. Gattiker, M. Ketchen and K. Das, "Ring Oscillators for CMOS Process Tuning and Variability Control," *Trans. on Semiconductor Manufacturing*, Vol. 19, No. 1, 2006, pp. 10-17.
- [18] B.P. Das, B. Amrutur, H.S. Jamadagni, N.V. Arvind, V. Visvanathan, "Within-Die Gate Delay Variability Measurement using Re-configurable Ring Oscillator," in Proc. *Custom Integrated Circuits Conference*, 2008, pp. 133-136.
- [19] D.J. Kinniment, O.V. Maevsky, A. Bystrov, G. Russell, A.V. Yakovlev, "On-Chip Structures for Timing and Measurement", in Proc. ASYNC'02, 2002, pp. 190-197.
- [20] Z. Xin, K. Ishida, M. Takamiya, T. Sakurai, "An On-Chip Characterizing System for Within-Die Delay Variation Measurement of Individual Standard Cells in 65-nm CMOS," in Proc. DAC, 2011, pp. 109-110.
- [21] A. Mantyniemi, T. Rahkonen, J. Kostamovaara, "A CMOS Time-to-Digital Converter (TDC) Based On a Cyclic Time Domain Successive Approximation Interpolation Method," *Solid-State Circuits*, Vol. 44, No. 11, 2009, pp. 3067-3078.
- [22] H. Onodera, H. Terada, "Characterization of WID Delay Variability using RO-array Test Structures," in Proc. *International Conference on ASIC*, 2009, pp. 658-661.
- [23] N. Drego, A. Chandrakasan, D. Boning, "All-Digital Circuits for Measurement of Spatial Variation in Digital Circuits," *Solid-State Circuits*, Vol. 45, No. 3, March 2010, pp. 640-651.
- [24] B. Hargreaves, H. Hult, and S. Reda, "Within-Die Process Variations: How Accurately Can They Be Statistically Modeled?," in Proc. Asia and South Pacific Design Automation Conference, 2008, pp. 524-530.