


# Entropy Analysis of FPGA Interconnect and Switch Matrices for Physical Unclonable Functions

Jenilee Jao <sup>1,†</sup>\*, Ian Wilcox <sup>2,†</sup>, Jim Plusquellic <sup>1,†</sup>, Biliana Paskaleva <sup>2,†</sup>, and Pavel Bochev <sup>3,†</sup>

<sup>1</sup> ECE, University of New Mexico, 1 University of New Mexico, Albuquerque, 87131, NM, USA

- <sup>2</sup> Radiation Modeling & Analysis, Sandia National Laboratories, Eubank, New Mexico, 87123, NM, USA
- <sup>3</sup> Center for Computing Research, Sandia National Laboratories, Eubank, New Mexico, 87123, NM, USA
- \* Correspondence: jenjao@unm.edu; iwilcox@sandia.gov; jimp@ece.unm.edu; bspaska@sandia.gov; pbboche@sandia.gov

<sup>+</sup> These authors contributed equally to this work.

Abstract: Random variations in microelectronic circuit structures represent the source of entropy for physical unclonable functions (PUFs). In this paper, we investigate delay variations that occur through the routing network and switch matrices of a field programmable gate array (FPGA). The delay variations are isolated from other components of the programmable logic, e.g., look-up tables (LUTs) and flip-flops (FFs), using a feature of Xilinx FPGAs called dynamic partial reconfiguration (DPR). A set of partial designs is created that fixes the placement of a time-to-digital converter (TDC) and supporting infrastructure, which enables the path delays through the target interconnect and switch matrices to be extracted by subtracting out common-mode delay components. Delay variations are analyzed in the different levels of routing resources available within FPGAs, i.e., local routing and across-chip routing. Data is collected from a set of Xilinx Zynq 7010 devices, and a statistical analysis of within-die variations in delay through a set of randomly-generated and hand-crafted interconnects is presented.

Keywords: FPGA interconnect, delay variations, physical unclonable functions

## 1. Introduction

A physical unclonable function (PUF) is a hardware security primitive that is tasked with generating random bitstrings and encryption keys. The security properties of a PUF architecture are closely tied to the physical layer components that define its source of entropy, i.e., the layout characteristics of the circuit structure from which random variations are measured, digitized and processed into bitstrings. Although many different types of integrated circuits can be used as the platform for a PUF, the FPGA is a popular choice because it allows prototypes to be created and validated quickly while providing layout-level control over the design of the PUF's circuit structures. Moreover, advanced FPGA features such as dynamic partial reconfiguration (DPR) can be leveraged to impede adversarial reverse engineering attacks by making physical layer components of the PUF architecture unavailable in operational systems.

The physical layout components of an FPGA device consist of look-up tables (LUTs), flip-flops (FFs), switch matrices (SMs) and wires, plus sets of commonly used components including block RAMs, digital-signal-processing (DSP) blocks and digital clock managers (DCMs). The performance characteristics of these components are impacted by imperfections in the device manufacturing process. Processing variations affect each device differently, making, e.g., the propagation delay along the same routes in different chips distinct. The random and unique nature of process variation effects represents the cornerstone of PUF technology. This paper focuses on the analysis of variation in constituent elements of the FPGA, namely, the SMs and wires.

The experimental evaluation carried out in this work is performed on device instances of the Xilinx Zynq system-on-chip (SoC) 7010 architecture, which consists of a processor system (PS) and programmable logic (PL) region. The SMs in the PL component are responsible for configuring routes and for implementing fan-out connections of input wires to multiple out-going wires. The implementation details of the SMs are not provided to end users because they are considered proprietary. However, low-level routing tools allow routes through SMs to be manipulated. In an initial set of experiments, called *Hand-Crafted*, we use routing commands to re-route signals through SMs as a means of extending a set of reference routes, called BaseRoutes, to include additional wires and SMs, called RouteExts. A second, larger *Tool-Crafted* design is created in which the Vivado place-and-route (P&R) tool is used to create the BaseRoutes and RouteExts, as an alternative to the hand-crafted routes of the first design.

Citation: Jao, J.; Wilcox, I.; Plusquellic, J.; Paskaleva, B.; Bochev, P. Entropy Analysis of FPGA Interconnect and Switch Matrices for Physical Unclonable Functions. *Cryptography* 2024, 1, 0. https://doi.org/

**Copyright:** © 2024 by the authors. Submitted to *Cryptography* for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The delays of the RouteExts are extracted and isolated by subtracting out the BaseRoute delay. DPR is used to eliminate artifacts introduced by MUXs (LUTs) in the delays of the RouteExts by fixing all LUT positions in the DPR bitstreams to the same locations. The RouteExts in the hand-crafted design are constructed to include different types of routing resources, including single, double, quad and long lines. Delay measurements are made using an on-chip, high-resolution timing engine, which provides a resolution of $\sim$18 picoseconds (ps). Multiple-sample averaging is used to increase resolution even further.

Delay measurements are carried out on the RouteExts instantiated on a set of identically configured Zynq 7010 devices, and a statistical analysis of delay variation is presented. Given our goal is to measure the contribution of SMs and routing wires to entropy leveraged by a delay-based PUF, only room temperature experiments are carried out. The following contributions characterize the technique and results presented in this paper.

- A dynamic partial reconfiguration technique is applied to measure and isolate the delays associated with wires and SMs in the programmable logic of a set of FPGAs.
- Wire and SM configurations are constructed using different routing resource types to assess the impact of wire length on the level of entropy.
- A series of data post-processing operations are proposed as a means of extracting only within-die delay variations, which represent the most robust random source of variations for a PUF.
- An estimate of within-die variations is derived for a wire-SM combination, and an analysis of the bitstrings derived using only wire-SM delay variations is presented to determine their statistical properties.

The remainder of this paper presents related work in Section 2, while Section 3 describes the system architecture, tool flow and data post-processing algorithms for the Hand-Crafted and Tool-Crafted designs. Section 4 presents the results from the two experiments and Section 5 presents conclusions.

## 2. Background

In [1], the authors use dynamic reconfiguration to enable fine control over delays in experiments which use a time-to-digital converter (TDC) by manipulating route options through SMs. A fine resolution delay tuning method to improve linearity in TDCs is proposed in [2]. The authors introduce additional capacitive loads, as fan out branches, to nets passing through SMs.

A path delay timing method is proposed in [3] that constructs nearly identical path structures and uses differencing to obtain the delay of the changed segment. The goal of the work was to accurately measure the impact of extending paths using additional routing resources (similar to the work proposed here). However, dynamic partial reconfiguration was not used to create the path length extensions, resulting in additional artifacts introduced by changing pin locations in the static portion of the path. Moreover, the authors provide very little data on within-die and across-chip variations.

A ring oscillator (RO) based differential delay characterization method is proposed in [4] for application to variation aware design (VAD) methodologies. Multiple ROs are constructed with overlapping path segment components and a set of equations is solved to deduce the path segment delays. An RO construction technique allowed statistical delay characterization of individual LUTs and direct, double and hex path segment delays.

Tuan et al. [5] investigate within-die variation in 65-nm FPGAs using a unique RO structure composed of non-inverting buffers and self-timed reset latches. The measured within-die variations are decomposed into random and systematic components. The analysis and techniques can be used to improve performance of devices using a location-aware timing model. More recently, the authors of [6] use soft-macro sensors to characterize within-die and die-to-die variation for creating device-signature variability maps. They too decompose variability into random and systematic components, and expand the analysis to include different FPGA resources and a range of temperature-voltage operating conditions.

The authors of [7] propose a finely tunable programmable delay line (PDL) mechanism with high precision and low overhead using a single LUT. A PDL-based symmetric switch method is applied to an arbiter-based PUF to correct delay discrepancies caused by FPGA routing asymmetries. By applying majority voting and categorization of challenges into reliability groups, they show that PUF response stability can be increased across adverse environmental conditions.

In [8], the authors utilize two distinct manual placement and routing approaches to enhance precision of FPGA-based TDCs. In the first approach, uniform routing paths and controlled delay elements are used while the second approach enhances the first by introducing a combination of long and short routing wires. Notably, the second approach achieved better dynamic range and resolution.

A comprehensive overview of how measurements can be conducted on FPGAs to characterize within-die delay variability is presented in [9]. The authors propose precise measurement techniques to analyze both systematic and stochastic delay variability in FPGAs by employing an array of ring oscillators and critical path tests on various 90 nm FPGA devices. This approach enabled them to quantify the variability and analyze its impact on future FPGA technologies.

A Programmable Ring Oscillator PUF (PRO PUF) is introduced in [10], which utilizes dynamic partial reconfiguration to generate bitstrings by manipulating switch matrices within its architecture. The architecture is divided into static and dynamic areas, with the latter being modifiable during operation, specifically altering signal transmission paths in the switch matrix without affecting other structural components. Bitstring generation and the selection of different transmission paths through the switch matrix are controlled by challenge, with each unique path corresponding to a specific external configuration file, enabling the generation of a wide array of challenge response pairs (CRPs).

The technique proposed in this paper shares similarities with [10], particularly as it relates to the application of DPR and the utilization of static and dynamic regions. However, the approach proposed in [10] explores routing networks within the DPR region but does not explicitly remove variations introduced by the LUT architecture. Additionally, rather than using a challenge to specify each configuration, our approach applies only two partial bitstreams, which fix the LUT positions in both designs. Differencing the delays measured using the BaseRoute and RouteExt partial bitstreams ensures that only variations introduced by the routing wires and switch matrices contribute to the entropy of the generated bitstrings.

## 3. System Architecture

A block diagram showing the system architecture implemented on the Xilinx Zynq 7010 device is given in Fig. 1. The PL component shown along the top consists of two regions: a static region (SR) on the left and a dynamic partial reconfiguration (DPR) region on the right. The SR incorporates a set of state machines and a time-to-digital converter (TDC), as well as a register interface, labeled GPIO for general purpose input/output, to the processor system component of the SoC. The Controller and TDC are capable of measuring path delays at a resolution of $\sim$18 ps in single-shot mode, and up to a total path length of $\sim$25 nanoseconds [11]. In our experiments, the paths are measured 16 times and averaged to reduce measurement noise in the single-shot measurements.

**Figure 1.** Block diagram of the test architecture for the Hand-Crafted experiments showing the Programmable Logic and Processor System partitions on a Xilinx Zynq 7010 SoC. The static region is shown on the left, which includes the path delay timing engine. Two dynamic partial reconfiguration instances of the routing architecture are shown on the right.

Two experimental designs are evaluated in this paper. In both designs, a set of BaseRoutes and RouteExts are created. The implementation layouts of the BaseRoute and RouteExts are identical everywhere except for the wires and SMs used for the route(s) between the source and destination LUTs. The BaseRoutes and RouteExts in the first design are hand-crafted to allow a wide variety of routing resources to be utilized in the RouteExt designs, e.g., single, double, quad and long lines. The BaseRoutes and RouteExts are implemented in a set of 67 partial bitstreams, which are loaded one at a time. The second design utilizes only one BaseRoute DPR region and one RouteExt DPR region, and includes a set of 8192 distinct paths, composed of series-connected RouteExts, that can be configured and tested using an input challenge. The first design is referred to as Hand-Crafted, while the second one is referred to as Tool-Crafted.

As an example, the right side of Fig. 1 shows the DPR regions for the BaseRoute (top) and RouteExt (bottom) from a Hand-Crafted experiment. P&R constraints are used in the Xilinx Vivado CAD tool flow to construct both implementations, which fix the positions of the wires, SMs and LUTs. The regions enclosed by the rectangles show routing components that are locked down and remain static in both DPR bitstreams. The route is modified only in the region circled on the right. The base route passes directly through the SM while the route extension extends the route to other wire and SM components.

The testing process first programs the DPR region with the BaseRoute DPR bitstream and measures the delay. The FPGA is then reprogrammed with the RouteExt DPR bitstream and the path is re-measured. The delay of the wires and SMs that define the route extension is obtained by subtracting the BaseRoute delay from the RouteExt delay, which removes all common-mode components of the path delay. In many cases, the RouteExt design from the previous experiment is used as the BaseRoute for the next design, extending the route further in successive experiments.
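The differencing step described above can be sketched in Python. This is a minimal illustration and not the authors' test software: the function names and the sample readings are hypothetical, and only the 16-sample averaging and the approximate 18 ps TDC resolution are taken from the text.

```python
import numpy as np

TDC_PS_PER_UNIT = 18.0  # approximate single-shot TDC resolution stated in the text


def mean_delay(samples):
    """Average repeated TDC readings of one path (the text uses 16 samples)."""
    return float(np.mean(samples))


def route_ext_delay(base_samples, ext_samples):
    """Delay of the route extension alone, in TDC units.

    Subtracting the BaseRoute delay cancels every component the two
    configurations share (fixed LUTs, static routing, TDC offsets).
    """
    return mean_delay(ext_samples) - mean_delay(base_samples)


# Hypothetical readings, in TDC units, for one path on one device.
base = [100, 101, 99, 100] * 4  # 16 BaseRoute samples
ext = [110, 111, 109, 110] * 4  # 16 RouteExt samples

delta_units = route_ext_delay(base, ext)
delta_ps = delta_units * TDC_PS_PER_UNIT
```

When one RouteExt serves as the BaseRoute of the next experiment, the same function applies unchanged; only the pair of sample sets passed in differs.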

The second, tool-generated, design utilizes two stacked modules from the SiRF PUF called the reconvergent-fanout module (RFM) [12]. A block diagram of the RFM is shown in Fig. 2. The module consists of two rows of 4-to-1 MUXs separated by AND, OR and AND-OR (AO) gates (the experimental design inserts two more rows identical to those shown). The select inputs to the MUXs are controlled by the row-path-select (RPS) inputs on the left. The same gate configuration is repeated across four columns, $col_0$ through $col_3$, with the logic gate outputs distributed across all four columns using rotate input and output wires, $ri_x$ and $ro_x$. Rising and falling transitions are introduced by a Launch FF shown along the top of the figure, which fans out to the logic gate inputs in each of the columns. A 16-to-1 MUX, shown along the bottom of the figure, is used to select a path to be timed by the TDC. The design with two stacked RFM modules possesses 131,072 distinct paths, of which 8192 are testable with rising and falling transitions using 132-bit challenges.

**Figure 2.** Block diagram of the reconvergent-fanout module (RFM).

**Figure 3.** Tool-Crafted Experiment: Implementation views of the static portion (left), and BaseRoute and RouteExt DPR regions (right).

Unlike the Hand-Crafted design, placement constraints are used only to fix the placement of the LUTs implementing the logic gates and MUXs, and the Vivado P&R tool is used to create the routing structure. In order to force the P&R tool to create different routes in the BaseRoute and RouteExt designs, a timing constraint is used during the implementation of the BaseRoute which is removed during the implementation of the RouteExt. The 13 ns timing constraint forces the P&R tool to construct a routing architecture that minimizes the number of wires and series-inserted SMs between the fixed LUT inputs and outputs. The LUT input/output nets in the RouteExt design, on the other hand, almost always utilize a larger number of wires and SMs, resulting in longer delays. The Vivado implementation views in Fig. 3 show the static design of the SiRF PUF timing engine and bitstring generation algorithm on the left and the BaseRoute and RouteExt DPR regions on the right.

### 3.1. FPGA Tool Flow

A flowchart of the bitstream generation process is depicted in Fig. 4. The operations carried out in the five-step process are as follows.



**Figure 4.** Xilinx Vivado tool flow for generating full and partial bitstreams for Hand-Crafted and Tool-Crafted experiments.

1. Synthesize the timing engine and other components of the static design.
2. Routing constraints are used to fix SMs and wires for the Hand-Crafted experiment in 67 separate designs, while timing constraints are used to force different routes from the Vivado P&R tool in the Tool-Crafted experiment.
3. TCL commands are used to create the DPR region, which is represented as a pblock in Vivado. The locked static design is used to maintain the exact same layout in all DPR designs created.
4. P&R is run to join the static and DPR designs.
5. The full bitstream is used to program the device, followed by any sequence of partial bitstreams created by this tool flow.

### 3.2. Delay Post-Processing Algorithm

The data collected from the Hand-Crafted design is used to estimate the level of within-die variations (entropy) introduced by routing wires and SMs. The data post-processing algorithm is crafted to achieve this goal and is described in this section using timing data from a set of 34 Zynq 7010 FPGAs. The algorithm consists of four steps, and is illustrated in Fig. 5.

1. The programmable logic of the FPGAs is programmed with the full bitstream, followed by a sequence of partial bitstream programming operations. The timing engine measures both rising and falling delays of paths implemented within each of the partial bitstreams. The curves labeled *1) BaseRoute & RouteExt Raw Delay* in Fig. 5 show the rising path delays for the base routes (black) and route extensions (blue) measured from the 34 FPGAs (falling delays are omitted). The acronyms BR and RE refer to BaseRoute and RouteExt, respectively. We use the term Raw to refer to both sets.
2. The RE and BR delays are calibrated to remove global process variations using a *Global Process and Environmental Variation* (GPEV) module. The GPEV module applies a pair of linear transformations given by Eqs. 1 through 4. The mean and standard deviation of the 134 Raw delays from each FPGA are computed and the Raw delays are standardized using Eqs. 1 through 3. A second linear transformation using 0.0 and 44.1 for $\mu_{ref}$ and $\sigma_{ref}$ (Eq. 4) is then applied to convert the standardized values back to a form similar to the original data (44.1 is the mean $\sigma$ across all FPGAs). The same $\mu_{ref}$ and $\sigma_{ref}$ are used for all devices in the second transformation, which effectively removes global performance differences while preserving within-die variations. Although difficult to observe, the variations in the rising delays across all FPGAs in the plot labeled *2) Compensate Raw Delays* from Fig. 5 are smaller than those from Step 1. We use the symbol 'F' for FPGA, 'i' for FPGA instance, 'c' for calibrated and 'r' for route in these equations. The GPEV calibrated delays are referred to as $BR_c$ and $RE_c$.
3. The rising and falling path delays from the BaseRoute design are subtracted from the corresponding rising and falling path delays of the RouteExt designs in the graph labeled *3) Subtract BaseRoute delay* of Fig. 5. We refer to these delay differences as $DVR_c$ and $DVF_c$ (DV is an acronym for delay value). The $DVR_c$ and $DVF_c$ vary from 80 picoseconds (ps) to 1.8 nanoseconds (ns) across all 67 rise and fall delays.
4. The final transformation is shown in *4) Remove DC bias*. The $DVR_c$ and $DVF_c$ possess a DC bias that exists because the routes are not identically designed. The bias is removed by computing the mean delay of each $DVR_c$ and $DVF_c$ across all FPGAs and then subtracting this *offset* from the compensated raw delays. We use the symbol 'R' here to refer to individual route extensions and 'x' for the route extension number. Eqs. 5 and 6 give expressions for computing the rise and fall compensated raw delays without bias, annotated as $DVR/F_{co}$, with 'o' referring to 'offset'.

**Figure 5.** Data post-processing algorithm applied to data from the Hand-Crafted experiments.

$$\mu_{\mathbf{F}i} = \frac{\sum_{r=1}^{134} \text{Raw}_{i,r}}{134} \tag{1}$$

$$\sigma_{\mathbf{F}i} = \sqrt{\frac{\sum_{r=1}^{134} (\text{Raw}_{i,r} - \mu_{\mathbf{F}i})^2}{133}} \tag{2}$$

$$\mathbf{Z}_{i,r} = \frac{(\text{Raw}_{i,r} - \mu_{\mathbf{F}i})}{\sigma_{\mathbf{F}i}} \tag{3}$$

$$\text{Raw}_{c_{i,r}} = \mathbf{Z}_{i,r} \, \sigma_{ref} + \mu_{ref} \tag{4}$$

$$\mu_{\mathbf{R}x} = \frac{\sum_{i=1}^{34} \text{Raw}_{c_{i,x}}}{34} \tag{5}$$

$$DVR/F_{co_{i,x}} = \text{Raw}_{c_{i,x}} - \mu_{\mathbf{R}x} \tag{6}$$
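The four steps and Eqs. 1 through 6 can be sketched with NumPy as shown here. This is a hedged illustration on synthetic data, not the authors' implementation. The function names, the column layout (67 BaseRoute delays followed by 67 RouteExt delays per device) and the synthetic delay model are assumptions made only for this example. Following the description of Steps 3 and 4, the offset removal is applied to the BaseRoute-subtracted values.

```python
import numpy as np


def gpev(raw, mu_ref=0.0, sigma_ref=44.1):
    """Eqs. 1-4: standardize each FPGA's delays, then rescale.

    raw has shape (n_fpgas, n_delays); each row holds one device.
    """
    raw = np.asarray(raw, dtype=float)
    mu = raw.mean(axis=1, keepdims=True)            # Eq. 1
    sigma = raw.std(axis=1, ddof=1, keepdims=True)  # Eq. 2 (n - 1 divisor)
    z = (raw - mu) / sigma                          # Eq. 3
    return z * sigma_ref + mu_ref                   # Eq. 4


def remove_offset(values):
    """Eqs. 5-6: subtract each route's mean taken across all FPGAs."""
    values = np.asarray(values, dtype=float)
    return values - values.mean(axis=0, keepdims=True)


# Synthetic data: 34 devices, 134 delays each (assumed layout: 67 BR, then 67 RE).
rng = np.random.default_rng(0)
route_means = rng.uniform(500.0, 2000.0, size=134)
global_scale = rng.uniform(0.9, 1.1, size=(34, 1))  # chip-to-chip speed differences
within_die = rng.normal(0.0, 5.0, size=(34, 134))   # per-route random variation
raw = global_scale * route_means + within_die

cal = gpev(raw)                    # Step 2
dv_c = cal[:, 67:] - cal[:, :67]   # Step 3: RouteExt minus BaseRoute
dv_co = remove_offset(dv_c)        # Step 4
```

After `gpev`, every device has the same mean and standard deviation, so the remaining differences between devices on a given route reflect within-die variation rather than overall chip speed.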

### 3.3. Tool-Crafted Data Post-Processing

The goal of the Tool-Crafted experiment is to evaluate entropy and uniqueness-related statistics of bitstrings generated using the delays of only wire and SM components in the FPGAs. The sequence of graphs in Fig. 6 shows the data post-processing algorithm applied to the delays collected from one FPGA in this experiment. The algorithm adds two steps to the process given for the Hand-Crafted experiments. Moreover, the addition of a differencing step to create DVD changes the step in which GPEV is applied.



**Figure 6.** Data post-processing algorithm applied to data from the Tool-Crafted experiments.

1. The Raw DVR and DVF are plotted in the upper left graph, where we show the first 100 rising delays on the left and the first 100 falling delays on the right, both from the larger sets of 4096 values in each group. The vertical shift between the two data sets, with rising delays being smaller overall, illustrates a common process-related characteristic: p-channel (pull-up) devices are not well correlated with n-channel (pull-down) devices on the same FPGA. This pattern varies depending on the FPGA.
2. In contrast to the Hand-Crafted algorithm, the second step involves subtracting the BaseRoute delays from the RouteExt delays. The first 100 DVR and DVF are plotted in the *2) DVR and DVF* graph.
3. In Step 3, the 4096 DVR are randomly paired with, and subtracted from, the 4096 DVF, as a means of doubling the level of entropy in the delay differences (DVD). Note that additional DVD can be created by other random pairing and differencing operations applied to the DVR and DVF groups, up to a total of $(4096)^2$ unique combinations.
4. The GPEV operation is applied to the DVD to create $DVD_c$, using Eqs. 1 through 4 with 4096 replacing 134, DVD replacing Raw and 28.0 replacing 44.1 for $\sigma_{ref}$.
5. Step 5 converts the DVD<sub>c</sub> to DVD<sub>co</sub> by subtracting the mean $DVD_c$ delay from each of the individual $DVD_c$, using Eqs. 5 and 6.
6. The operation carried out in Step 6 is optional, and serves only to make the number of strong bits in the generated bitstrings approximately the same for each FPGA when a threshold is applied (described below). The scaling operation computes the average range of the variation in the DVD<sub>co</sub> of each FPGA *i* and multiplies all DVD<sub>co</sub> by a ratio that makes the ranges approximately equal for all devices. The ratios vary between 1.00 and 1.91 and illustrate that the level of random variations (entropy) in each FPGA is not constant. We refer to the delays shown in Step 6 as SDVD<sub>co</sub> ('S' for scaled) in the following.
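Steps 2 through 6 can be sketched as a single NumPy pipeline. This is only an illustration under stated assumptions: the function names and synthetic inputs are hypothetical, the same random pairing is assumed for every device, and because the text does not define the "average range" precisely, the sketch substitutes the simple max-minus-min range per device and scales every device to the largest range so that the smallest ratio is 1.00.

```python
import numpy as np


def gpev(values, mu_ref=0.0, sigma_ref=28.0):
    """Eqs. 1-4 applied per device (row), with sigma_ref = 28.0 as in Step 4."""
    values = np.asarray(values, dtype=float)
    mu = values.mean(axis=1, keepdims=True)
    sigma = values.std(axis=1, ddof=1, keepdims=True)
    return (values - mu) / sigma * sigma_ref + mu_ref


def tool_crafted_pipeline(br_rise, re_rise, br_fall, re_fall, seed=0):
    """All inputs have shape (n_fpgas, n_paths). Returns (SDVD_co, ratios)."""
    # Step 2: isolate the route extensions.
    dvr = re_rise - br_rise
    dvf = re_fall - br_fall
    # Step 3: randomly pair DVR with DVF (assumed: same pairing on all devices).
    perm = np.random.default_rng(seed).permutation(dvr.shape[1])
    dvd = dvf - dvr[:, perm]
    # Step 4: GPEV calibration per device.
    dvd_c = gpev(dvd)
    # Step 5: remove the per-path mean taken across devices.
    dvd_co = dvd_c - dvd_c.mean(axis=0, keepdims=True)
    # Step 6 (optional): equalize the ranges; max - min is a stand-in metric.
    ranges = dvd_co.max(axis=1) - dvd_co.min(axis=1)
    ratios = ranges.max() / ranges
    return dvd_co * ratios[:, None], ratios


# Synthetic example: 4 devices, 64 paths (the paper uses 34 devices, 4096 paths).
rng = np.random.default_rng(42)
shape = (4, 64)
br_r = rng.normal(300.0, 10.0, shape)
re_r = br_r + rng.normal(40.0, 3.0, shape)
br_f = rng.normal(320.0, 10.0, shape)
re_f = br_f + rng.normal(45.0, 3.0, shape)
sdvd_co, ratios = tool_crafted_pipeline(br_r, re_r, br_f, re_f)
```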



**Figure 7.** Illustration of the bit-flip avoidance bitstring generation algorithm.

### 3.4. Bitstring Generation Algorithm

The SDVD<sub>co</sub> data shown in Step 6 of Fig. 6 is used as input to the bitstring generation algorithm. As indicated earlier, the GPEV transformation applied in Step 4 calibrates for chip-to-chip (global) process variations and delay variations introduced by adverse environmental conditions. The transformations carried out in Steps 5 and 6 remove DC bias and scale the remaining within-die variations of each device to make them similar across chips. All of these transformations are designed to make it possible to apply a simple bit-flip avoidance algorithm during bitstring generation that leverages the within-die random variations that remain, and produces bitstrings nearly equivalent in size.

Bitstring generation is the final step (Step 7) of the proposed data post-processing algorithm, and is illustrated in Fig. 7 using the first 15 SDVD<sub>co</sub> from the Tool-Crafted experiment. Here, the data for all 34 devices is superimposed, with the delays for only the first two devices line-connected, and highlighted in red and blue, to illustrate the randomized behavior of the points above and below zero. The black points are associated with the remaining 32 devices. The y-axis is given in units returned by the TDC, where each unit value is equal to 18 ps of delay. The range of $\pm 6$ corresponds to $\pm 108$ ps.

The two horizontal lines represent thresholds that are used to improve reliability, i.e., points within the region between the thresholds are not used during bitstring generation [12]. Given the focus of this paper is on the analysis of entropy within the constituent components of paths in FPGAs, we use only room-temperature data to analyze entropy and evaluate uniqueness in the generated bitstrings. Moreover, the bitstrings are generated using only those points above and below the thresholds, called *strong bits*, as a means of emulating the actual bitstring generation algorithm. Points above the upper threshold are assigned a bit value of 1, while those below the lower threshold are assigned 0.
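A minimal Python sketch of the thresholding rule follows. It assumes the two thresholds are placed symmetrically about zero, which the text does not state explicitly; the function name, the threshold value and the sample values are hypothetical.

```python
import numpy as np


def strong_bits(sdvd, threshold):
    """Generate strong bits for one device.

    Points with |value| <= threshold fall between the two thresholds and
    are skipped. Points above +threshold yield 1; points below -threshold
    yield 0. Returns the bits and a mask marking which points were used.
    """
    sdvd = np.asarray(sdvd, dtype=float)
    mask = np.abs(sdvd) > threshold
    bits = (sdvd[mask] > 0).astype(np.uint8)
    return bits, mask


# Hypothetical scaled delay differences, in TDC units, for one device.
values = np.array([4.0, -0.5, -3.2, 1.0, 2.5, -6.0])
bits, mask = strong_bits(values, threshold=2.0)
# bits -> [1, 0, 1, 0]; the second and fourth points are discarded as weak
```

Keeping the mask alongside the bits records which points were skipped, which is the information a device would need to regenerate the same bitstring from the same subset of points.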

## 4. Experimental Results

The experimental results for the Hand-Crafted and Tool-Crafted designs are presented separately in the following sections. As indicated, the analysis for the Hand-Crafted experiment is focused on determining the level of entropy that wires and SMs provide for delay-based PUFs implemented on FPGAs. The entropy contribution introduced by a third constituent element of FPGAs, namely, LUTs, as presented in [13], is discussed for completeness. The analysis presented for the Tool-Crafted experiments is focused on entropy and uniqueness statistical characteristics of bitstrings generated using wires and SMs as the only source of entropy.

### 4.1. Experimental Results: Hand-Crafted Design

The vertical range of the delays plotted for each *Route #* in the *4) Remove DC bias* graph of Fig. 5 portrays within-die variations for each of the hand-crafted routes. In our analysis, we correlate the width of the vertical ranges, measured as $3\sigma$ delay variations, with the physical layout characteristics of the routes. In particular, the numbers of unit-sized wires and SMs are tabulated for each route as the metric proposed for the physical characteristics. The number of unit-sized wires is the number of equivalent *single* wires that define the route: double wires count as 2 single wires, quad wires count as 4, and long wires count as 6.
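The unit-sized wire count can be expressed as a small lookup, sketched here in Python. The weights come directly from the text; the dictionary, the function name and the example route are hypothetical.

```python
# Equivalent number of single wires for each routing resource type (from the text).
UNIT_WIRES = {"single": 1, "double": 2, "quad": 4, "long": 6}


def unit_wire_count(segments):
    """Sum the single-wire equivalents of the segments that make up a route."""
    return sum(UNIT_WIRES[segment] for segment in segments)


# Hypothetical route extension built from five wire segments.
route = ["single", "double", "quad", "long", "single"]
n_units = unit_wire_count(route)  # 1 + 2 + 4 + 6 + 1 = 14
```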

The numbers of unit-sized wires and SMs are plotted in Fig. 8 as a stacked bar graph with the number of SMs shown along the bottom of each bar and the number of unit-sized wires shown along the top. Two degenerate cases occur for routes 14 and 66, where the changes between the BaseRoute and RouteExt involve only 'bounces' within a single SM, i.e., the remaining components of the route are identical. For route 14, multiple bounces in the RouteExt replace a single bounce in the BaseRoute, while for route 66, a single bounce replaces a different single bounce in the same SM. Fig. 9 shows Vivado implementation views for the BaseRoute and RouteExt SM within the route 66 designs.



**Figure 8.** Physical characteristics of the Hand-Crafted routes, showing the number of SMs in the lower portion of the bars and the number of unit-sized wires in the upper portion.






**Figure 9.** Vivado implementation view of the BaseRoute SM (left) and RouteExt SM (right) for Hand-Crafted route 66, showing the only change in the entire route is a change to the 'bounce' which occurs within the SM.

The scatter plot shown in Fig. 10 plots the proposed physical characterization metric of the routes along the x-axis against the measured $3\sigma$ delay variations along the y-axis. The relationship between unit-sized wires and SMs is factored in by adding a constant of 14.2 to the values in the bar graph for the number of unit-sized wires. The red points correspond to the rising delay variations while the blue points correspond to the falling delay variations. A linear regression analysis is performed on each group of points separately in support of determining the relationship between levels of entropy and physical characteristics of the wires and SMs. A least-squares estimate (LSE) of the regression line is plotted through both groups of points.



**Figure 10.** Correlation analysis of physical path characteristics against the level of entropy measured in the path composed of SMs and wires. The Pearson's correlation coefficient is 96% for the rise and 89% for the fall.

The LSE of the regression line is computed using the Python function *lstsq* from a linear algebra library. The numerical values from the bar graph in Fig. 8, namely the numbers of unit-sized wires and SMs for each route, are used as our model and serve as input to this function. The function returns two coefficients and a y-intercept, with the two coefficients representing the weighted contributions of the wires and SMs, respectively, to the total measured entropy of the route.
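A fit of this kind can be sketched as follows. The text names only *lstsq*, so the use of NumPy's `numpy.linalg.lstsq` here is an assumption, as are the function name `fit_entropy_model` and the input values. The data is synthetic and noise-free, generated from known coefficients so the fit can be checked; it is not the measured data behind Fig. 10.

```python
import numpy as np


def fit_entropy_model(n_wires, n_sms, three_sigma):
    """Least-squares fit of three_sigma ~ a_w * n_wires + a_s * n_sms + c."""
    n_wires = np.asarray(n_wires, dtype=float)
    n_sms = np.asarray(n_sms, dtype=float)
    y = np.asarray(three_sigma, dtype=float)
    # Design matrix: one column per predictor plus a column of ones for the intercept.
    A = np.column_stack([n_wires, n_sms, np.ones_like(n_wires)])
    coeffs, _residuals, _rank, _singular_values = np.linalg.lstsq(A, y, rcond=None)
    a_w, a_s, c = coeffs
    return a_w, a_s, c


# Synthetic example (assumed values, not measurements).
wires = [2, 4, 6, 8, 10, 12]
sms = [1, 3, 2, 5, 4, 6]
y = [2.0 * w + 25.0 * s + 5.0 for w, s in zip(wires, sms)]

a_w, a_s, c = fit_entropy_model(wires, sms, y)
print(f"wire coefficient={a_w:.3f}, SM coefficient={a_s:.3f}, intercept={c:.3f}")
```

With measured data, one such fit would be run for the rising delays and one for the falling delays, giving a separate pair of wire and SM coefficients for each transition.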

The coefficients generated for the rising delays are 2.06 and 27.75 for wires and SMs, respectively, while those for the falling delays are 2.17 and 12.35. Here, we see that the main contribution to entropy is due to the SMs, and that the contribution by SMs is more than twice as large for rising delays as it is for falling delays. Moreover, the close matching of the magnitudes for the wire coefficients supports the fact that wire variation should be independent of a rising or falling transition. Last, the fact that both regression lines are nearly superimposed supports our modeling of the variation as two constituent components of the measured variations.

The $3 * \sigma$ value at $x = 1$ in Fig. 10 is approximately 17 ps on average, and represents the delay variation introduced by a single wire-SM combination. For comparison, the result presented in [13] indicates that the $3 * \sigma$ (range) of delay variation associated with the LUT in Zynq 7010 FPGAs is approximately 30 ps. Therefore, the variation introduced by a wire-SM combination is somewhat smaller than the variation introduced by a LUT.

#### 4.2. Experimental Results: Tool-Crafted Design

The delays measured in the Tool-Crafted experiments are used to generate 128-bit bitstrings for each of the 34 FPGAs. The bitstrings are subjected to several statistical tests, including inter-chip Hamming distance (HD), the NIST statistical tests, and entropy and min-entropy tests, to evaluate their statistical quality.

Inter-chip Hamming distance measures the uniqueness of the bitstrings by counting the number of bits that are different in pairings of the bitstrings from different chips. The ideal value is 50%, which indicates that half of the bits are different in each pairing. Eq. 7 is used to compute inter-chip Hamming distance, with $bs_i$ and $bs_j$ representing the bitstrings from FPGAs $i$ and $j$, and $|bs_i|$ and $|bs_j|$ their sizes. The index $k$ runs over the bits compared, which are a subset of the bits in both bitstrings of the pair: only strong bits corresponding to the same $DVD_{co}$ within the two bitstrings of the pair are considered in the HD calculation. The distribution of the inter-chip HDs is shown in Fig. 11. The distribution varies from approximately 43% to 56% and possesses a mean value close to ideal at 50.04%.

$$\mathrm{InterChipHD}_{i,j} = \frac{\sum_{k=1}^{\min(|bs_i|,|bs_j|)} bs_{i,k} \oplus bs_{j,k}}{\min(|bs_i|,|bs_j|)} \tag{7}$$
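Eq. 7 can be sketched directly in Python. The function names and the toy bitstrings below are hypothetical, and the sketch assumes that each bitstring has already been reduced to the strong bits shared by the pair, so only the XOR count and the normalization of Eq. 7 are shown.

```python
from itertools import combinations


def inter_chip_hd(bs_i, bs_j):
    """Fraction of differing bits over the first min(|bs_i|, |bs_j|) positions (Eq. 7)."""
    n = min(len(bs_i), len(bs_j))
    if n == 0:
        raise ValueError("bitstrings must be non-empty")
    diff = sum(a ^ b for a, b in zip(bs_i[:n], bs_j[:n]))
    return diff / n


def all_inter_chip_hds(bitstrings):
    """Inter-chip HD, in percent, for every pairing of chips."""
    return [100.0 * inter_chip_hd(a, b) for a, b in combinations(bitstrings, 2)]


# Toy 8-bit bitstrings (assumed values, not measured data).
chips = [
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1],
]
print(all_inter_chip_hds(chips))  # [50.0, 50.0, 75.0]
```

For 34 chips, taking every pairing yields 34 · 33 / 2 = 561 HD values.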



**Figure 11.** Distribution of inter-chip Hamming distances computed using all possible pairings of bitstrings from the 34 FPGAs.

The results of the NIST statistical tests applied to the 128-bit bitstrings are shown in Fig. 12. Only six of the NIST tests are applicable given the limited size of the bitstrings. All tests are passed, with all 34 FPGAs passing five of the tests and 33 FPGAs passing the Runs test. The entropy and min-entropy of the bitstrings are computed as 0.9957 and 0.9084, respectively. These results indicate the bitstrings are of cryptographic quality.
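The text does not state how the entropy and min-entropy values are computed. As one plausible illustration, the sketch below assumes that the fraction of 1s in each bitstring is treated as the bias of a binary source, and that the per-bitstring Shannon entropy and min-entropy are averaged over all chips. The function names and toy data are hypothetical, and this should not be read as the procedure actually used to obtain the values above.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a binary source that emits 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def min_entropy(p):
    """Min-entropy, in bits, of a binary source that emits 1 with probability p."""
    return -math.log2(max(p, 1.0 - p))


def mean_entropies(bitstrings):
    """Average per-bit Shannon entropy and min-entropy over all bitstrings."""
    # Assumption: bias is estimated as the fraction of 1s within each bitstring.
    fractions = [sum(bs) / len(bs) for bs in bitstrings]
    h = sum(binary_entropy(p) for p in fractions) / len(fractions)
    h_min = sum(min_entropy(p) for p in fractions) / len(fractions)
    return h, h_min


# Toy data (not measured): one balanced and one biased 8-bit string.
h, h_min = mean_entropies([[0, 1, 1, 0, 1, 0, 0, 1], [1, 1, 1, 0, 1, 1, 0, 1]])
print(f"entropy={h:.4f}, min-entropy={h_min:.4f}")
```

Both measures equal 1 for a perfectly balanced bitstring, and min-entropy falls off faster than Shannon entropy as the bias grows, which is why it is the more conservative of the two figures.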



**Figure 12.** NIST statistical results for bitstrings of length 128 from the Tool-Crafted Experiment.

# 5. Conclusions

An analysis of within-die variations (entropy) in the constituent elements of an FPGA, namely, wires and switch matrices, is presented in this paper. Within-die variations of these components are isolated by using a feature of FPGAs called dynamic partial reconfiguration (DPR) and a set of constraints. The constraints are used to fix the locations of LUTs and components of the timing engine. Partial bitstreams are created which vary the routing characteristics of two versions of the design: one which instantiates a set of base routes, called the BaseRoute design, and a second which extends the base routes by adding wires and additional switch matrices, called the RouteExt design.

The LUTs are fixed to the exact same positions in both designs, allowing components of the path delays related only to the route extensions to be isolated through a delay difference operation. We analyze the RouteExt delays from two sets of experiments, one designed to allow variations in the constituent elements of the path to be analyzed, called Hand-Crafted, and a second designed to allow an analysis of the statistical properties of the bitstrings generated using only entropy contributed by wires and SMs, called Tool-Crafted.

The results show that the within-die variation in delay ($3 * \sigma$) associated with a single wire-SM combination is approximately 17 ps. In contrast, the delay variation of a LUT, as reported in previous work, is approximately 30 ps. Knowing these per-element contributions enables paths for PUF applications to be constructed with levels of entropy that meet target goals.

The results also show that the statistical characteristics of the bitstrings generated in the Tool-Crafted experiments are of high quality, achieving nearly 50% for inter-chip Hamming distance (the ideal value) and passing all applicable NIST statistical tests.

Future work will investigate path construction techniques that optimize entropy by creating a diverse netlist of SMs, wires and LUTs. PUF architectures constructed in this fashion will be more robust to adverse environmental conditions and machine learning attacks.

Author Contributions: All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Jenilee Jao and Jim Plusquellic. The first draft of the manuscript was written by Jenilee Jao, Jim Plusquellic, Ian Wilcox, Biliana S. Paskaleva and Pavel B. Bochev, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding: Supported in part by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

Institutional Review Board Statement: Not applicable

Informed Consent Statement: Not applicable

Data Availability Statement: Dataset available on request from the authors

Acknowledgments: This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

**Conflicts of Interest:** The authors have no relevant financial or non-financial interests to disclose.

### References

1. Bergeron, E.; Feeley, M.; Daigneault, M.A.; David, J.P. Using dynamic reconfiguration to implement high-resolution programmable delays on an FPGA. In Proceedings of the 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference, 2008, pp. 265–268. https://doi.org/10.1109/NEWCAS.2008.4606372.
2. Berrima, S.; Blaquière, Y.; Savaria, Y. Fine resolution delay tuning method to improve the linearity of an unbalanced time-to-digital converter on a Xilinx FPGA. *IET Circuits, Devices & Systems* **2020**, *14*, 1243–1252. https://doi.org/10.1049/iet-cds.2020.0026.
3. Ruffoni, M.; Bogliolo, A. Direct Measures of Path Delays on Commercial FPGA Chips. In Proceedings of the 6th IEEE Workshop on Signal Propagation on Interconnects, 2002, pp. 157–159. https://doi.org/10.1109/SPI.2002.258304.
4. Yu, H.; Xu, Q.; Leong, P.H. Fine-grained characterization of process variation in FPGAs. In Proceedings of the 2010 International Conference on Field-Programmable Technology, 2010, pp. 138–145. https://doi.org/10.1109/FPT.2010.5681770.
5. Tuan, T.; Lesea, A.; Kingsley, C.; Trimberger, S. Analysis of within-die process variation in 65nm FPGAs. In Proceedings of the 2011 12th International Symposium on Quality Electronic Design, 2011, pp. 1–5. https://doi.org/10.1109/ISQED.2011.5770808.
6. Taka, E.; Maragos, K.; Lentaris, G.; Soudris, D. Process Variability Analysis in Interconnect, Logic, and Arithmetic Blocks of 16-Nm FinFET FPGAs. *ACM Trans. Reconfigurable Technol. Syst.* **2021**, *14*. https://doi.org/10.1145/3458843.
7. Majzoobi, M.; Kharaya, A.; Koushanfar, F.; Devadas, S. Automated Design, Implementation, and Evaluation of Arbiter-based PUF on FPGA using Programmable Delay Lines. *IACR Cryptol. ePrint Arch.* **2014**, *2014*, 639.
8. Siecha, R.T.; Alemu, G.; Prinzie, J.; Leroux, P. 5.7 ps Resolution Time-to-Digital Converter Implementation Using Routing Path Delays. *Electronics* **2023**, *12*. https://doi.org/10.3390/electronics12163478.
9. Sedcole, P.; Cheung, P.Y.K. Within-die delay variability in 90nm FPGAs and beyond. In Proceedings of the 2006 IEEE International Conference on Field Programmable Technology, 2006, pp. 97–104. https://doi.org/10.1109/FPT.2006.270300.
10. Cui, Y.; Chen, Y.; Wang, C.; Gu, C.; O'Neill, M.; Liu, W. Programmable Ring Oscillator PUF Based on Switch Matrix. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020, pp. 1–4. https://doi.org/10.1109/ISCAS45731.2020.9180552.
11. Owen Jr., D.; Heeger, D.; Chan, C.; Che, W.; Saqib, F.; Areno, M.; Plusquellic, J. An Autonomous, Self-Authenticating, and Self-Contained Secure Boot Process for Field-Programmable Gate Arrays. *Cryptography* **2018**, *2*. https://doi.org/10.3390/cryptography2030015.
12. Plusquellic, J. Shift Register, Reconvergent-Fanout (SiRF) PUF Implementation on an FPGA. *Cryptography* **2022**, *6*. https://doi.org/10.3390/cryptography6040059.
13. Jao, J.; Wilcox, I.; Thotakura, S.; Chan, C.; Plusquellic, J.; Paskaleva, B.S.; Bochev, P.B. An Analysis of FPGA LUT Bias and Entropy for Physical Unclonable Functions. *Journal of Hardware and Systems Security* **2023**. https://doi.org/10.1007/s41635-023-00137-z.

**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.