# Process Variation Aware D-Flip-Flop Design using Regression Analysis

Shinichi Nishizawa† and Hidetoshi Onodera‡

†Graduate School of Science and Engineering, Saitama University, 255, Shimo-Ohkubo, Sakura-ku, Saitama, 338-8570 JAPAN

‡Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, JAPAN

E-Mail: nishizawa@mail.saitama-u.ac.jp, onodera@i.kyoto-u.ac.jp

Abstract—This paper describes a design methodology for process-variation-aware D-Flip-Flops (DFFs) using regression analysis. We propose to use regression analysis to model the worst-case delay characteristics of a DFF under process variation, and we use the resulting regression equations to tune the transistor widths of the DFF and improve its worst-case delay performance. Regression analysis not only identifies the performance-critical transistors inside the DFF, but also quantifies their impact on the DFF delay performance. The proposed design methodology is verified using Monte-Carlo simulation. The results show that the proposed method yields a DFF with similar or better delay characteristics compared with a DFF designed by an experienced cell designer.

### I. INTRODUCTION

Standard cells are key components for designing high-quality VLSIs, and their performance directly affects the final quality of a VLSI. A large number of D-Flip-Flops (DFFs) are used in a VLSI circuit as storage elements. The delay, energy, and cell area of a DFF therefore have a strong impact on a VLSI design. A DFF circuit is carefully designed by an experienced cell designer and provided by a semiconductor foundry as a library cell targeting the nominal supply voltage.

Recently, with increasing demand to reduce the energy consumption of VLSI circuits, low-voltage operation has attracted more attention than before. The on-currents of PMOS and NMOS transistors are strongly affected by the supply voltage, so the optimal design of a DFF circuit may change depending on the supply voltage. However, with a few exceptions [1]–[3], the design methodology for DFF circuits has not been discussed sufficiently. In particular, it is not clear how to identify the performance-critical transistors inside a DFF in order to improve its delay performance.

As technologies are scaled down to the deep sub-micrometer region, within-die random variation becomes critical and significantly impacts circuit performance [4], [5]. Lowering the supply voltage increases the effect of process variation on circuit performance. Thus, a variation-tolerant DFF design methodology is required. A DFF is composed of a few dozen transistors, and the performance of each transistor and sub-circuit block affects the performance of the other transistors and sub-circuit blocks. Several articles have reported that random variation strongly affects the ability of the latch circuits to capture and hold the input data in synchronization with the clock signal. References [6], [7] discuss the failure conditions of DFFs caused by process variation. These papers propose a method for identifying performance-critical transistors by Monte-Carlo analysis and improve functional yield by enlarging the widths of the identified transistors. However, there is no discussion of how to obtain an optimal width for each transistor under process variation. References [8], [9] evaluate the impact of process variation on the delay and energy characteristics of several DFF structures. However, these papers use a single constant width for all transistors inside the DFF. In actual DFF design, transistor widths differ from one another, and it is common to optimize them to achieve both fast operation and low energy consumption. It is not clear how to optimize the transistor widths to achieve faster performance under process variation. Reference [10] proposes a statistical framework for variation-tolerant DFF design. It models the inverters and the transmission gates in a DFF circuit as simple RC circuits and assumes the DFF delay to be the sum of these logic gate delays, which allows the worst-case delay to be calculated without Monte-Carlo simulation. This model is simple, but the technique has several problems. It is difficult to justify the assumption that all the transistors inside a DFF affect the overall DFF delay characteristics: in an actual DFF circuit, a few performance-critical transistors exist and determine the overall delay characteristics. In addition, this delay model exhibits a large delay calculation error compared with transistor-level simulation.

In this paper, we discuss a variation-tolerant DFF design methodology for low-voltage operation. We use regression analysis as a tool to obtain a delay model equation and to highlight the transistors that have a large impact on the DFF delay performance. The delay model is used to design a variation-tolerant DFF by tuning the transistor widths inside the DFF, and the resulting performance is verified using Monte-Carlo simulation. We use a Transmission Gate DFF (TGFF) composed of minimum-width transistors as the initial DFF circuit and explore the set of transistor widths that gives the TGFF the best delay performance at the $3\sigma$ worst-case condition.

The rest of this paper is organized as follows. Section II describes the regression analysis for DFF delay analysis. Section III describes the proposed DFF design methodology using regression formulation. Section IV describes experimental results of variation tolerant DFF design. Section V concludes this paper.

### II. REGRESSION ANALYSIS FOR DFF VARIATION ANALYSIS

In this section, we describe the use of regression analysis as a tool for DFF design. We develop a delay model using regression analysis and highlight the transistors that have a strong impact on the DFF delay characteristics.

Figure 1 outlines the proposed variation-aware DFF design flow using regression analysis. In this design flow, we find the performance-critical transistors inside a DFF and enlarge their widths to improve the DFF delay performance.



Fig. 1. Proposed design flow utilizing regression analysis.

The proposed flow is composed of three main parts and is iterated to generate the final DFF design. First, Monte-Carlo simulation is performed to obtain the variations in transistor performance and in DFF delay performance under process variation. Second, regression analysis is performed to model the relationship between the transistor performances and the DFF delay performance, considering their means and standard deviations. Third, the transistor widths are tuned to achieve faster delay performance using the result of the regression analysis. If the performance is insufficient, or if there is remaining budget to enlarge the transistor widths, the flow is iterated to obtain a faster DFF. Finally, the optimal DFF design after transistor width tuning is obtained.

#### A. Brief explanation of regression analysis

In this section, we briefly explain the regression analysis used for DFF variation evaluation. Regression analysis is a statistical modeling method for estimating the relationships among variables. It formulates a regression function that expresses the relationship between a dependent variable and one or more independent variables. The regression function also indicates the impact of each independent variable on the dependent variable. Thus, regression analysis helps a designer select and evaluate the best set of variables for developing a predictive delay model.

#### B. Setup for regression analysis of DFF delay

We build a regression function that expresses the relationship between the DFF delay characteristics and the transistor performances. The DFF delay characteristic is the dependent variable, and the transistor performances are the independent variables in this regression analysis. Some transistors have a strong relationship to the DFF delay characteristics, whereas others may have only a weak relationship. We check the p-value (probability value) of the coefficient of each independent variable in the regression function and eliminate the variables whose p-values exceed the significance level. The final regression function is obtained when the p-values of all remaining variables are below the significance level.

Fig. 2. Schematic of TGFF.

There are several important delay characteristics for a DFF circuit, such as the setup time (also called Clock-to-Data contamination delay), the Clock-to-Q (C2Q) propagation delay, and the hold time. The Data-to-Q (D2Q) propagation delay is defined as the sum of the setup time and the C2Q delay. In this paper, we build an objective function to minimize the D2Q delay, because (1) the D2Q delay is composed of the setup time and the C2Q delay, so both delay parameters are taken into account, and (2) the D2Q delay corresponds to the overhead in sequential circuit design, since the maximum allowable logic delay is the clock cycle time minus the D2Q delay [11]. We do not consider the hold time, since hold time violations can be eliminated during circuit design [12].

Figure 2 shows a schematic of the target TGFF circuit. To obtain the relationship between the D2Q delay characteristics and the transistor on-currents under process variation, we perform Monte-Carlo simulation. In the evaluation, we use a transistor model in which the threshold voltage variation depends on the transistor channel area, reflecting Pelgrom's model [13].

Our goal is to design a DFF circuit with faster worst-case delay characteristics under process variation. Since transistor on-current and circuit delay have an inverse relationship, it is difficult to build a linear regression function that estimates the worst-case delay under transistor width tuning. We therefore build a regression function that expresses the relationship between the operation speed (the inverse of the D2Q delay) and the transistor performances.
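The reason for regressing on operation speed rather than on delay can be illustrated with a first-order switching model. The relation below is an assumption added for illustration (a single stage whose load capacitance $C_{\rm L}$ is charged through the supply swing $V_{\rm DD}$ by a constant on-current $I_{\rm on}$); it is not a formula from this paper:

$$D \approx \frac{C_{\rm L} V_{\rm DD}}{I_{\rm on}} \quad \Longrightarrow \quad O = \frac{1}{D} \approx \frac{I_{\rm on}}{C_{\rm L} V_{\rm DD}}$$

Under this approximation the delay is hyperbolic in the on-current, whereas the operation speed is proportional to it, which is the form a linear regression can capture. For a multi-stage path such as a DFF, the linear form should be read as a local approximation around the nominal design point.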

Figure 3 shows the simulation circuit for the DFF delay evaluation. Waveform generators composed of FO4-loaded inverter cells generate the data and clock signals for the target DFF. Each circuit contains parasitic RC extracted from a layout to emulate realistic input slew and loading conditions. Since the D2Q delay strongly depends on the Data-to-Clock (D2C) delay, the minimum D2Q delays for the rise-input and fall-input cases are obtained individually by sweeping the D2C delay in each Monte-Carlo trial.

Fig. 3. A DFF simulation circuit.

Fig. 4. Transistor current evaluation method.

Figure 4 shows the simulation circuit for the transistor performance evaluation. There are many candidate metrics of transistor performance, such as transistor width, threshold voltage, and on-current. We use the static DC on-current, since it is suitable for avoiding the multicollinearity problem in regression analysis. The pull-up and pull-down paths inside the DFF are separated, and their DC on-currents are evaluated individually. Stacked transistors are treated as a single equivalent transistor whose on-current is used as one variable, again to avoid multicollinearity. Likewise, the on-current of the PMOS transistor is used to represent each transmission gate as one independent variable, since parallel-connected transistors behave similarly to a transistor width modulation, which leads to multicollinearity. This transformation reduces the initial set of independent variables in the regression analysis from 24 transistors to 17 variables.

### III. REGRESSION FUNCTION FOR DFF DELAY EVALUATION

In this section, we obtain a regression function that estimates the DFF operation speed at the worst-case condition under process variation. The regression function is used to estimate the DFF operation speed when the transistor widths of the target DFF are tuned. We propose equations that estimate the mean and the standard deviation of the operation speed, and hence the worst-case $3\sigma$ delay, under transistor width tuning.

#### A. Building the regression function for DFF circuit design

We create a data set of DFF performances and transistor on-currents using Monte-Carlo simulation and build regression functions for both the D2Q rise and fall operations.

First, we build a normalized regression function to extract the performance-critical transistors. The DFF operation speed and the transistor on-currents have different units and mean values. Each value is therefore normalized, and the following normalized regression expression is obtained.

$$O_{\text{norm}} = K_{1,\text{norm}} I_{1,\text{norm}} + K_{2,\text{norm}} I_{2,\text{norm}} + \cdots + K_{i,\text{norm}} I_{i,\text{norm}}, \tag{1}$$

where  $O_{\text{norm}}$  is a normalized value of an operation speed of the DFF,  $K_{j,\text{norm}}$  is a normalized regression coefficient and  $I_{j,\text{norm}}$  is the normalized value of on-current of the *j*th transistor, respectively.

If we assume that the normalized on-current of the *j*th transistor $I_{j,\text{norm}}$ and the normalized DFF operation speed $O_{\text{norm}}$ follow Gaussian distributions, the means and standard deviations of the operation speed and the transistor on-currents are related as follows,

$$\mu_{\text{O,norm}} = \sum_{j=1}^{i} (K_{j,\text{norm}} \cdot \mu_{\text{I},j,\text{norm}}) \tag{2}$$

$$\sigma_{\text{O,norm}} = \sqrt{\sum_{j=1}^{i} (K_{j,\text{norm}} \cdot \sigma_{\text{I},j,\text{norm}})^2}, \tag{3}$$

where $\mu_{O,norm}$ and $\sigma_{O,norm}$ are the mean and the standard deviation of the normalized DFF operation speed, and $\mu_{I,j,norm}$ and $\sigma_{I,j,norm}$ are the mean and the standard deviation of the normalized on-current of the *j*th transistor, respectively.

Several statistical metrics are available to evaluate the obtained regression function. When an independent variable shows a large p-value, it can be eliminated from the regression function to obtain a simpler and more compact function. The adjusted R-squared indicates how well the obtained regression function describes the input data set. The normalized regression analysis is also used to evaluate the impact of each transistor on the DFF operation speed by comparing the regression coefficients. Transistors with larger regression coefficients have a larger impact on the DFF operation speed. Transistors with smaller regression coefficients have less impact, so they can be eliminated from the delay estimation in the regression analysis.
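The ranking step can be sketched in a few lines of Python. The sketch assumes that normalization means z-score normalization (subtract the mean, divide by the standard deviation), which the text does not state explicitly; under that assumption a normalized coefficient follows from the raw one as $K_{j,\text{norm}} = K_j \sigma_{I,j} / \sigma_O$. The function names and all numbers are hypothetical.

```python
def standardized_coefficients(k_raw, sigma_i, sigma_o):
    """Convert raw regression coefficients to normalized ones.

    Assumes z-score normalization: K_norm_j = K_j * sigma_I_j / sigma_O.
    """
    return [k * s / sigma_o for k, s in zip(k_raw, sigma_i)]


def rank_by_impact(names, k_norm):
    """Sort variables by the magnitude of their normalized coefficient."""
    return sorted(zip(names, k_norm), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical values for three on-current variables (not from the paper).
names = ["mp1", "mn2", "mp7"]
k_raw = [2.0e3, 5.0e2, 4.0e3]        # raw coefficients [speed per ampere]
sigma_i = [1.0e-6, 2.0e-6, 1.5e-6]   # std. dev. of each on-current [A]
sigma_o = 8.0e-3                     # std. dev. of the operation speed

k_norm = standardized_coefficients(k_raw, sigma_i, sigma_o)
ranking = rank_by_impact(names, k_norm)
for name, coeff in ranking:
    print(name, round(coeff, 3))
```

In this toy data set the variable with the largest raw coefficient is also the most influential one, but that need not hold in general: a small raw coefficient attached to a widely varying on-current can dominate after normalization.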

Next, we build a regression function to estimate the worst-case operation speed of the DFF. The mean and standard deviation of the operation speed can be expressed in terms of those of each on-current by the following regression expressions.

$$\mu_{\rm O} = \sum_{j=1}^{i} (K_{j} \cdot \mu_{\rm I,j}) \tag{4}$$

$$\sigma_{\rm O} = \sqrt{\sum_{j=1}^{i} (K_j \cdot \sigma_{\rm I,j})^2}, \tag{5}$$

where $\mu_{\rm O}$ and $\sigma_{\rm O}$ are the mean and the standard deviation of the DFF operation speed, and $\mu_{\rm I,j}$, $\sigma_{\rm I,j}$ and $K_j$ are the mean and the standard deviation of the on-current of the *j*th transistor and its regression coefficient, respectively.
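Equations (4) and (5) can be evaluated directly once the coefficients are known. The following is a minimal sketch with hypothetical names and numbers; note that the root-sum-square form of equation (5) treats the on-current variations as uncorrelated.

```python
import math


def speed_mean_and_sigma(k, mu_i, sigma_i):
    """Return (mu_O, sigma_O) following Eqs. (4) and (5)."""
    mu_o = sum(kj * m for kj, m in zip(k, mu_i))
    sigma_o = math.sqrt(sum((kj * s) ** 2 for kj, s in zip(k, sigma_i)))
    return mu_o, sigma_o


# Hypothetical coefficients and on-current statistics for two variables.
k = [3.0, 4.0]
mu_i = [1.0, 2.0]
sigma_i = [0.1, 0.1]

mu_o, sigma_o = speed_mean_and_sigma(k, mu_i, sigma_i)
print(mu_o, sigma_o)
```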

Then, we build an operation speed estimation function using equations (4) and (5). To account for the impact of transistor width tuning on both the mean operation speed and its standard deviation, we use the following equations to predict the mean and the standard deviation of each on-current after the transistor width tuning.

$$\mu_{\rm I,j,pred.} = \mu_{\rm I,j} \frac{L_{\rm j,original} W_{\rm j,pred.}}{L_{\rm j,pred.} W_{\rm j,original}} \tag{6}$$

$$\sigma_{\rm I,j,pred.} = \sigma_{\rm I,j} \sqrt{\frac{L_{\rm j,original} W_{\rm j,original}}{L_{\rm j,pred.} W_{\rm j,pred.}}}, \tag{7}$$

where $\mu_{I,j,pred.}$ and $\sigma_{I,j,pred.}$ are the predicted mean and standard deviation of the on-current of the *j*th transistor after the width modulation. $L_{j,original}$ and $W_{j,original}$ are the length and width of the *j*th transistor before the width modulation, and $L_{j,pred.}$ and $W_{j,pred.}$ are those after the width modulation. The value of $\sigma_{I,j,pred.}$ is itself an estimate, calculated from the dependence of the random variation on the transistor dimensions according to Pelgrom's model.
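Equations (6) and (7) amount to a simple rescaling of the Monte-Carlo statistics. A minimal sketch follows; the function name, the 30 nm channel length, and the on-current values are hypothetical and only illustrate the scaling.

```python
import math


def scale_on_current(mu_i, sigma_i, w_orig, w_pred, l_orig, l_pred):
    """Predict on-current mean and sigma after resizing, Eqs. (6) and (7)."""
    mu_pred = mu_i * (l_orig * w_pred) / (l_pred * w_orig)
    sigma_pred = sigma_i * math.sqrt((l_orig * w_orig) / (l_pred * w_pred))
    return mu_pred, sigma_pred


# Hypothetical case: width 80 nm -> 320 nm at an unchanged 30 nm length.
mu_pred, sigma_pred = scale_on_current(
    mu_i=10.0, sigma_i=1.0,       # on-current statistics, arbitrary units
    w_orig=80.0, w_pred=320.0,    # widths [nm]
    l_orig=30.0, l_pred=30.0,     # lengths [nm]
)
print(mu_pred, sigma_pred)
```

Quadrupling the width in this example multiplies the mean on-current by four and halves the standard deviation given by equation (7).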

#### B. Objective function for transistor width tuning

The objective of this paper is to design a DFF with faster delay performance under process variation. The worst-case operation speed (the operation speed at the $\mu-3\sigma$ point, $O_{\mu-3\sigma}$, in this paper) can be expressed using equations (4), (5), (6) and (7) as follows,

$$O_{\mu-3\sigma} = \mu_{\rm O,pred.} - 3\sigma_{\rm O,pred.} \tag{8}$$

$$= \sum_{j=1}^{i} K_{j} \cdot \mu_{\rm I,j,pred.} - 3 \sqrt{\sum_{j=1}^{i} K_{j}^{2} \cdot \sigma_{\rm I,j,pred.}^{2}}. \tag{9}$$

Transistor widths are tuned to obtain a DFF with better worst-case delay performance under process variation. If the performance-critical transistors are appropriately selected and their widths are enlarged, transistor width tuning can improve the worst-case operation speed of the DFF. To maximize the operation speed for both the rise and fall input data cases, we minimize the objective function $D_{\mu+3\sigma,obj}$ defined as follows,

$$D_{\mu+3\sigma,\text{obj}} = \sqrt{D_{\mu+3\sigma,\text{rise}}^2 + D_{\mu+3\sigma,\text{fall}}^2}, \tag{10}$$

where

$$D_{\mu+3\sigma,\text{rise}} = \frac{1}{O_{\mu-3\sigma,\text{rise}}} \tag{11}$$

$$D_{\mu+3\sigma,\text{fall}} = \frac{1}{O_{\mu-3\sigma,\text{fall}}}, \tag{12}$$

and $O_{\mu-3\sigma,\text{rise}}$ and $O_{\mu-3\sigma,\text{fall}}$ are the worst-case rise and fall operation speeds calculated using equation (9).
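As an illustration, equations (8) to (12) can be combined into a small objective function. This is a sketch under stated assumptions: the helper names and the numbers are hypothetical, and the inputs are assumed to be the predicted on-current statistics from equations (6) and (7).

```python
import math


def worst_case_speed(k, mu_pred, sigma_pred):
    """Operation speed at the mu - 3*sigma point, Eqs. (8) and (9)."""
    mean = sum(kj * m for kj, m in zip(k, mu_pred))
    sigma = math.sqrt(sum((kj * s) ** 2 for kj, s in zip(k, sigma_pred)))
    return mean - 3.0 * sigma


def objective(o_rise, o_fall):
    """Combined worst-case delay objective, Eqs. (10) to (12)."""
    if o_rise <= 0.0 or o_fall <= 0.0:
        raise ValueError("worst-case operation speed must be positive")
    return math.hypot(1.0 / o_rise, 1.0 / o_fall)


# Hypothetical regression results for the rise and fall cases.
o_rise = worst_case_speed([3.0, 4.0], [1.0, 2.0], [0.1, 0.1])
o_fall = worst_case_speed([1.0, 2.0], [2.0, 3.0], [0.0, 0.5])
d_obj = objective(o_rise, o_fall)
print(o_rise, o_fall, d_obj)
```

Because the objective is a root-sum-square of the two delays, the slower of the rise and fall cases dominates it, so a width assignment that helps only the faster direction reduces the objective very little.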

#### C. Iteration of the regression analysis and the transistor width tuning

After the transistor widths are updated, the regression analysis is applied again to the new DFF circuit. Some of the performance-critical transistors may no longer be performance-critical after the width tuning, and other transistors may become performance-critical in the next design. We therefore need to iterate the regression analysis and the transistor width tuning to improve the final DFF performance. Parasitic RC strongly affects the DFF operation, so parasitics must be extracted from the layout. If the modifications from the previous trial are small, the simulation netlist with parasitics can be reused for the next trial by updating the transistor widths and the diffusion parasitic capacitances. If large modifications are required, it is better to re-extract the netlist from a new layout. We partially use a cell layout generator [14] to update the simulation netlist when the layout structure has been modified. Each iteration requires a Monte-Carlo simulation, which incurs a large simulation cost. We evaluate the required number of iterations and compare the final DFF results in the next section.

### IV. EXPERIMENTAL RESULTS

#### A. Implementation result

The target DFF in this paper is a TGFF designed in a 28 nm LP CMOS process. The nominal supply voltage for this process is 1.0 V. We target a supply voltage of 0.5 V to evaluate the DFF at low-voltage operation.

We use a data set of 300 Monte-Carlo trials to build regression functions of the operation speed for both the rise and fall data input cases. Transistor width tuning is performed to maximize the $3\sigma$ point operation speed estimated from the regression function. We repeat this process while updating the DFF design. When the process finishes, we obtain the $3\sigma$ point operation speed of the DFF designed with the proposed regression analysis and compare it with that of the DFF designed by an experienced designer.

Using the results of the regression analysis, we tune the transistor widths inside the DFF to maximize its worst-case operation speed. The regression function has 17 independent variables, so we select some transistors and enlarge their widths to maximize the operation speed of the DFF using the objective function in equation (10).
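One possible way to turn the regression model into a width assignment is a greedy search, sketched below. This is an assumption for illustration, not the optimizer used in this paper: it considers a single input direction instead of equation (10), keeps the channel length fixed so that extra width stands in for extra channel area, and uses hypothetical coefficients and on-current statistics.

```python
import math


def worst_case_speed(k, mu, sigma, w0, w):
    """Predicted mu - 3*sigma operation speed, Eq. (9) with Eqs. (6)-(7).

    The channel length is assumed unchanged, so only w / w0 matters.
    """
    mean = sum(kj * m * wj / w0j for kj, m, w0j, wj in zip(k, mu, w0, w))
    var = sum((kj * s) ** 2 * w0j / wj for kj, s, w0j, wj in zip(k, sigma, w0, w))
    return mean - 3.0 * math.sqrt(var)


def greedy_widen(k, mu, sigma, w0, step, budget):
    """Spend `budget` of extra width in `step` increments.

    Each increment goes to the transistor that raises the predicted
    worst-case speed the most; the search stops early if nothing helps.
    """
    w = list(w0)
    while budget >= step:
        base = worst_case_speed(k, mu, sigma, w0, w)
        best_j, best_gain = None, 0.0
        for j in range(len(w)):
            trial = list(w)
            trial[j] += step
            gain = worst_case_speed(k, mu, sigma, w0, trial) - base
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        w[best_j] += step
        budget -= step
    return w


# Hypothetical two-variable model: the first transistor is performance-critical.
k = [1.0, 0.1]
mu = [10.0, 10.0]
sigma = [1.0, 1.0]
w0 = [80.0, 80.0]  # minimum widths [nm]

tuned = greedy_widen(k, mu, sigma, w0, step=20.0, budget=60.0)
print(tuned)
```

In the flow described in this paper, the coefficients would be re-estimated by a new Monte-Carlo run and regression after each such update, since the set of performance-critical transistors can change once widths are modified.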

We start from an initial DFF (named DFF<sub>min</sub>) composed of minimum-width transistors. Applying the proposed design methodology, we obtain a DFF (named DFF<sub>RA</sub>) that targets the maximum worst-case operation speed. Another DFF (named DFF<sub>manual</sub>), designed by an experienced cell designer, is used for comparison. As a transistor width tuning constraint, we assume the same total channel area for DFF<sub>RA</sub> and DFF<sub>manual</sub>.

In the experiment, we use Synopsys HSPICE for the transistor-level simulation and R for the regression analysis. In the regression analysis, we check the p-value of each independent variable and eliminate from the regression function the independent variables whose p-values are larger than 0.1%.



Fig. 5. Comparison of Monte-Carlo results and estimations from regression analysis at each interval (20 iterations).



Fig. 6. Design result with different number of iterations.

#### B. Design examples of DFFs with proposed design flow

Figure 5 shows the worst-case operation speeds of the DFFs for rise and fall data inputs, obtained by Monte-Carlo simulation and estimated by regression analysis. In the proposed design flow, the transistor widths are updated iteratively. We perform 20 trials to improve the worst-case operation speed while keeping the total channel area inside the DFF below a constraint value. Monte-Carlo simulation is performed for the initial DFF to obtain its worst-case delay (MC0). Then, regression analysis estimates the worst-case delay after the transistor width tuning (RA1). In the next trial, the Monte-Carlo result (MC1) does not reach the estimated point (RA1), since the performance-critical transistors have changed due to the transistor width tuning in the first trial. However, the Monte-Carlo results show that both the rise and fall operation speeds improve steadily, since the regression analysis succeeds in extracting the performance-critical transistors at each trial.

Figure 6 shows the worst-case operation speed of DFFs designed with different numbers of iterations. The total channel area constraint is constant, so the additional channel area available for transistor width tuning in each trial differs with the number of trials. If we use only one trial (assigning the entire channel area budget to the initial regression result), the operation speed is degraded from that of the original DFF. As the number of trials increases, the final DFF shows better operation speed. In this case, designs with more than 10 iterations show almost the same performance.

#### C. Comparison result

Table I shows the transistor widths of the DFFs. $DFF_{manual}$ is designed by an experienced designer and targets 0.5 V operation. $DFF_{RA,0.5 V}$ is designed with the proposed method and targets 0.5 V operation. $DFF_{RA,1.0 V}$ is also designed with the proposed method, but its target supply voltage is 1.0 V; it is included as an example of how the supply voltage affects the transistor width tuning result. The two DFFs designed by the proposed method use 20 iterations of the design flow and have almost the same total channel area as $DFF_{manual}$. In the table, the averaged value is used for the widths of series-connected transistors. The result shows that $DFF_{RA,0.5 V}$ has almost the same set of transistor widths as $DFF_{manual}$. On the other hand, $DFF_{RA,1.0 V}$ has a different set of transistor widths. This result shows that the optimal set of transistor widths varies depending on the supply voltage.

Table II shows the cell layout height and width of each DFF. Note that each cell size is normalized by the library basic cell (unit cell). Since there is no cell area constraint in the proposed design flow or objective function, some transistors require a two-finger structure, which results in a larger cell width compared with DFF<sub>manual</sub>.

TABLE I Transistor widths of DFFs [nm].

| Tr. id# | DFF<sub>min</sub> | | DFF<sub>manual</sub> | | DFF<sub>RA,0.5 V</sub> (20 iterations) | | DFF<sub>RA,1.0 V</sub> (20 iterations) | | Description |
|---|---|---|---|---|---|---|---|---|---|
| | PMOS | NMOS | PMOS | NMOS | PMOS | NMOS | PMOS | NMOS | |
| mp1/mn1 | 80 | 80 | 270 | 180 | 270 | 155 | 115 | 80 | Input clocked inverter |
| mp2/mn2 | 80 | 80 | 300 | 200 | 320 | 220 | 240 | 200 | Master-latch inverter |
| mp3/mn3 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | Master-latch clocked inverter |
| mp4/mn4 | 80 | 80 | 180 | 120 | 290 | 90 | 290 | 90 | Transmission gate |
| mp5/mn5 | 80 | 80 | 300 | 200 | 290 | 200 | 270 | 320 | Slave-latch inverter |
| mp6/mn6 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | Slave-latch clocked inverter |
| mp7/mn7 | 80 | 80 | 360 | 240 | 420 | 260 | 590 | 280 | Output inverter |
| mp8/mn8 | 80 | 80 | 240 | 160 | 80 | 160 | 80 | 320 | 1st stage clock buffer |
| mp9/mn9 | 80 | 80 | 240 | 160 | 220 | 80 | 290 | 80 | 2nd stage clock buffer |
| Total channel area [$\mu m^2$] | | 0.058 | | 0.126 | | 0.122 | | 0.124 | |

Fig. 7. Comparison result of worst-case rise and fall delay.

TABLE II Cell size of DFFs.

| Cell | Height [# of tracks] | Width [# of unit cells] |
|---|---|---|
| DFF<sub>min</sub> | 9 | 17 |
| DFF<sub>manual</sub> | 9 | 17 |
| DFF<sub>RA,0.5 V</sub> | 9 | 18 |
| DFF<sub>RA,1.0 V</sub> | 9 | 19 |

We perform 1000 Monte-Carlo trials and evaluate the mean and standard deviation of the D2Q delay. Table III summarizes the final values of the objective function, the mean and standard deviation of the D2Q delay, and the calculated worst-case delay at the $3\sigma$ variation point. The objective function of DFF<sub>RA,0.5 V</sub> is 7.2% smaller than that of DFF<sub>manual</sub> at 0.5 V operation. Figure 7 shows the $3\sigma$ point worst-case rise and fall delays. DFF<sub>RA,0.5 V</sub> achieves a 14% faster $3\sigma$ point rise delay than DFF<sub>manual</sub>, at the cost of a 3.0% slower $3\sigma$ point fall delay. DFF<sub>RA,0.5 V</sub> shows 10% faster rise slew and 2.9% faster fall slew than DFF<sub>manual</sub>, since DFF<sub>RA,0.5 V</sub> has wider transistors in the output inverter. Enlarging the transistor widths increases the gate and diffusion capacitances and thus the circuit operation energy. DFF<sub>RA,0.5 V</sub> and DFF<sub>manual</sub> consume almost twice the energy of DFF<sub>min</sub>, but there is little difference in energy consumption between DFF<sub>RA,0.5 V</sub> and DFF<sub>manual</sub>, since their total channel areas are almost the same. These results show that the proposed design method can generate a DFF with performance almost equivalent to that of a DFF designed by an experienced cell designer.
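The headline numbers can be cross-checked from the mean and standard deviation rows of Table III. The sketch below approximates each worst-case delay as $\mu + 3\sigma$ of the D2Q delay and combines rise and fall as in equation (10); because the table entries are rounded and the paper's objective is defined through the operation speed, the result is expected to match the reported values only approximately. The dictionary keys are hypothetical labels.

```python
import math

# (mean, sigma) of the D2Q delay in ns, copied from Table III.
d2q = {
    "DFF_min": {"rise": (4.14, 0.432), "fall": (3.54, 0.423)},
    "DFF_manual": {"rise": (1.45, 0.236), "fall": (1.16, 0.185)},
    "DFF_RA_0.5V": {"rise": (1.20, 0.217), "fall": (1.20, 0.187)},
}


def worst_case(mu, sigma):
    """Delay at the mu + 3*sigma point."""
    return mu + 3.0 * sigma


def objective(cell):
    """Root-sum-square of the worst-case rise and fall delays, as in Eq. (10)."""
    return math.hypot(worst_case(*cell["rise"]), worst_case(*cell["fall"]))


reference = objective(d2q["DFF_manual"])
normalized = {name: 100.0 * objective(cell) / reference for name, cell in d2q.items()}

rise_manual = worst_case(*d2q["DFF_manual"]["rise"])
rise_ra = worst_case(*d2q["DFF_RA_0.5V"]["rise"])
rise_gain = (rise_manual - rise_ra) / rise_manual

for name, value in normalized.items():
    print(name, round(value, 1))
print("rise delay improvement:", round(100.0 * rise_gain, 1), "%")
```

With the tabulated values this gives roughly 263 for DFF<sub>min</sub> and 93 for DFF<sub>RA,0.5 V</sub> relative to 100 for DFF<sub>manual</sub>, in line with the first row of Table III, and a rise delay improvement of about 14%.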

### V. CONCLUSION

This paper discusses a design methodology for variation-aware DFFs that uses regression analysis to express the DFF operation speed in terms of the on-currents of the transistors inside the DFF. A DFF circuit is carefully hand-crafted by an experienced cell designer; however, there has been little discussion of how to design or tune the transistor sizes inside a DFF to achieve faster delay performance. Regression analysis not only identifies the transistors that strongly affect the DFF delay characteristics, but also quantifies their impact on the delay. The proposed design methodology is verified through a DFF design experiment, and the result shows that the DFF designed with the proposed methodology has delay characteristics similar to or faster than those of a DFF designed by an experienced cell designer.

Our future work is to incorporate both energy consumption and area constraints into the objective function to obtain energy-efficient, compact, and fast DFF circuits under process variation.

TABLE III Delay and energy performance of each cell.

| | DFF<sub>min</sub> | | DFF<sub>manual</sub> | | DFF<sub>RA,0.5 V</sub> | |
|---|---|---|---|---|---|---|
| Obj. func. norm. by DFF<sub>manual</sub> | 263 | | 100 | | 92.8 | |
| | Rise | Fall | Rise | Fall | Rise | Fall |
| D2Q ($\mu$) [ns] | 4.14 | 3.54 | 1.45 | 1.16 | 1.20 | 1.20 |
| D2Q ($\sigma$) [ns] | 0.432 | 0.423 | 0.236 | 0.185 | 0.217 | 0.187 |
| D2Q ($\mu+3\sigma$) [ns] | 5.43 | 4.81 | 2.16 | 1.71 | 1.86 | 1.76 |
| C2Q ($\mu$) [ns] | 4.12 | 3.56 | 1.46 | 1.16 | 1.21 | 1.20 |
| D2C ($\mu$) [ns] | 0.250 | 0.394 | -0.0743 | -0.287 | -0.0902 | -0.263 |
| Output slew ($\mu$) [ns] | 3.61 | 1.18 | 0.439 | 0.414 | 0.395 | 0.402 |
| Energy [µJ] | 1.05 | 0.789 | 2.05 | 2.10 | 2.10 | 2.04 |

#### ACKNOWLEDGMENTS

This work has been partly supported by JSPS KAKENHI JP16H01713 and JP17K12657. This work is also partly supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc., Cadence Design Systems, Inc., and Mentor Graphics, Inc.

### References

- [1] M. Alioto, "Understanding DC Behavior of Subthreshold CMOS Logic Through Closed-Form Analysis," *IEEE Trans. on Circuits and Systems I*, vol. 57, no. 7, pp. 1597–1607, Jul 2010.
- [2] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and Comparison in the Energy-Delay-Area Domain of Nanometer CMOS Flip-Flops: Part I - Methodology and Design Strategies," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 5, pp. 725–736, 2011.
- [3] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and Comparison in the Energy-Delay-Area Domain of Nanometer CMOS Flip-Flops: Part II - Results and Figures of Merit," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 5, pp. 737–750, 2011.
- [4] A. Asenov, S. Kaya, and A. R. Brown, "Intrinsic Parameter Fluctuations in Decananometer MOSFETs Introduced by Gate Line Edge Roughness," *IEEE Trans. on Electron Devices*, vol. 50, no. 5, pp. 1254–1260, May 2003.
- [5] K. J. Kuhn, M. D. Giles, D. Becher, P. Kolar, A. Kornfeld, R. Kotlyar, S. T. Ma, A. Maheshwari, and S. Mudanai, "Process Technology Variation," *IEEE Trans. on Electron Devices*, vol. 58, no. 8, pp. 2197–2208, Aug 2011.
- [6] H. Sunagawa and H. Onodera, "Variation-tolerant Design of D-FlipFlops," in *Intl. SOC Conference*, Sep 2010, pp. 147–151.
- [7] S. Nishizawa, T. Ishihara, and H. Onodera, "Design Methodology of Process Variation Tolerant D-Flip-Flops for Low Voltage Circuit Operation," in *Intl. SOC Conference*, 2014, pp. 42–47.
- [8] J. Moon, M. Aktan, and V. G. Oklobdzija, "Clocked Storage Elements Robust to Process Variations," in *Intl. Conference on ASIC*, Oct 2009, pp. 827–830.
- [9] M. Lanuzza, R. De Rose, F. Frustaci, S. Perri, and P. Corsonello, "Impact of Process Variations on Flip-Flops Energy and Timing Characteristics," in *Annual Symposium on VLSI*, Jul 2010, pp. 458–459.
- [10] S. A. Sadrossadat, H. Mostafa, and M. Anis, "Statistical Design Framework of Submicron Flip-Flop Circuits Considering Process Variations," *IEEE Trans. on Semiconductor Manufacturing*, vol. 24, no. 1, pp. 69–79, 2011.
- [11] N. H. E. Weste and D. M. Harris, *CMOS VLSI Design (4th edition)*. Addison Wesley, 2010.
- [12] J. R. Tolbert, X. Zhao, S. K. Lim, and S. Mukhopadhyay, "Slew-aware clock tree design for reliable subthreshold circuits," in *Intl. Symposium on Low Power Electronics and Design*, 2009, pp. 15–20.
- [13] M. Pelgrom, A. Duinmaijer, and A. Welbers, "Matching Properties of MOS Transistors," *IEEE Journal of Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1439, Oct 1989.
- [14] S. Nishizawa, T. Ishihara, and H. Onodera, "Layout Generator with Flexible Grid Assignment for Area Efficient Standard Cell," *IPSJ Trans. on System LSI Design Methodology*, vol. 8, pp. 131–135, 2015.