# Statistical Noise Margin Estimation for Sub-Threshold Combinational Circuits

Yu Pu<sup>1,2,3</sup>, José Pineda de Gyvez<sup>1,2</sup>, Henk Corporaal<sup>1</sup> and Yajun Ha<sup>3</sup>

<sup>1</sup>Technische Universiteit Eindhoven, The Netherlands; <sup>2</sup>NXP Research Eindhoven, The Netherlands; <sup>3</sup>National University of Singapore, Singapore

Abstract - The increasingly popular sub-threshold design style calls for EDA support to estimate noise margins, the minimum functional supply voltage, and the functional yield. In this paper, we propose a fast, accurate and statistical approach to accomplish these goals. First, we derive closed-form functions based on a new equivalent resistance model, which enable the fast estimation of the noise margins of individual cells at the gate level. Second, we propose to calculate and propagate the noise margin information with an affine arithmetic model that takes into account process variations and the corresponding inter-cell correlations. Experiments with ISCAS benchmarks have shown that the new approach has an accuracy of 98.5% w.r.t. transistor-level Monte-Carlo simulations. The new approach needs only a few seconds of running time per input vector, in contrast to the many hours required by transistor-level DC Monte-Carlo simulations. To the best of our knowledge, we are the first to provide a fast, accurate and statistical methodology other than Monte-Carlo simulation for the noise margin estimation of sub-threshold combinational circuits.

## I. INTRODUCTION

Sub-threshold design is gaining attention as a very promising solution for ultra-low power applications such as wireless sensors and in-vivo biomedical implants. When designs move from the super-threshold to the sub-threshold domain, the effective-to-idle current ratio  $(I_{eff}/I_{idle})$  [1] diminishes rapidly. Accordingly, the available noise margin is reduced, which may lead to failures in decoding logic values. Manufacturing variability further worsens circuit robustness. Therefore, guaranteeing sufficient output noise margins becomes a unique and important issue for sub-threshold designs. Prior art [2]-[5] relies on device sizing to ensure sufficient noise margins for individual cells, because larger devices reduce the threshold voltage  $(V_T)$  mismatch [7]. This methodology neglects correlations between gates and results in a pessimistic estimate of the output noise margin. For instance, a gate that outputs a higher  $V_{OL}$  (lower  $V_{OH}$ ) can tolerate a higher  $V_{OL}$  (lower  $V_{OH}$ ) from its preceding gate. Ignoring inter-cell correlations results in an overestimation of the minimum power supply  $V_{DD}$  and of the device sizes, and thus in increased power consumption. Extracting the noise margin with Monte-Carlo DC simulations solves these problems. Based on the extracted noise margin information, the designer can improve the robustness of the circuit by means such as gate resizing, buffer insertion and logic restructuring. In this way, the additional area and power overhead is avoided. However, this comes at the cost of a much longer design time. Usually, the design flow requires multiple iterations between noise margin extraction and circuit tuning. In our experience, spending tens of hours to extract the noise margins of a benchmark circuit composed of thousands of logic gates is quite common. Therefore, an approach that can promptly estimate the noise margin, the minimum functional  $V_{DD}$  and the functional yield of a given circuit, taking into account the impact of process variations and inter-cell correlations, is of great importance.

We propose a novel noise margin estimation methodology for sub-threshold combinational circuits in this paper. Our methodology has the following features. First, instead of performing slow transistor-level DC simulations, we propose a fast gate-level noise margin modeling approach based on a new equivalent resistance model. We use curve-fitting to calibrate our model, so that the estimation results closely match those obtained from transistor-level DC simulations. In analogy to the Elmore delay model for timing analysis, the gate-level model renders reasonably good accuracy, but is much more computationally efficient than its transistor-level counterpart. Second, we introduce the Affine Arithmetic (AA) approach to symbolically traverse the whole circuit from its inputs to its outputs. Applying AA helps to model correlations of noise margins among cells. Moreover, as the noise margins of the final outputs are expressed in affine form, their statistical spread can easily be extracted. In this way, the functional yield can also be estimated. Our approach iterates only once per input vector, hence the running time can be reduced by several orders of magnitude compared to Monte-Carlo simulation, as the algorithm complexity is reduced from  $O(a^n)$  to  $O(n^{1/k})$  (k>1).

The paper is organized as follows. Section II gives the gate-level noise margin model and shows how to calibrate it to improve the estimation accuracy. Section III introduces the *affine arithmetic* model to propagate results and to estimate statistical output noise margins and functional yield. Experimental results are shown in Section IV. Finally, Section V draws the conclusion of the work.

## II. ESTIMATING GATE NOISE MARGIN WITH RECTIFIED EQUIVALENT RESISTANCE MODEL

The BSIM models of the sub-threshold current for nMOS and pMOS transistors are given in Equations (1)-(3) by [1]

$$I_{sub\_n} = I_{0n} e^{\frac{V_{GS} - V_{Tn} + \eta V_{DS} - \gamma V_{SB}}{nU}} \left(1 - e^{-\frac{V_{DS}}{U}}\right)$$
(1)

$$I_{sub\_p} = I_{0p} e^{\frac{-(V_{GS} - V_{Tp} + \eta V_{DS} - \gamma V_{SB})}{nU}} \left(1 - e^{\frac{V_{DS}}{U}}\right)$$
(2)

$$I_{0n(p)} = (W_{eff} / L_{eff})(n-1)\mu C_{ox}U^{2}$$
(3)

where I<sub>0</sub> is the saturation current,  $\eta$  is the DIBL coefficient,  $\gamma$  is the linearized body effect factor, *U* is the so-called thermal voltage kT/q which is around 26mV at room temperature, and *n* is the sub-threshold swing factor.

To estimate the noise margin of a cell at the gate level, we introduce an *equivalent resistance* model into the DC analysis. The resistance is defined as the derivative of the drain-to-source voltage  $V_{DS}$  with respect to the sub-threshold current  $I_{sub}$ , evaluated at the DC point  $V_{DS} = 0$ . Ignoring body effects for the moment, and noting that the gate-to-source voltage is  $V_{in}$  for the nMOS transistor and  $V_{in} - V_{DD}$  for the pMOS transistor, we can approximate the equivalent resistances of nMOS and pMOS transistors as indicated in Equations (4) and (5) respectively,

$$R_{nMOS} = (I_{0n})^{-1} U e^{-(V_{in} - V_{Tn})/nU}$$
(4)

$$R_{pMOS} = (I_{0p})^{-1} U e^{(V_{in} - V_{DD} - V_{Tp})/nU}$$
(5)

A typical digital cell consists of a p-section with a common node tied to an n-section (see Fig.1(a)). Let us start the analysis with a CMOS inverter (Fig. 1(b)). Its equivalent resistance model is shown in Fig. 1(c).



Fig. 1 (a) Cell Schematic (b) Inverter (c) Equivalent Model

Assuming  $I_{0n} = I_{0p}$ , we can obtain the output voltage of the inverter,

$$V_{out} = \left\{ 1 + e^{[2V_{in} - (V_{DD} + V_{Tn} + V_{Tp})]/nU} \right\}^{-1} V_{DD}$$
(6)
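The step from Equations (4)-(5) to Equation (6) is the resistive voltage divider of Fig. 1(c). Spelled out under the same assumption  $I_{0n} = I_{0p}$ , and with the pMOS resistance between  $V_{DD}$  and the output and the nMOS resistance between the output and ground, it reads:

$$V_{out} = \frac{R_{nMOS}}{R_{nMOS} + R_{pMOS}} V_{DD} = \left\{ 1 + \frac{R_{pMOS}}{R_{nMOS}} \right\}^{-1} V_{DD}, \qquad \frac{R_{pMOS}}{R_{nMOS}} = e^{[(V_{in} - V_{DD} - V_{Tp}) + (V_{in} - V_{Tn})]/nU} = e^{[2V_{in} - (V_{DD} + V_{Tn} + V_{Tp})]/nU}$$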

If we define

$$X = (V_{Tn} + V_{Tp})/2$$
(7)

then Equation (6) can be further expressed as Equation (8),

$$V_{out} = \left\{ 1 + \left[ e^{(V_{in} - X - V_{DD}/2)/nU} \right]^2 \right\}^{-1} V_{DD}$$
(8)

The above analysis may lose accuracy because we neglected the body effect and assumed  $I_{0n} = I_{0p}$ . To compensate, we insert a fitting parameter  $\lambda$  into (8) for calibration,

$$V_{out} = \left\{ 1 + \left[ e^{\lambda + (V_{in} - X - V_{DD}/2)/nU} \right]^2 \right\}^{-1} V_{DD}$$
(9)

where  $\lambda$  can be extracted through *nonlinear least-squares curve-fitting* from actual simulated results. Fig. 2 gives the noise margin estimated from Cadence Spectre Simulator and from Equation (9), for an inverter with *Wp/Wn*=0.28µm/0.2µm in a 65nm CMOS process at the typical (TT) corner when V<sub>DD</sub> is swept in the sub-threshold region. By definition,  $V_{OL}$  and  $V_{OH}$  are the two operating points of the inverter where  $dV_{out}/dV_{in} = -1$ . Please note that the vertical axes have different scales for each plot. As shown, the two sets of results closely match each other after curve-fitting.
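As a minimal sketch of how Equation (9) and the unity-gain definition can be evaluated numerically, the Python fragment below computes the inverter transfer curve and locates the two points where  $dV_{out}/dV_{in} = -1$  by bisection. The function names and the default values `n=1.5`, `u=0.026`, `x=0.0` and `lam=0.0` are illustrative assumptions, not fitted values from this work.

```python
import math

def inverter_vout(vin, vdd, x=0.0, lam=0.0, n=1.5, u=0.026):
    """Equation (9): output voltage of the calibrated inverter model."""
    z = 2.0 * (lam + (vin - x - vdd / 2.0) / (n * u))
    return vdd / (1.0 + math.exp(z))

def inverter_gain(vin, vdd, x=0.0, lam=0.0, n=1.5, u=0.026):
    """Analytic dVout/dVin obtained by differentiating Equation (9)."""
    vout = inverter_vout(vin, vdd, x, lam, n, u)
    return -(2.0 / (n * u)) * vout * (1.0 - vout / vdd)

def unity_gain_points(vdd, x=0.0, lam=0.0, n=1.5, u=0.026):
    """Return ((V_IL, V_OH), (V_IH, V_OL)), the two points with gain -1."""
    if vdd <= 2.0 * n * u:
        raise ValueError("peak gain vdd/(2nU) must exceed 1 in magnitude")
    centre = x + vdd / 2.0 - lam * n * u  # input at which the gain peaks

    def solve(outer):
        # Bisection: gain + 1 is positive at `outer`, negative at `centre`.
        lo, hi = outer, centre
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            if inverter_gain(mid, vdd, x, lam, n, u) + 1.0 > 0.0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    span = 15.0 * n * u  # far enough from the centre for the gain to vanish
    v_il = solve(centre - span)
    v_ih = solve(centre + span)
    return ((v_il, inverter_vout(v_il, vdd, x, lam, n, u)),
            (v_ih, inverter_vout(v_ih, vdd, x, lam, n, u)))

# Illustrative use at an assumed supply of 180 mV.
(v_il, v_oh), (v_ih, v_ol) = unity_gain_points(0.18)
```

In a calibrated flow, `lam`, `n` and the statistics of `x` would come from the curve-fitting step described above rather than from these placeholder defaults.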



Next, we show how to incorporate process variations in our model. Our previous research [6] has shown that  $V_T$  variation is the dominant cause of sub-threshold noise margin degradation, because the sub-threshold current depends exponentially on  $V_T$ . The V<sub>T</sub> mismatch of paired transistors also causes a wide range of sub-threshold current shifts [7]. In our model, the  $V_T$  variation is reflected in the variation of X. As  $V_{Tn}$  and  $V_{Tp}$  are normally distributed, X is also normally distributed, i.e.,  $X \sim N(\mu_x, \sigma_x^2)$ . Parameters  $\mu_x$  and  $\sigma_x$  depend primarily on the size of the transistors, and can also be characterized from transistor-level simulations. Fig. 3 shows the  $3\sigma$  range of  $V_{\text{OL}}$  and  $V_{\text{OH}}$  obtained from Cadence Spectre Simulator and from our model. Once again, the results from the transistor-level model and from our new model closely coincide. An observation from the two plots is that the variation of  $V_{OH}$  is much larger than that of  $V_{OL}$ , because the nMOS transistor is much leakier than the pMOS transistor.
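To illustrate how a normally distributed X translates into a spread of the output level, the sketch below samples X and evaluates Equation (9) for a fixed input. The values `mu_x=0.0` and `sigma_x=0.02`, the model constants `n=1.5` and `u=0.026`, and the helper names are all assumptions chosen for illustration; characterized values would replace them in practice.

```python
import math
import random
import statistics

def inverter_vout(vin, vdd, x, lam=0.0, n=1.5, u=0.026):
    """Equation (9): output voltage of the calibrated inverter model."""
    z = 2.0 * (lam + (vin - x - vdd / 2.0) / (n * u))
    return vdd / (1.0 + math.exp(z))

def sample_output(vin, vdd, mu_x, sigma_x, trials=2000, seed=1):
    """Sample X ~ N(mu_x, sigma_x^2) and return (mean, std) of Vout."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outs = [inverter_vout(vin, vdd, rng.gauss(mu_x, sigma_x))
            for _ in range(trials)]
    return statistics.mean(outs), statistics.stdev(outs)

# Output-high level for a grounded input at an assumed 180 mV supply.
mean_high, std_high = sample_output(vin=0.0, vdd=0.18, mu_x=0.0, sigma_x=0.02)
```

This gate-level sampling is only a reference point: it still requires many trials per gate, which is the cost a purely analytic propagation avoids.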



A similar analysis can be carried out for other static digital gates. For an N-input gate, we found that its output voltage can be expressed as a function as shown in Equation (10),

$$V_{out} = f(V_{in}, X, V_{DD})$$
(10)

where  $V_{in}$  denotes the set of the N input voltages, and X is the set that contains N normally distributed variables corresponding to the different inputs. For example, the output voltages of an N-input NAND and an N-input NOR gate can be expressed as in Equations (11) and (12), respectively,

$$V_{out} = \left\{ 1 + \left[ \sum_{i=1}^{N} e^{\lambda_i - (V_{in\_i} - X_i - V_{DD}/2)/nU} \right]^{-2} \right\}^{-1} V_{DD}$$
(11)

$$V_{out} = \left\{ 1 + \left[ \sum_{i=1}^{N} e^{\lambda_i + (V_{in\_i} - X_i - V_{DD}/2)/nU} \right]^2 \right\}^{-1} V_{DD}$$
(12)

where  $V_{in\_i} \in V_{in}$  is the voltage of the  $i^{th}$  input,  $X_i \in X$  is the normally distributed variable associated with the  $i^{th}$  input, with  $X_i \sim N(\mu_{x\_i}, \sigma_{x\_i}^2)$ , and  $\lambda_i$  is the  $i^{th}$  fitted parameter. The noise margin model for each type of gate, including the pre-characterized constants  $\mu_{x\_i}$ ,  $\sigma_{x\_i}$ ,  $\lambda_i$  ( $\forall i$ ), can be embedded in a library file of our EDA tool.
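Equations (11) and (12) can be written down directly as functions of the input voltages. The sketch below is one possible transcription; the function names and the defaults `n=1.5` and `u=0.026` are assumptions, and the `lams` values would normally come from the pre-characterized library rather than being set to zero.

```python
import math

def nand_vout(vins, xs, lams, vdd, n=1.5, u=0.026):
    """Equation (11): output voltage of an N-input NAND gate."""
    s = sum(math.exp(lam - (vin - x - vdd / 2.0) / (n * u))
            for vin, x, lam in zip(vins, xs, lams))
    return vdd / (1.0 + s ** -2)

def nor_vout(vins, xs, lams, vdd, n=1.5, u=0.026):
    """Equation (12): output voltage of an N-input NOR gate."""
    s = sum(math.exp(lam + (vin - x - vdd / 2.0) / (n * u))
            for vin, x, lam in zip(vins, xs, lams))
    return vdd / (1.0 + s ** 2)

# Illustrative two-input evaluation at an assumed 180 mV supply,
# with all X_i and lambda_i set to zero for simplicity.
VDD = 0.18
ZERO = [0.0, 0.0]
nand_both_high = nand_vout([VDD, VDD], ZERO, ZERO, VDD)  # expected low
nand_one_low = nand_vout([0.0, VDD], ZERO, ZERO, VDD)    # expected high
nor_both_low = nor_vout([0.0, 0.0], ZERO, ZERO, VDD)     # expected high
nor_one_high = nor_vout([VDD, 0.0], ZERO, ZERO, VDD)     # expected low
```

With a single input, `nor_vout` reduces to the inverter expression of Equation (9), which gives a quick consistency check on the transcription.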

Estimating a cell's noise margin with its equivalent resistance model renders reasonably good accuracy, and provides a much simpler expression than the transistor-level model. The new noise margin model performs well at the gate level and avoids the need to solve any high-order differential equation matrix, and hence greatly reduces the computational load of the EDA software. However, if the statistical noise margins at the outputs are to be extracted, Monte-Carlo DC analysis is still needed. To eliminate Monte-Carlo simulations entirely, we introduce the *Affine Arithmetic* model for efficient computation and propagation of noise margins, as discussed in the next section.

## III. ESTIMATING STATISTICAL OUTPUT NOISE MARGIN WITH AFFINE ARITHMETIC MODEL

Affine Arithmetic (AA) is a model used, for example, in bit-width estimation and probabilistic error analysis [8]-[11]. In the AA model, an uncertain variable x is expressed as

$$x = C_0 + \sum_{i=1}^{N} C_i \varepsilon_i$$
(13)

where  $C_0$  is the *central value* of the affine form of x, and each  $\varepsilon_i$  is an independent noise symbol multiplied by its corresponding coefficient C<sub>i</sub>. All noise symbols denote independent and identically distributed variables. AA is very suitable for symbolic propagation, because if the operands are in AA form, the results of arithmetic operations such as addition, subtraction and multiplication are also in AA form (for multiplication, after the second-order term is approximated linearly). Furthermore, AA is capable of carrying correlation information. Along a propagation path, one noise symbol  $\varepsilon_i$  may contribute to the uncertainties of two or more variables. When these variables are combined, their uncertainties are combined as well, so that their correlations are taken into consideration. This property is especially useful in our case. As shown in Fig. 4, the variation term  $\varepsilon_i$  in the noise margin expression at the output of INV1 re-converges at the inputs of NAND1 and proceeds to the output of NAND1. In this way, the final results can be estimated more accurately.



Fig. 4. Noise margin uncertainty propagation with AA model
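The way a shared noise symbol carries correlation can be seen in a few lines of code. The sketch below is a minimal affine form supporting only addition, subtraction and scaling by a constant; the class name `Affine`, its fields and the symbol names are hypothetical, and the variance helper assumes unit-variance noise symbols.

```python
from dataclasses import dataclass, field

@dataclass
class Affine:
    """Affine form of Equation (13): c0 + sum(coefficient * noise symbol)."""
    c0: float
    terms: dict = field(default_factory=dict)  # symbol name -> coefficient

    def _combine(self, other, sign):
        # Coefficients of the same noise symbol are merged, not duplicated.
        terms = dict(self.terms)
        for sym, coeff in other.terms.items():
            terms[sym] = terms.get(sym, 0.0) + sign * coeff
        return Affine(self.c0 + sign * other.c0, terms)

    def __add__(self, other):
        return self._combine(other, +1.0)

    def __sub__(self, other):
        return self._combine(other, -1.0)

    def scale(self, k):
        return Affine(k * self.c0, {s: k * c for s, c in self.terms.items()})

    def variance(self):
        # Valid when every noise symbol is independent with unit variance.
        return sum(c * c for c in self.terms.values())

# x and y share the symbol "e1", so part of their uncertainty is correlated.
x = Affine(1.0, {"e1": 0.1})
y = x + Affine(0.5, {"e2": 0.2})
d = y - x  # the shared "e1" contribution cancels; only "e2" remains
```

Had `x` and `y` been treated as independent, the variance of `d` would wrongly include the `e1` contribution twice instead of cancelling it.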

Fig. 5 shows the statistical noise margin estimation flowchart of this work. The new approach takes 3 steps:



Fig. 5. Noise margin estimation flowchart

#### 1. Model Instantiation & AA form Initialization

Given the synthesized gate-level netlist, we instantiate each gate with the noise margin model described in Section II. Each parameter in X is initialized and stored in AA form, i.e.

$$X_{i,k} = X_{0i,k} + C_{i,k} \varepsilon_{i,k}$$
(14)

where  $X_{i,k}$  denotes the  $i^{th}$  variable in the set  $X$  of the  $k^{th}$  gate, and  $\varepsilon_{i,k}$  is a unique and independent noise symbol associated with that variable, with  $\varepsilon_{i,k} \sim N(0, 1)$ .

#### 2. Symbolic Calculation and Propagation

For each input vector, the program traverses the whole circuit from the inputs to the outputs in the *forward* direction, such that the voltage of each edge in the graph is annotated with a calculation result expressed in *AA* form. However, symbolic propagation would cause a range explosion when encountering special functions such as *exponential* and/or *power functions*, making it difficult to maintain AA propagation. We solve this problem by approximating (10) linearly using a first-order Taylor expansion, so that the output voltage of each gate is expressed as

$$V_{out} = V_{out0} + \sum_{\forall i} \left[ \partial f / \partial V_{in\_i} \right]_0 \Delta V_{in\_i} + \sum_{\forall i} \left[ \partial f / \partial X_i \right]_0 \Delta X_i$$
(15)

where  $V_{out0}$ ,  $[\partial f / \partial V_{in\_i}]_0$ ,  $[\partial f / \partial X_i]_0$  are the values calculated at the *central values* of the variables in the  $V_{in}$  and X sets of that gate, and  $\Delta V_{in\_i}$ ,  $\Delta X_i$  are the deviations of those variables from their central values, i.e. the noise-symbol parts of their affine forms.
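A minimal sketch of the linearized propagation in Equation (15) is given below. It represents an affine form as a pair `(central, {symbol: coefficient})` and estimates the partial derivatives by central finite differences, which is an implementation shortcut assumed here (analytic derivatives of Equations (11)-(12) would serve equally well). The function names, the two-gate example, and the values `n=1.5`, `u=0.026`, `vdd=0.18` and the 0.02 coefficients are hypothetical.

```python
import math

def nand_vout(vins, xs, vdd=0.18, n=1.5, u=0.026):
    """Equation (11) with all fitted lambda_i set to 0 (an assumption)."""
    s = sum(math.exp(-(v - x - vdd / 2.0) / (n * u))
            for v, x in zip(vins, xs))
    return vdd / (1.0 + s ** -2)

def propagate(gate, vins, xs, h=1e-6):
    """First-order propagation of Equation (15) through one gate.

    vins and xs are lists of affine forms (central, {symbol: coefficient});
    the result is the output voltage in the same representation.
    """
    v0 = [c for c, _ in vins]
    x0 = [c for c, _ in xs]
    terms = {}

    def add_terms(grad, operand_terms):
        # Shared symbols accumulate, which preserves their correlation.
        for sym, coeff in operand_terms.items():
            terms[sym] = terms.get(sym, 0.0) + grad * coeff

    for i, (_, t) in enumerate(vins):
        up = v0[:i] + [v0[i] + h] + v0[i + 1:]
        dn = v0[:i] + [v0[i] - h] + v0[i + 1:]
        add_terms((gate(up, x0) - gate(dn, x0)) / (2.0 * h), t)
    for i, (_, t) in enumerate(xs):
        up = x0[:i] + [x0[i] + h] + x0[i + 1:]
        dn = x0[:i] + [x0[i] - h] + x0[i + 1:]
        add_terms((gate(v0, up) - gate(v0, dn)) / (2.0 * h), t)
    return gate(v0, x0), terms

# Hypothetical re-convergent example: an inverter (1-input NAND) driven
# high feeds both inputs of a 2-input NAND.
inv_out = propagate(nand_vout, [(0.18, {})], [(0.0, {"e_inv": 0.02})])
nand_out = propagate(nand_vout, [inv_out, inv_out],
                     [(0.0, {"e_a": 0.02}), (0.0, {"e_b": 0.02})])
```

In this example the coefficient of `e_inv` at the NAND output collects the contributions arriving through both inputs, which is the re-convergence effect that the affine form is meant to capture.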

#### 3. Output Noise Margin Estimation

After calculation and propagation, the voltage at the output(s) can be expressed as (16)

$$V_{out} = V_{out0} + \sum_{\forall (i,k)} \eta_{i,k} \varepsilon_{i,k}$$
(16)

Recall that each  $\varepsilon_{i,k}$  in (16) is an independent *noise symbol* with  $\varepsilon_{i,k} \sim N(0,1)$ , and  $\eta_{i,k}$  is the corresponding accumulated coefficient. According to probability theory, the sum of these independent normally distributed terms is also normally distributed, so we have

$$V_{out} \sim N(V_{out0}, \sum_{\forall (i,k)} \eta_{i,k}^{2})$$
(17)

Therefore, the mean value and variance of the output voltage are readily obtained, and the statistical output noise margin can be estimated from them.
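Equation (17) amounts to a root-sum-square of the accumulated coefficients. The short sketch below shows that reduction together with a k-sigma bound; the function names and the numeric values in the example are made up for illustration and are not results from this work.

```python
import math

def affine_stats(central, terms):
    """Mean and standard deviation of an affine form, per Equation (17)."""
    sigma = math.sqrt(sum(c * c for c in terms.values()))
    return central, sigma

def k_sigma_bounds(central, terms, k=3.0):
    """Return (mean - k*sigma, mean + k*sigma) for the output voltage."""
    mu, sigma = affine_stats(central, terms)
    return mu - k * sigma, mu + k * sigma

# Illustrative affine output: 170 mV central value, two noise symbols.
mu, sigma = affine_stats(0.170, {"e1": 0.003, "e2": -0.004})
low, high = k_sigma_bounds(0.170, {"e1": 0.003, "e2": -0.004})
```

For an output-high level the lower bound is the relevant worst case, while for an output-low level it is the upper bound.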

## IV. EXPERIMENTAL RESULTS

To evaluate our methodology, experiments were conducted using the ISCAS combinational benchmark circuits. All simulations were performed for a CMOS 65nm Standard  $V_T$  (SVT) technology from NXP. The benchmark circuits are synthesized to netlists with minimum size logic gates. We do not use gates that have more than 4 stacked transistors or 4 paralleled transistors, as sub-threshold design seldom exploits these gates due to severe robustness degradation [4]. Our new approach was implemented in C++ and run on a PC with an Intel Pentium 1.86GHz processor and 1GB of RAM. To validate the new model, we performed transistor-level DC Monte-Carlo (MC) simulations for benchmark C880, and compared the results with those from our approach. The MC simulation was carried out with Cadence Spectre Simulator running on an HP UNIX server, and ran for 2000 trials. Table I gives the simulation results. Here,  $V_{OL}'$  ( $V_{OH}'$ ) is defined as the maximum (minimum) value among the  $3\sigma$  $V_{OL}$  ( $V_{OH}$ ) of all the outputs, normalized w.r.t. V<sub>DD</sub>. As shown, our approach models the output noise margin with a deviation of about 1.5% of V<sub>DD</sub> or less. Moreover, the transistor-level DC MC simulation for benchmark C880 required more than 10 hours of running time for one input vector, while the new approach needed only about 0.1 seconds. Our methodology thus reduces the time needed to estimate the output noise margin of a circuit by several orders of magnitude.

#### TABLE I

Estimated 3σ Statistical Noise Margin from Cadence Spectre Monte-Carlo DC Simulation and the New Approach

| Benchmark | Sim | 150mV V<sub>OL</sub>' | 150mV V<sub>OH</sub>' | 180mV V<sub>OL</sub>' | 180mV V<sub>OH</sub>' | 210mV V<sub>OL</sub>' | 210mV V<sub>OH</sub>' | Running Time / Input Vector |
|-----------|-----|------|-------|------|-------|------|-------|------------|
| C880      | MC  | 2.4% | 84.6% | 1.2% | 92.2% | 0.3% | 96.2% | > 10 hours |
|           | New | 2.9% | 85.4% | 1.1% | 93.7% | 0.4% | 97.4% | 0.08 sec   |

Table II gives the  $3\sigma$  and  $6\sigma$  statistical noise margins estimated with our methodology for the remaining ISCAS benchmarks. If the goal is to ensure sufficient noise margin for each individual gate, the required minimum V<sub>DD</sub> is 220mV. However, observe that at V<sub>DD</sub>=180mV the  $3\sigma$  noise margin is already sufficient (e.g., V<sub>OH</sub> > 90%V<sub>DD</sub> and V<sub>OL</sub> < 10%V<sub>DD</sub>). The overestimation of the minimum functional V<sub>DD</sub> is thus avoided, as the new approach estimates the output noise margins more precisely.

#### TABLE II

Estimated 3σ and 6σ Statistical Noise Margins as % of V<sub>DD</sub>

| Benchmark |    | 150mV V<sub>OL</sub>' | 150mV V<sub>OH</sub>' | 180mV V<sub>OL</sub>' | 180mV V<sub>OH</sub>' | 210mV V<sub>OL</sub>' | 210mV V<sub>OH</sub>' | Running Time (sec) |
|-----------|----|-------|-------|-------|-------|-------|-------|-------|
| C1355     | 3σ | 2.5%  | 85.0% | 1.8%  | 93.7% | 0.3%  | 97.4% | 0.172 |
|           | 6σ | 4.1%  | 73.2% | 2.6%  | 88.8% | 0.68% | 95.4% |       |
| C1908     | 3σ | 2.4%  | 78.3% | 1.7%  | 92.6% | 0.4%  | 97.2% | 0.204 |
|           | 6σ | 4.3%  | 61.1% | 2.3%  | 86.8% | 0.7%  | 95.0% |       |
| C2670     | 3σ | 3.0%  | 83.3% | 1.2%  | 91.3% | 0.4%  | 97.4% | 0.484 |
|           | 6σ | 8.0%  | 70.1% | 2.0%  | 86.7% | 0.73% | 95.0% |       |
| C3540     | 3σ | 3.4%  | 85.1% | 1.1%  | 91.8% | 0.4%  | 97.4% | 0.688 |
|           | 6σ | 6.2%  | 73.3% | 1.95% | 88.4% | 0.68% | 95.4% |       |
| C5315     | 3σ | 3.5%  | 77.2% | 1.1%  | 92.6% | 0.4%  | 97.2% | 1.203 |
|           | 6σ | 6.4%  | 59.4% | 1.95% | 88.9% | 0.73% | 95.1% |       |
| C6288     | 3σ | 7.1%  | 78.9% | 2.4%  | 92.7% | 0.8%  | 97.2% | 1.422 |
|           | 6σ | 13.0% | 62.2% | 4.38% | 86.9% | 1.63% | 95.0% |       |
| C7552     | 3σ | 2.7%  | 78.4% | 1.1%  | 92.7% | 0.4%  | 97.4% | 1.781 |
|           | 6σ | 4.8%  | 61.2% | 2.1%  | 86.8% | 0.74% | 95.1% |       |

Based on the spread of the noise margin, we are now able to estimate the circuit's functional yield for a given  $V_{DD}$ . Let us take benchmark C880 at  $V_{DD} = 180 \text{mV}$  as an example. Its  $V_{OH}$  and  $V_{OL}$  probability density function plots, generated with the  $\mu$  and  $\sigma$  estimated by our programme, are shown in Fig. 6. By intersecting a 90%  $V_{DD}$  line for  $V_{OH}$  and a 10%  $V_{DD}$  line for  $V_{OL}$ , the desired  $V_{OH}$  and  $V_{OL}$  ranges (shaded regions) are obtained. Suppose  $p_1$  and  $p_2$  are the cumulative probabilities within the acceptable ranges. Neglecting the dependency between  $V_{OL}$  and  $V_{OH}$ ,  $p=p_1p_2$  is an estimate of the functional yield. In this example, p is 99.8%. This represents a 2000 ppm loss arising from malfunctioning of combinational circuits, excluding the timing yield loss. Such a loss is very high by industrial design standards.
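The yield estimate  $p = p_1 p_2$  can be computed from the normal cumulative distribution function. The sketch below assumes normally distributed  $V_{OH}$  and  $V_{OL}$  and treats them as independent, as in the text; the function names and the numeric means and standard deviations are illustrative placeholders, not the C880 values.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def functional_yield(vdd, mu_oh, sigma_oh, mu_ol, sigma_ol,
                     hi_frac=0.9, lo_frac=0.1):
    """Estimate p = p1 * p2, neglecting V_OH / V_OL dependency."""
    p1 = 1.0 - normal_cdf(hi_frac * vdd, mu_oh, sigma_oh)  # P(V_OH > 90% VDD)
    p2 = normal_cdf(lo_frac * vdd, mu_ol, sigma_ol)        # P(V_OL < 10% VDD)
    return p1 * p2

# Illustrative numbers only (volts), at an assumed 180 mV supply.
p = functional_yield(0.18, mu_oh=0.172, sigma_oh=0.003,
                     mu_ol=0.004, sigma_ol=0.002)
loss_ppm = (1.0 - p) * 1e6
```

The thresholds `hi_frac` and `lo_frac` mirror the 90% and 10% lines used above and can be tightened or relaxed to match a different noise margin requirement.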



## V. CONCLUSION

This paper introduced a novel noise margin extraction methodology for sub-threshold combinational circuits. Based on an equivalent resistance model, we model the noise margin of individual cells at the gate level instead of at the transistor level. We also introduced the *Affine Arithmetic* model to efficiently calculate, propagate and estimate the output statistical noise margin, the minimum functional V<sub>DD</sub>, and the functional yield of a circuit. Experimental results show that our approach has 98.5% accuracy using MC simulations as a reference, while reducing the running time by several orders of magnitude.

## REFERENCES

- [1] Benton H. Calhoun and Anantha P. Chandrakasan, "Characterizing and Modeling Minimum Energy Operation for Subthreshold Circuits," *International Symposium on Low Power Electronics and Design (ISLPED)*, August 2004.
- [2] Benton H. Calhoun, Alice Wang and Anantha Chandrakasan, "Modeling and Sizing for Minimum Energy Operation in Subthreshold Circuits," *IEEE Journal of Solid-State Circuits (JSSC)*, Vol. 40, No. 9, September 2005.
- [3] Benton H. Calhoun, Alice Wang and Anantha P. Chandrakasan, "Device Sizing for Minimum Energy Operation in Subthreshold Circuits," in *Proceedings of IEEE Custom Integrated Circuits Conference (CICC)*, October 2004.
- [4] Alice Wang and Anantha Chandrakasan, "A 180-mV Subthreshold FFT Processor Using a Minimum Energy Design Methodology," *IEEE Journal of Solid-State Circuits (JSSC)*, Vol. 40, No. 1, January 2005.
- [5] Joyce Kwong and Anantha P. Chandrakasan, "Variation-Driven Device Sizing for Minimum Energy Sub-threshold Circuits," *International Symposium on Low Power Electronics and Design (ISLPED)*, October 2006.
- [6] Yu Pu, José Pineda de Gyvez, Henk Corporaal and Yajun Ha, "V<sub>T</sub> Balancing and Device Sizing Towards High Yield of Sub-threshold Static Logic Gates," *International Symposium on Low Power Electronics and Design (ISLPED)*, August 2007.
- [7] José Pineda de Gyvez and Hans P. Tuinhout, "Threshold Voltage Mismatch and Intra-Die Leakage Current in Digital CMOS Circuits," *IEEE Journal of Solid-State Circuits (JSSC)*, Vol. 39, No. 1, pp. 157–168, January 2004.
- [8] C. F. Fang, R. A. Rutenbar, M. Puschel and T. Chen, "Towards efficient static analysis of finite precision effects in DSP applications via affine arithmetic modeling," *Design Automation Conference*, 2003.
- [9] C. F. Fang, R. A. Rutenbar, M. Puschel and T. Chen, "Fast, accurate static analysis for fixed-point finite-precision effects in DSP designs," *Proceedings of the International Conference on Computer Aided Design* (ICCAD'03).
- [10] D-U. Lee, A. A. Gaffar, O. Mencer and W. Luk, "MiniBit: bitwidth optimization via affine arithmetic," *Proceedings of the 42nd Annual Conference on Design Automation*.
- [11] J. Stolfi and L. H. de Figueiredo, "An introduction to affine arithmetic," TEMA Tend. Mat. Apl. Comput., 4, Vol. 3, pp. 297-312, 2003.