# A Power Modeling and Characterization Method for the CMOS Standard Cell Library

Jiing-Yuan Lin, Wen-Zen Shen and Jing-Yang Jou

Department of Electronics Engineering & Institute of Electronics, National Chiao Tung University, HsinChu 30050, Taiwan, R.O.C.

#### Abstract

In this paper, we propose power consumption models for complex gates and transmission gates, which are extended from the model of basic gates proposed in [1]. We also describe an accurate power characterization method for CMOS standard cell libraries which accounts for the effects of input slew rate, output loading, and logic state dependencies. The characterization methodology separates the power consumption of a cell into three components: capacitive feedthrough power, short-circuit power, and dynamic power. For each component, a power equation is derived from SPICE simulation results, where the netlist is extracted from the cell's layout. Experimental results on a set of ISCAS'85 benchmark circuits show that the power estimation based on our power modeling and characterization is within 7% of SPICE simulation on average, while the CPU time consumed is more than two orders of magnitude less.

#### 1. Introduction

To develop a cell-based power estimator, power characterization is an essential step. The characterization system proposed in [2] records the power consumption of a cell for all possible input events which cause an output node transition. However, the memory nature of the internal nodes in a logic gate is ignored in the process, which may result in less accurate estimation. The method proposed in [3] separates the power consumption behavior of a logic gate into two operation regions, the "fast" and "slow" regions. The purpose of this separation is to simplify and speed up the characterization process in the "fast" region. However, the accuracy of the estimation is reduced. In this paper, we extend the power modeling of basic gates proposed in [1] to complex gates and transmission gates.
In our extended model, we consider the power consumption behavior of each node in complex gates and all possible states of the nodes in transmission gates. In addition, we present a more accurate characterization method which considers the effects of input slew rate, output loading, and logic state dependencies. In our methodology, the power consumption of a cell is separated into three components: capacitive feedthrough power, short-circuit power, and dynamic power. The aim of this separation is to provide not only an accurate estimate for the whole circuit, but also detailed information about each individual gate. This information could be very useful for power optimization. Experimental results on a set of *ISCAS'85* benchmark circuits show that the power estimation based on our methodology is within 7% of SPICE results on average.

# 2. Power Model of Basic Gates

For a CMOS logic gate, the states of the output node and the internal nodes depend not only on the input patterns applied but also on their previous states. In [1], we proposed a graph called STGPE to model this sequential behavior. Fig. 1 shows a 2-input NAND gate and its corresponding STGPE. In the graph, the state bits from the MSB (most significant bit) to the LSB (least significant bit) represent the status of the nodes located sequentially along the path from the power supply to ground. In Fig. 1(b), state 01 does not exist, because when the output node is discharged, the internal node between the output and ground is discharged simultaneously. With this state encoding, it is impossible to have a state in which a less significant bit is one while a more significant bit is zero. Each edge $e_k$ : $(i_k, E_k, W_k)$ in the *STGPE* models the power consumption of a state transition. $i_k$ is the *input pattern* applied.
$E_k$ is the *edge activity number*, which denotes the number of times the edge is traversed when a sequence of patterns is applied, and $W_k$ is the total *energy* consumed each time the edge is traversed. The total energy consumption is obtained by summing the products of the edge activity number and the energy consumption over all edges, i.e., $\sum_k E_k W_k$. For a basic NAND or NOR gate with $m$ inputs, we can construct the corresponding STGPE with $m+1$ states. However, the construction becomes more complicated for complex gates, because there are multiple charging and discharging paths for an internal node and thus the state reduction property cannot be applied.

$$\begin{array}{llll}
e_0 : (00, E_0, W_0) & e_1 : (01, E_1, W_1) & e_2 : (10, E_2, W_2) & e_3 : (11, E_3, W_3) \\
e_4 : (00, E_4, W_4) & e_5 : (01, E_5, W_5) & e_6 : (10, E_6, W_6) & e_7 : (11, E_7, W_7) \\
e_8 : (00, E_8, W_8) & e_9 : (01, E_9, W_9) & e_{10} : (10, E_{10}, W_{10}) & e_{11} : (11, E_{11}, W_{11})
\end{array}$$

Fig. 1 The STGPE of a 2-input NAND gate.

# 3. Extended Power Model

# 3.1 Complex Gates

In a logic gate, there may exist internal nodes which must be passed through during output charging or discharging. These nodes are referred to as the primary nodes or Type\_A nodes. The Type\_A set (TAS) is the collection of the primary nodes. In addition, the power supply, ground, and output nodes are also primary nodes. Other internal nodes not belonging to the TAS are referred to as the secondary nodes or Type\_B nodes, and are collected in the Type\_B set (TBS). A path which contains secondary nodes only and is terminated at primary nodes is called a secondary path. The secondary path set (SPS) is the collection of secondary paths. Without loss of generality, we use the AOI22 gate shown in Fig. 2(a) to illustrate these definitions. In this gate, nodes 4 and 5 are the secondary nodes, i.e., $TBS = \{4, 5\}$. Path\_3\_4\_6 represents the path starting from node 3 and ending at node 6 while passing through node 4. Therefore, $SPS = \{\text{path\_3\_4\_6}, \text{path\_3\_5\_6}\}$.

Fig. 2 An AOI22 gate and its simplified circuit.

In our model, the power estimation consists of two major steps. In the first step, a complex gate is simplified to an equivalent basic gate which retains the primary nodes only. For example, we simplify an AOI22 gate to an equivalent 2-input NOR gate as illustrated in Fig. 2(b). Given the input signal probabilities and transition densities of the original complex gate, the equivalent input signal probabilities and transition densities of the simplified circuit can be calculated according to [4]. After obtaining the simplified circuit, we build the corresponding STGPE. The STGPE model of the simplified circuit is used only for modeling the power consumption behavior of the primary nodes. The power characterization, of course, must be performed on the original circuit.

In the second step, the power consumption of each node in the TBS is estimated individually. We use a graph representation similar to the STGPE to model the power consumption behavior of a secondary node. Fig. 3(a) shows a sub-network of Fig. 2(a) which contains node 4, a secondary node, and the closest primary nodes, node 3 and node 6, which are called the *upper primary node* and the *lower primary node* of node 4, respectively. Fig. 3(b) shows the state transition behavior of node 4. In this example, $i_k = (S_3, A, B, S_6)$, where A and B are the input signal values, and $S_3$ and $S_6$ are the signal values of the upper and lower primary nodes, respectively. It is worth noting that there are only 14 edges instead of 32 edges $(2 \times 2^4)$ in Fig. 3(b). This is because edges whose input values violate the circuit behavior, e.g., inputs 0001, 0101 and 1110, are removed from the graph. In addition, node 6 is a special primary node which is tied to the ground node, so all the edges with input $S_6 = 1$ are also removed.
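The edge bookkeeping shared by the primary-node and secondary-node graphs can be sketched for the 2-input NAND gate of Fig. 1. This is a minimal illustration, not the CBPE implementation: the assignment of input A to the NMOS next to the output and B to the NMOS next to ground is an assumption, and the per-edge energies in `HYPOTHETICAL_W` are placeholders (one unit per node that charges) standing in for values that would come from SPICE characterization.

```python
from collections import Counter

# A state is (output bit, internal-node bit), MSB first as in Fig. 1(b).
# State (0, 1) never occurs, which leaves the m + 1 = 3 states below.
STATES = [(0, 0), (1, 0), (1, 1)]
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # input pattern (A, B)


def next_state(state, pattern):
    """Next (output, internal) state of the NAND gate.

    Assumption: A drives the NMOS next to the output, B the NMOS next to ground.
    """
    a, b = pattern
    _, internal = state
    if a and b:
        return (0, 0)        # full pull-down path: both nodes discharge
    if a:
        return (1, 1)        # internal node is charged through transistor A
    if b:
        return (1, 0)        # internal node is discharged through transistor B
    return (1, internal)     # both NMOS off: internal node keeps its value


def edge_activity(initial_state, patterns):
    """Count how often each edge (state, input pattern) is traversed (E_k)."""
    counts = Counter()
    state = initial_state
    for pattern in patterns:
        counts[(state, pattern)] += 1
        state = next_state(state, pattern)
    return counts, state


def total_energy(counts, edge_energy):
    """Sum of E_k * W_k over all traversed edges."""
    return sum(n * edge_energy[edge] for edge, n in counts.items())


# Placeholder W_k: one energy unit per node that rises on the edge (hypothetical).
HYPOTHETICAL_W = {
    (s, p): float(sum(new > old for old, new in zip(s, next_state(s, p))))
    for s in STATES
    for p in PATTERNS
}

if __name__ == "__main__":
    counts, final = edge_activity((1, 1), [(1, 1), (0, 0), (1, 0), (1, 1)])
    print(final, total_energy(counts, HYPOTHETICAL_W))
```

The dictionary has exactly 12 entries, one per edge $e_0$ to $e_{11}$. A graph for a secondary node would use the same two functions with a different `next_state` and with the edges that violate the circuit behavior left out.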
After building the STGPE and obtaining the energy consumption of each edge, the estimation procedure is identical to that of the basic gates.

Fig. 3 The STGPE of a secondary node.

# 3.2 Transmission Gates

Transmission gates (TGs) are widely used in logic design, for example in adders, multiplexers, flip-flops, and clocked static and dynamic logic. In TG-based circuit design, the source and drain terminals of TGs are usually not tied to the power supply or ground nodes directly. Therefore, the traditional two-valued logic is not enough to model the behavior of TGs. Instead, we need four valid states: conducting path to ground (0), conducting path to $V_{\rm dd}$ (1), high impedance with the node charged $(z_1)$, and high impedance with the node discharged $(z_0)$. Fig. 4(a) shows an example of the different states of a TG. Based on the four states, we model the behavior of a transmission gate as shown in Fig. 4(b). The label on each edge represents the input and the control values (A, S), and the state value stands for the output state (B). In this model, we assume that all transistors are unidirectional. Therefore, there are no 1 to $z_0$, 0 to $z_1$, $z_1$ to $z_0$, or $z_0$ to $z_1$ transitions at the output node. In Fig. 4(b), each edge label also carries an edge activity number and an energy consumption, as in the STGPE model of the previous section. Given the input signal probabilities and transition densities, the power consumption of the outputs can thus be computed.

Fig. 4 The transmission gate and its STGPE.

Fig. 5 A generic CMOS inverter.

# 4. Power Characterization

# 4.1 Power Characterization of Basic and Complex Gates

Consider the generic CMOS inverter circuit shown in Fig. 5 [6]. Intuitively, to characterize the power consumption of a cell, we should measure all currents caused by a state transition of the cell being characterized. For example, currents $i_s$, $i_{cd}$, $i_{fp3}$, and $i_{fp2}$ should be measured when characterizing the low to high transition of node n.
However, it is difficult to measure those currents individually. To overcome this problem, we use separate power supplies, i.e., Vdd1 for the cell being characterized and Vdd2 for the loading cells, and then measure the current drawn from Vdd1. The measured current includes $i_{fp1}$, the loading current of the preceding stage, but not $i_{fp2}$. Although current $i_{fp2}$ is not counted in the characterization of INV1, it is counted when we characterize the fan-out cells. Therefore, with this characterization approach no power is miscounted for the entire circuit.

In Fig. 5, the current flowing in Vdd1 consists of three main components: (1) the capacitive feedthrough current $(i_{fp1} \text{ and } i_{fp3})$, (2) the short-circuit current $(i_s)$, and (3) the charging/discharging current of the output and internal nodes $(i_{cd})$. To characterize these components, our procedure comprises three major steps.

In the first step, the capacitive feedthrough current is measured. In Fig. 6, we insert an extra NMOS transistor between the output node and the drain of MP1, where the gate input is connected to ground and the size of the transistor is the same as that of MN1. The purpose of this arrangement is to isolate the output node from Vdd1 such that $i_s$ and $i_{cd}$ are forced to zero. Thus, the measured current of Vdd1 becomes the capacitive feedthrough current only. In general, the capacitive feedthrough current depends only on the voltage change across the two terminals of the parasitic capacitances. Thus, it can be viewed as a constant current when a full-swing signal is considered.

As for the short-circuit current, previous work [5] showed that it is highly dependent on the input slew rate and only weakly dependent on the output loading. Ideally, if the input slew rate is infinite (zero rise/fall time), short-circuit power does not exist.
Therefore, to minimize the short-circuit current component of a cell, we can enlarge the transistor widths of the preceding gates (the two inverters in Fig. 6) to minimize the rise/fall time.

In the second step, after removing the extra transistor added in the first step and enlarging the transistor widths of the preceding gates, the measured current contains the feedthrough current, the charging/discharging current of the output node, and a small short-circuit current component. The charging/discharging current can then be obtained by subtracting the feedthrough current measured in the first step, if we ignore the small short-circuit current. In the original circuit, the measured current includes all three current components. Thus, in the third step, the short-circuit current can be obtained by subtracting the current measured in the second step from the current measured in the original circuit.

As shown in Fig. 6, to consider the effect of the input slew rate, we add one adjustable capacitor to the input of the cell being characterized. Similarly, another adjustable capacitor is added to the output node of the cell to examine the output loading effect. In our characterization process, we create the SPICE deck, run the simulation, retrieve the simulation data, and calculate the coefficients of the equations by curve fitting using *Mathematica*.

Fig. 6 An inverter for power characterization.

As mentioned earlier, the power estimation for the primary nodes of complex gates is identical to the case of basic gates. However, in the power characterization, the transistors which are simplified to a single transistor, e.g., transistors A and B in Fig. 2(a), need to be tied together. With this arrangement, the capacitive feedthrough power and short-circuit power of a complex gate, as well as the dynamic power of the primary nodes, are covered by the characterization of the simplified basic gate. Thus, the characterization of the secondary nodes needs to consider the dynamic power only. Given the AOI22 gate shown in Fig. 2, to obtain the dynamic power of node 4, we modify it by inserting an extra NMOS transistor as shown in Fig. 7, where the transistor has its gate tied to ground and has the same width as NMOS transistor A. Then, a two-pattern input sequence is applied to both the original circuit (Fig. 2) and the modified circuit. The first pattern discharges node 4, and the second pattern charges node 4 while turning off all the paths that could charge the lower primary node and the other secondary nodes. In this case, the input sequence can be (1111, 1000) or (0100, 1000). The dynamic power of node 4 can then be calculated as the difference between the power measurements of the original and the modified circuits.

Fig. 7 Power characterization of a secondary node.

#### 4.2 Power Characterization of Transmission Gates

Fig. 8 shows the circuit configuration for characterizing the transmission gate TG1, where the bulks of the PMOS transistors in TG1 and TG2 are connected to two different power supplies, Vdd1 and Vdd2, respectively. TG2 is used to control the four valid states of node *in*. To measure the power for charging the output node, we first charge node *in* and then charge node *out*. The difference between the charging currents is the current required to charge node *out*. Moreover, the capacitive feedthrough current of TG1 can be obtained by measuring the current flowing through the power supply Vdd1.

Fig. 8 Power characterization for transmission gates.

#### 4.3 Characterization Procedure

The flow of our power characterization system is shown in Fig. 9. For each cell in the library, we build the corresponding STGPE and generate a stimulus file for traversing each edge in the STGPE. A SPICE netlist with a distinct capacitance for each interconnection layer is extracted from the cell layout using the OPUS layout parasitic extractor. The transistor models used are the level 3 models of the 0.8 µm SPDM CMOS technology provided by CIC (Chip Implementation Center in Taiwan).
SR\_CL Setup adjusts the input and output capacitors to set different input slew rates and output loadings. After running the SPICE simulation, the input capacitance, timing data, and power consumption are retrieved. After all edges in the STGPE have been characterized, curve fitting is performed on the retrieved data using *Mathematica*. Finally, the characterization system reports the three energy consumption equations for each edge in the STGPE and the equations for the propagation delay and the rise/fall time.

Fig. 9 Power characterization flow.

# 5. Experimental Results

A prototype power characterization system has been implemented in C on a SUN SPARCstation 20 with 64 Mbytes of memory. We performed experiments using the ISCAS'85 benchmark suite. The CBPE power estimator [1] is used for evaluating the accuracy of our characterization method. Table 1 reports the exact SPICE simulation results and the CBPE estimation results based on our cell characterization method. The signal probabilities and transition densities of the primary inputs are set to 0.5 for all circuits. Based on these input characteristics, a random signal generator generates 1000 patterns with a 10 ns clock cycle time for both the SPICE and VERILOG simulators. The cell library used contains the basic gates and some AOI and OAI complex gates. The column labeled "Power CBPE" denotes the power consumption estimated by CBPE. In these experiments, the VERILOG simulation consumes over 95% of the CPU time in CBPE. Due to insufficient memory and the tremendous CPU time required, C1908, C3540, and C6288 could not be finished by SPICE simulation. In summary, the experimental results show that the power estimation based on our power modeling and characterization is within 7% of the SPICE result on average, while the CPU time spent is more than two orders of magnitude less. Table 2 shows the experimental results of transmission gate based circuits.
We test our model and power characterization on an 8-bit ripple carry adder and a 4-to-1 multiplexer. The estimation result for the multiplexer is noticeably less accurate. The possible reasons are the unidirectional assumption and the neglect of the correlation between control signals. Some internal nodes in the multiplexer may be charged or discharged in the direction opposite to the one we assume for the corresponding TG. During such charging and discharging, CBPE always reports the output node to be in a high impedance state. For example, our system may report the state value of a node as $(0 z_0 1)$ instead of (011), the real value, from time t-2 to t. The $z_0$ to 1 transition is recognized as a complete charging in our system; however, in the real situation, no transition occurs. So, the unidirectional assumption for transmission gates can lead to overestimation.

## 6. Conclusion and Future Work

In this paper, we propose power consumption models for complex gates and transmission gates. In addition, we present an accurate power characterization method for these gates. In our approach, we divide the power consumption of a cell into three components: capacitive feedthrough power, short-circuit power, and dynamic power. Our estimation system reports these power consumption components for each gate. This information will be very useful for short-circuit power driven or dynamic power driven optimization. The accuracy of cell-based power analysis depends not only on the accuracy of the power modeling and characterization but also on the accuracy of the switching activity estimation. In a simulation-based approach, the accuracy of the switching activity estimation depends strongly on which input patterns are applied and how many. This problem needs considerable research effort in the future. In addition, for transmission gate intensive designs, the model and characterization method are not yet accurate enough.
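The three-component split summarized above follows from the three measurements of Section 4.1 by simple subtraction. The sketch below is only an illustration under stated assumptions: the function and field names are hypothetical, the three inputs are assumed to be expressed in the same unit (for example, energy drawn from Vdd1 per transition), and the small residual short-circuit contribution in the second measurement is ignored, as in the text.

```python
from dataclasses import dataclass


@dataclass
class PowerComponents:
    feedthrough: float    # capacitive feedthrough component
    dynamic: float        # charging/discharging component
    short_circuit: float  # short-circuit component


def separate_components(isolated: float, fast_input: float, original: float) -> PowerComponents:
    """Split a Vdd1 measurement into the three components.

    isolated:   measurement with the output isolated from Vdd1 (step 1)
    fast_input: measurement with enlarged driving gates, i.e. fast inputs (step 2)
    original:   measurement on the unmodified circuit (step 3)
    """
    feedthrough = isolated
    dynamic = fast_input - isolated        # residual short-circuit part ignored
    short_circuit = original - fast_input
    return PowerComponents(feedthrough, dynamic, short_circuit)


if __name__ == "__main__":
    # Placeholder numbers, not measured data.
    print(separate_components(isolated=0.5, fast_input=2.0, original=2.75))
```

By construction the three components add up to the measurement on the original circuit, so no power is counted twice or lost in the split.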
#### Reference

- [1] Jiing-Yuan Lin, Tai-Chien Liu, and Wen-Zen Shen, "A Cell-Based Power Estimation in CMOS Combinational Circuits," in Proc. IEEE/ACM International Conference on Computer-Aided Design, pp. 304-309, Nov. 1994.
- [2] B. George, G. Yeap, M. Wloka, S. Tyler, D. Gossain, "Power Analysis for Semi-custom Design," in Proc. Custom Integrated Circuits Conference, pp. 249-252, 1994.
- [3] H. K. Sarin and A. J. McNelly, "A Power Modeling and Characterization Method for Logic Simulation," in Proc. Custom Integrated Circuits Conference, pp. 363-366, 1995.
- [4] S. C. Prasad and K. Roy, "Circuit Optimization for Minimization of Power Consumption under Delay Constraint," in Proc. 1994 International Workshop on Low Power Design, pp. 15-20, 1994.
- [5] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits," IEEE Journal of Solid-State Circuits, Vol. SC-19, No. 4, pp. 468-473, Aug. 1984.
- [6] F. N. Najm, R. Burch, P. Yang, and I. N. Hajj, "Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits," IEEE Trans. on Computer-Aided Design, Vol. 9, No. 4, April 1990.

Table 1 Estimation results for a subset of ISCAS'85 benchmark circuits.

| Ex. | Inputs | Outs. | Trs. | Gates | Power SPICE (µW) | Power CBPE (µW) | CPU time SPICE (sec.) | CPU time CBPE (sec.) | Error |
|---------------|--------|-------|-------|-------|--------|---------|-------|--------|--------|
| C17 | 5 | 2 | 24 | 6 | 309.1 | 285.19 | 204 | 2.1 | 7.74% |
| cm138a | 6 | 8 | 92 | 26 | 782.1 | 807.98 | 806 | 4.9 | 3.30% |
| cm150a | 21 | 1 | 244 | 61 | 4427.7 | 4032.1 | 4563 | 28.3 | 8.94% |
| cm151a | 12 | 2 | 106 | 24 | 1689.9 | 1631.3 | 1444 | 11.3 | 3.47% |
| cm152a | 11 | 1 | 80 | 19 | 910.4 | 863.8 | 832 | 5.5 | 5.12% |
| cm162a | 14 | 5 | 204 | 56 | 2637.4 | 2288.7 | 3252 | 18.9 | 13.22% |
| cm163a | 16 | 5 | 196 | 54 | 2539.7 | 2470.6 | 3120 | 23.3 | 2.72% |
| cm42a | 4 | 10 | 114 | 35 | 1398.1 | 1364.6 | 1148 | 9.5 | 2.39% |
| cm82a | 5 | 3 | 82 | 21 | 1412.6 | 1300.3 | 859 | 8.4 | 7.95% |
| cm85a | 11 | 3 | 172 | 46 | 2401.5 | 2312.2 | 2413 | 17.8 | 3.72% |
| cmb | 16 | 4 | 192 | 49 | 2080.1 | 1995.3 | 2860 | 17.2 | 4.07% |
| f51m | 8 | 8 | 548 | 135 | 7650.3 | 7382.8 | 12843 | 59.1 | 3.49% |
| C432 | 36 | 7 | 900 | 238 | 15117 | 14808.5 | 32779 | 119.2 | 2.04% |
| C499 | 41 | 32 | 1704 | 438 | 34345 | 28021.1 | 81461 | 282.2 | 18.40% |
| C880 | 60 | 26 | 1316 | 326 | 21949 | 18725.5 | 55246 | 162.7 | 14.68% |
| C1908 | 33 | 25 | 2282 | 586 | \* | 35686 | \* | 374.5 | |
| C3540 | 50 | 22 | 4568 | 1074 | \* | 61045 | \* | 886.1 | |
| C6288 | 32 | 32 | 10136 | 2716 | \* | 193069 | \* | 6077.9 | |
| Average Error | | | | | | | | | |

\*: could not be finished.

Table 2 Estimation results for TG-based circuits.

| Ex. | Trs. | Power SPICE (µW) | Power CBPE (µW) | CPU time SPICE (sec.) | CPU time CBPE (sec.) | Error |
|---------------------------|------|--------|--------|------|--------|--------|
| 8-bit ripple carry adder | 208 | 641.5 | 575.89 | 5389 | 113.85 | 10.22% |
| 4-to-1 multiplexer | 36 | 61.287 | 78.347 | 703 | 5.12 | 27.83% |
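The Error column of Table 1 can be cross-checked from the two power columns. The sketch below assumes that the error is the difference between the SPICE and CBPE values relative to the SPICE value, which reproduces the printed entries up to rounding in the last digit. The helper names are illustrative, and only the circuits for which SPICE finished are included.

```python
# (circuit, SPICE power in uW, CBPE power in uW), copied from Table 1.
TABLE_1 = [
    ("C17", 309.1, 285.19),
    ("cm138a", 782.1, 807.98),
    ("cm150a", 4427.7, 4032.1),
    ("cm151a", 1689.9, 1631.3),
    ("cm152a", 910.4, 863.8),
    ("cm162a", 2637.4, 2288.7),
    ("cm163a", 2539.7, 2470.6),
    ("cm42a", 1398.1, 1364.6),
    ("cm82a", 1412.6, 1300.3),
    ("cm85a", 2401.5, 2312.2),
    ("cmb", 2080.1, 1995.3),
    ("f51m", 7650.3, 7382.8),
    ("C432", 15117.0, 14808.5),
    ("C499", 34345.0, 28021.1),
    ("C880", 21949.0, 18725.5),
]


def relative_error_percent(spice: float, cbpe: float) -> float:
    """Absolute deviation of the CBPE estimate from SPICE, in percent of SPICE."""
    return abs(spice - cbpe) / spice * 100.0


def average_error_percent(rows) -> float:
    """Arithmetic mean of the per-circuit relative errors."""
    errors = [relative_error_percent(spice, cbpe) for _, spice, cbpe in rows]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    for name, spice, cbpe in TABLE_1:
        print(f"{name:8s} {relative_error_percent(spice, cbpe):6.2f}%")
    print(f"average  {average_error_percent(TABLE_1):6.2f}%")
```

Under this definition, the mean over the 15 completed circuits comes out just under 7%, in line with the average accuracy stated in the text.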