# An Efficient IP-Level Power Model for Complex Digital Circuits

<sup>\*</sup>Chih-Yang Hsu, <sup>A</sup>Chien-Nan Jimmy Liu and <sup>\*</sup>Jing-Yang Jou

\* Dept. of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan

<sup>A</sup> Dept. of Electrical Engineering, National Central University, Taoyuan, Taiwan

## Abstract

In this paper, we propose an efficient IP-level power model with a small lookup table for complex CMOS circuits. The table has only one dimension, which maps the zero-delay charging and discharging capacitance to the real power consumption of pattern pairs, but still has high accuracy. In order to improve the efficiency of the characterization process, the Monte Carlo approach is used during the estimation of the average power to skip samples that would not noticeably improve the accuracy. Experimental results show that the table sizes are at most 107 entries for the ISCAS'85 benchmark circuits and that the estimation error using the lookup table is only 2.99% on average.

## **I. Introduction**

System-on-a-chip (SOC) design has become a trend in system integration in recent years. For SOC designs, most design teams integrate many well-designed circuit blocks, called intellectual properties (IPs), with some self-designed circuit blocks to build a complex system in a short time. When designing such complex systems, low power is also an important consideration because of the increasing demand for portable devices. Traditionally, power estimation is performed at the transistor level by SPICE-like simulation. However, this approach is impractical for SOC designs because it requires very high computing power.

For this application, power models may provide an efficient way to estimate the power consumption of IPs because transistor-level simulation is only required at the characterization step. The power model of a design describes the relationship between power characteristics and real power consumption for specific input sequences or input signal statistics. Lookup tables are the most commonly used power models. Because the power dissipation of a combinational circuit depends on the previous and present input patterns, a fully characterized lookup table for an *n*-input combinational circuit has $2^{2n}$ entries. This is clearly infeasible for complex circuits because the table is too large to be stored and the characterization process would consume too much time. Efficient reduction methods are required to make this approach feasible.

In this approach, the chosen power characteristics have a large impact on the table size and the accuracy of the estimated power consumption. Therefore, many power characteristics have been proposed in the literature [1-4], such as the signal statistics of the primary inputs and outputs, the activity information of a design, the power sensitivity of primary inputs, and the Hamming distance of the pattern pairs at the primary inputs. The methods proposed in [1] and [2] build the lookup tables according to the signal transitions at the primary inputs. In [1], the authors use a clustering algorithm to group input vectors with similar power consumption into a cluster so that the table size can be reduced. The method in [2] operates on the state transition graphs (STGs) of macrocells and merges transition-compatible nodes to reduce the sizes of the lookup tables.

The method in [3] uses the signal statistics of the primary inputs and outputs as the indexes of the lookup tables. In [3], lookup tables with 2 dimensions (average input signal probability and average input signal transition density), 3 dimensions (average output zero-delay transition density as the third dimension) and 4 dimensions (average spatial correlation coefficient as the fourth dimension) are compared. The results show that the estimation errors decrease as the number of table dimensions increases, but the table sizes increase as well. A larger table requires extra characterization time, which may become a non-negligible overhead. Moreover, because the distribution of the average output transition density is hard to control, the characterization time needed to fill the lookup tables is also hard to control.

Based on the above observations, the size of the lookup table is a primary concern for the power models of complex designs such as commercial IPs. Therefore, we propose a table-based power modeling method for combinational circuits in which the table size is very small and almost independent of the number of primary inputs. In order to reduce the table size, we build a one-dimensional lookup table that maps the zero-delay charging and discharging capacitance (CDC) to the real power consumption of input pattern pairs. In the rest of this paper, CDC denotes the zero-delay charging and discharging capacitance. The CDC of a pattern pair is the sum of the charging and discharging capacitances of the nodes whose signals change from 0 to 1 or from 1 to 0 during the transition of input patterns under the zero-delay model. The choice of CDC as the index of the lookup table is based on our previous comparison of the average normalized error among three power characteristics: CDC, the zero-delay switching count (SC) of internal nodes, and the Hamming distance (HD) of input pattern pairs [4]. Among these power characteristics, CDC has the smallest average normalized error.
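As a minimal sketch of how the CDC of one pattern pair could be computed, the following Python fragment evaluates a small netlist under the zero-delay model for both patterns and sums the capacitances of the nodes that toggle. The netlist, the capacitance values (in fF), and the choice to count primary-input nodes are illustrative assumptions, not data taken from the benchmark circuits.

```python
# Hypothetical 3-input netlist in topological order: node -> (gate type, fan-in nodes)
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("OR", ("b", "c")),
    "out": ("XOR", ("n1", "n2")),
}

# Hypothetical load capacitance of every node, in fF (primary inputs included)
CAP = {"a": 2, "b": 3, "c": 2, "n1": 4, "n2": 5, "out": 6}

GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def evaluate(inputs):
    """Zero-delay evaluation: every node takes its final logic value at once."""
    values = dict(inputs)
    for node, (gate, fanin) in NETLIST.items():
        values[node] = GATES[gate](*(values[f] for f in fanin))
    return values


def cdc(prev_inputs, curr_inputs):
    """Sum the capacitance of every node whose zero-delay value changes."""
    before = evaluate(prev_inputs)
    after = evaluate(curr_inputs)
    return sum(CAP[node] for node in CAP if before[node] != after[node])


pair_cdc = cdc({"a": 0, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 0})
print(pair_cdc)  # a, b, n1 and n2 toggle; out stays at 0
```

In this example the output node does not toggle under the zero-delay model even though both of its fan-in nodes change, so any glitch at that node is not visible in the CDC value itself. Such effects can only enter the model through the power values recorded for each CDC interval.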

From the discussion above, the effort required by the characterization process is also an important issue. Therefore, we modify the grouping algorithm in [4] to obtain a more efficient characterization process while building the lookup table. In [4], the CDC distribution of the input sequence is deterministic. In this work, however, the CDC distribution is non-deterministic until all input pattern pairs have been simulated. In order to handle this situation without simulating all possible cases, we dynamically increase the entries of the lookup table to cover the current CDC distribution of the design while we characterize the average power for each entry in the table. Because many pattern pairs may fall into the same group, we use Monte Carlo simulation [5] to further reduce the characterization time.

The rest of this paper is organized as follows. In Section II, we will describe the power model proposed in this work. In Section III, the dynamic grouping algorithm will be described. The power characterization process will be shown in Section IV. The estimation of average power with the proposed power model is described in Section V. The accuracy of our power model will be evaluated in Section VI through several experiments and some conclusions will be given at the end.

## **II. Power Modeling**

The power consumption of a digital circuit is formulated as Equation (1). The static power ($P_{static}$) is the power consumed by the leakage currents in the reverse-biased P-N junctions, which is often much smaller than the dynamic power ($P_{dynamic}$). $P_{dynamic}$ is the sum of the power of functional transitions ($P_{func\_trans}$), the power of glitches ($P_{glitch}$) and the short-circuit power ($P_{short-circuit}$), as shown in Equation (2). $P_{short-circuit}$ is consumed when short-circuit current flows from $V_{dd}$ to ground during the period in which both the PMOS and NMOS transistors are turned on in a signal transition, and it is often smaller than the sum of $P_{func\_trans}$ and $P_{glitch}$. The proportion between $P_{func\_trans}$ and $P_{glitch}$ depends on the circuit behavior and the design style. Given a circuit with $n$ nodes, we can express $P_{func\_trans}$ and $P_{glitch}$ as Equations (3) and (4), where $i$ denotes the node index, $C_i$ is the load capacitance of node $i$, $V_{dd}$ is the supply voltage of the circuit, $f_{i\_func}$ is the frequency of functional transitions at node $i$ and $f_{i\_glitch}$ is the frequency of glitches at node $i$. $\tau_i$ is a factor that accounts for the effect of the glitch width on the glitch power and lies between 0 and 1.

$$P = P_{static} + P_{dynamic} \tag{1}$$

$$P_{dynamic} = P_{func\_trans} + P_{glitch} + P_{short-circuit} \tag{2}$$

$$P_{func\_trans} = \frac{1}{2} \cdot V_{dd}^2 \cdot \sum_{i=1}^{n} C_i \cdot f_{i\_func} \tag{3}$$

$$P_{glitch} = V_{dd}^2 \cdot \sum_{i=1}^{n} C_i \cdot f_{i\_glitch} \cdot \tau_i \tag{4}$$

In this work, the lookup table maps a CDC interval to a real power value, which implies that the lookup table uses $P_{func\_trans}$ to indicate the trend of the real power consumption on average. The CDC value of a pattern pair is the term $\sum_{i=1}^{n} C_i \cdot f_{i\_func}$ in Equation (3), where $f_{i\_func} = 1$ if node $i$ has a signal transition (0 to 1 or 1 to 0) under the zero-delay model and $f_{i\_func} = 0$ otherwise.

The power model is represented by a lookup table that maps the CDC values to the real power consumption of input pattern pairs. The building flow of the lookup table is shown in Fig. 1. We first divide the input pattern pairs into several groups according to their CDC values, which are calculated by a logic-level simulator. The pattern pairs within an interval of CDC values are grouped together, and their average power, which is estimated by PowerMill, is recorded in the corresponding entry of the lookup table.



Fig. 1. The block diagram for building the power model
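The building flow above can be sketched in a few lines of Python, assuming the groups are fixed-width CDC intervals and that a power value is already available for every sampled pattern pair. The sample values below are hypothetical; in the flow of Fig. 1 the power values would come from the transistor-level simulator.

```python
from collections import defaultdict


def build_table(samples, low, interval):
    """Average the power of all (cdc, power) samples that share a CDC interval.

    Entry k covers the CDC range [low + k*interval, low + (k+1)*interval).
    """
    totals = defaultdict(lambda: [0.0, 0])  # k -> [power sum, sample count]
    for cdc_value, power in samples:
        k = int((cdc_value - low) // interval)
        totals[k][0] += power
        totals[k][1] += 1
    return {k: total / count for k, (total, count) in sorted(totals.items())}


# Hypothetical (CDC, power) samples
samples = [(12, 1.0), (18, 3.0), (25, 5.0)]
table = build_table(samples, low=10, interval=10)
print(table)
```

Only one averaged value per interval is stored, so the table size depends on the number of intervals rather than on the number of simulated pattern pairs.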

## **III. Dynamic Grouping**

Using the CDC values of pattern pairs as the index of the lookup table may still lead to a huge table if we create a table entry for each distinct CDC value. Although the table will be much smaller than $2^{2n}$ entries, where *n* is the number of primary inputs of the circuit, it is still very large. Therefore, a table size reduction method is required.

In order to reduce the table size, we can collect those pattern pairs with similar CDC values to be a group and only set one entry in the lookup table for each group. A similar grouping method was used in pattern compaction techniques for power estimation [4]. In [4], however, the compacted sequence is generated for a specific input sequence. In other words, the input sequence is deterministic and the distribution of CDC values is deterministic, too. Unfortunately, when we build the lookup table for the proposed power model in our work, the CDC distribution is non-deterministic until we simulate all pattern pairs, which is almost impossible for large circuits even using a logic-level simulator.

In order to handle this situation without simulating all possible cases, we propose a method that dynamically increases the entries of the lookup table to cover the current CDC distribution of the design while we characterize the average power for each entry in the table. As illustrated in Fig. 2, the CDC values of the pattern pairs are sorted before grouping. The X-coordinate is the index of the pattern pair in sorted order and the Y-coordinate is the CDC value of each pattern pair. In the first iteration, we randomly generate several pattern pairs and the dynamic grouping works like the grouping process in [4], as shown in Fig. 2(a). Each group is defined by an interval of CDC values, and neighboring groups cover contiguous CDC ranges. In the second iteration, we generate more random patterns, and the number of groups increases because the range of the CDC distribution widens, as shown in Fig. 2(b). The ranges of the groups in Fig. 2(a) are not changed, but new groups are generated beyond the boundaries of the first and last groups in Fig. 2(a).

The size of the lookup table in our power model is determined by the number of groups in the dynamic grouping process, which can be controlled by the user-defined group interval. The group interval is the range from the minimum CDC value to the maximum CDC value of a group. It is specified as a percentage and is set to 5% of the maximum CDC value in this work. If the interval is smaller than the minimum load capacitance of the nodes, the interval is set to the minimum load capacitance, because no nonzero CDC value can be smaller than that.



Fig. 2(a). The dynamic grouping process after the first iteration



Fig. 2(b). The dynamic grouping process after the second iteration
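A minimal Python sketch of this dynamic grouping is given below. It assumes that the group interval is fixed once and passed in by the caller, and that the first group starts at the smallest CDC value seen in the first iteration. The class name `DynamicGroups` and the example CDC values are hypothetical.

```python
import math


class DynamicGroups:
    """Fixed-width CDC groups that are only extended outward, never re-cut."""

    def __init__(self, interval):
        self.interval = interval
        self.origin = None  # lower bound of group 0, fixed by the first batch
        self.first = 0      # index of the lowest group (may become negative)
        self.last = -1      # index of the highest group

    def index_of(self, cdc_value):
        return math.floor((cdc_value - self.origin) / self.interval)

    def add_batch(self, cdc_values):
        """Add one iteration of randomly generated pattern pairs."""
        if self.origin is None:
            self.origin = min(cdc_values)
        for cdc_value in cdc_values:
            k = self.index_of(cdc_value)
            self.first = min(self.first, k)
            self.last = max(self.last, k)

    def bounds(self):
        """Return the [low, high) CDC range of every group, lowest first."""
        return [
            (self.origin + k * self.interval, self.origin + (k + 1) * self.interval)
            for k in range(self.first, self.last + 1)
        ]


groups = DynamicGroups(interval=10)
groups.add_batch([40, 55, 62])   # first iteration
after_first = groups.bounds()
groups.add_batch([12, 95])       # second iteration widens the CDC range
after_second = groups.bounds()
print(after_first)
print(after_second)
```

The groups created in the first iteration keep their ranges, and the second iteration only appends new groups below and above them, which mirrors the behavior illustrated in Fig. 2(a) and Fig. 2(b).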

## **IV. Power Characterization**

In our power model, the power recorded for each table entry is the average power consumption of all pattern pairs located in the corresponding CDC interval. Therefore, we use a random input generator to generate a number of pattern pairs such that they are distributed over different groups. Fig. 3 illustrates this power characterization flow. The power characterization process stops when either of the following two conditions is met:

(a). The average power consumption of each group has reached the desired confidence level.

(b). The total pattern pairs have reached the constraint of maximum number of pattern pairs.

The maximum number of characterized pattern pairs is used to control the characterization effort. It can be chosen by users to make a trade-off between accuracy and characterization effort. If the process stops because of criterion (b), the average power of the groups that do not have enough pattern pairs is estimated by interpolation or extrapolation, because their current samples may not be representative enough.

In order to further improve the efficiency of the characterization process, we use the Monte Carlo approach [5] to check stopping criterion (a) so that we can finish the characterization process as soon as possible. Under the assumption that the sample means are normally distributed, the end of simulation can be decided according to the statistical stopping criterion in Equation (5). In Equation (5), $\varepsilon$ is the maximum percentage error acceptable to the user, $N$ is the number of samples, $\eta_T$ is the sample mean and $s_T$ is the sample standard deviation. For a $(1-\alpha)$ confidence level, $t_{\alpha/2}$ is the t-distribution coefficient with $(N-1)$ degrees of freedom.





Fig. 3. The power characterization flow (each lookup table entry pairs a CDC interval CDC<sub>Li</sub>:CDC<sub>Hi</sub> with its average power P<sub>ave</sub>)

After the estimate of the average power of a group has converged according to the Monte Carlo stopping criterion, we do not simulate further pattern pairs of that group in the transistor-level simulator, because the current result already has the desired accuracy.
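The convergence check for one group can be sketched as follows. Equation (5) itself is not reproduced in the text, so this sketch assumes the commonly used form $t_{\alpha/2} \cdot s_T / (\eta_T \cdot \sqrt{N}) < \varepsilon$, which is consistent with the symbols defined above but remains an assumption. The t-distribution coefficient is passed in by the caller (2.756 is the two-sided table value for $\alpha = 0.01$ and 29 degrees of freedom), and both sample lists are hypothetical.

```python
import math
import statistics


def has_converged(samples, t_coeff, max_error):
    """Assumed stopping test: t * s / (mean * sqrt(N)) < max_error."""
    n = len(samples)
    if n < 2:
        return False  # a standard deviation needs at least two samples
    mean = statistics.mean(samples)
    if mean <= 0:
        return False  # the relative error is undefined for a non-positive mean
    std = statistics.stdev(samples)  # sample standard deviation (N - 1 divisor)
    return t_coeff * std / (mean * math.sqrt(n)) < max_error


T_COEFF = 2.756  # t-table value for alpha = 0.01, N - 1 = 29 degrees of freedom

# Hypothetical power samples of two groups, 30 samples each
stable_group = [100 + (i % 5) for i in range(30)]
noisy_group = [10, 200] * 15

stable_done = has_converged(stable_group, T_COEFF, max_error=0.05)
noisy_done = has_converged(noisy_group, T_COEFF, max_error=0.05)
print(stable_done, noisy_done)
```

A group with tightly clustered power values passes the check after few samples, while a group with widely spread values keeps receiving pattern pairs until it converges or the maximum number of pattern pairs is reached.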

## **V. Power Estimation with the Power Model**

After the power model of a circuit is built, the average power consumption for any test sequence can be estimated. First, we use a logic-level simulator to calculate the CDC values of the pattern pairs in the test sequence. With the CDC values, we can find the corresponding groups in the lookup table for the pattern pairs in the sequence. If a pattern pair belongs to a CDC interval, its power consumption is set to the value of the corresponding entry in the lookup table, and the total power is the sum of the values of all pattern pairs. Finally, the average power is obtained by dividing the total power by the number of pattern pairs.

The lookup table may not cover the whole CDC distribution of all possible pattern pairs because we did not simulate all pattern pairs in the characterization process. In this case, we can use extrapolation to estimate the power consumption of the pattern pairs that belong to the non-sampled groups. The average power consumption can be expressed as Equation (6). In Equation (6), $N$ is the total number of pattern pairs in the test sequence, $g$ is the number of entries of the lookup table, $P_i$ is the average power recorded in the $i^{th}$ entry of the lookup table, and $n_i$ is the number of pattern pairs in the test sequence whose CDC values fall within the CDC interval of the $i^{th}$ entry. $P_{out\_of\_range}$ is the total power consumption of the pattern pairs whose CDC values are outside the range of the lookup table. As shown in Equation (7), we can use extrapolation to estimate the power consumption of those pattern pairs. $P_1$, $P_2$, $P_{g-1}$ and $P_g$ are defined as $P_i$ in Equation (6). $CDC_1$, $CDC_2$, $CDC_{g-1}$ and $CDC_g$ are the largest CDC values of entries 1, 2, $g-1$ and $g$ in the lookup table, and $CDC_i$ inside each sum is the CDC value of the $i^{th}$ out-of-range pattern pair. $k_r$ and $k_l$ are the numbers of pattern pairs whose CDC values fall below the smallest and above the largest CDC range of the lookup table, respectively.

$$P_{avg} = \frac{\sum_{i=1}^{g} P_i \times n_i + P_{out\_of\_range}}{N} \tag{6}$$

$$\begin{aligned}
P_{out\_of\_range} = {} & \sum_{i=1}^{k_r} \left[ P_1 - \left( \frac{P_2 - P_1}{CDC_2 - CDC_1} \right) \times (CDC_1 - CDC_i) \right] \\
& + \sum_{i=1}^{k_l} \left[ P_g + \left( \frac{P_g - P_{g-1}}{CDC_g - CDC_{g-1}} \right) \times (CDC_i - CDC_g) \right]
\end{aligned} \tag{7}$$
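The estimation step of Equations (6) and (7) can be sketched in Python as follows. The table layout (a lower bound for the first entry plus the largest CDC value of every entry) and all numbers in the example are hypothetical, and the sketch assumes a table with at least two entries so that both extrapolation slopes exist.

```python
from bisect import bisect_left


def estimate_average_power(cdc_values, table_min, upper, power):
    """Average power of a test sequence, following Equations (6) and (7).

    table_min: smallest CDC value covered by entry 1
    upper[i]:  largest CDC value of entry i + 1 (ascending)
    power[i]:  average power recorded in entry i + 1
    """
    slope_low = (power[1] - power[0]) / (upper[1] - upper[0])
    slope_high = (power[-1] - power[-2]) / (upper[-1] - upper[-2])
    total = 0.0
    for c in cdc_values:
        if c < table_min:
            # below the table: extrapolate downward from entry 1
            total += power[0] - slope_low * (upper[0] - c)
        elif c > upper[-1]:
            # above the table: extrapolate upward from entry g
            total += power[-1] + slope_high * (c - upper[-1])
        else:
            # inside the table: use the entry whose interval contains c
            total += power[bisect_left(upper, c)]
    return total / len(cdc_values)


# Hypothetical 3-entry table covering CDC values from 10 to 40
p_avg = estimate_average_power(
    cdc_values=[15, 25, 35, 5, 50],
    table_min=10,
    upper=[20, 30, 40],
    power=[3.0, 4.0, 6.0],
)
print(p_avg)
```

In this example three pattern pairs hit table entries directly, one lies below the table and one lies above it, so both sums of Equation (7) contribute one term each.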

## **VI. Experimental Results**

The experiments were run on a SUN UltraSPARC II workstation. The test circuits are the ISCAS'85 benchmark circuits. In the characterization process, the random input sequence generator generates a sequence of 5,000 pattern pairs in every iteration, and the maximum number of characterized pattern pairs is set to 100,000. The confidence level in the Monte Carlo criterion is set to 0.99 ($\alpha$ is set to 0.01) and the maximum acceptable error $\varepsilon$ is set to 0.05. The sample size is set to 30.

The experimental results are shown in Table 1. The table sizes for the benchmark circuits are listed in the $2^{nd}$ row, under the names of the circuits in the $1^{st}$ row. According to the results, the table sizes range only from 42 to 107 entries for these circuits. The table is therefore very small and almost independent of the circuit size. In order to show that our approach can be applied to various input sequences, we test the accuracy of our method by estimating the average power consumption of the circuits with 3 different sequences. The test sequences are a pseudo-random sequence, a counter sequence and an LFSR sequence, each with 50,000 pattern pairs. The overall average error is 2.99%. The experimental results show that our power model retains high accuracy for different input sequences.

Table 1. The experimental results

|                     |           | Circuits   | C432   | C499    | C880    | C1355   | C1908   | C2670   | C3540   | C5315   | C6288    | C7552   |
|---------------------|-----------|------------|--------|---------|---------|---------|---------|---------|---------|---------|----------|---------|
|                     |           | Table Size | 63     | 45      | 75      | 51      | 42      | 90      | 65      | 103     | 107      | 103     |
| Random<br>Sequence  | PowerMill | I (uA)     | 56.135 | 149.718 | 106.480 | 161.386 | 144.083 | 261.945 | 340.913 | 611.173 | 4841.000 | 830.832 |
|                     |           | Time (Sec) | 3072   | 8626    | 5826    | 9494    | 7649    | 14667   | 19017   | 33918   | 290751   | 43169   |
|                     | 1-D Table | I (uA)     | 57.083 | 148.698 | 103.818 | 158.006 | 141.921 | 253.659 | 327.412 | 598.521 | 4722.450 | 816.488 |
|                     |           | Time (Sec) | 23.4   | 70.3    | 46.2    | 65.9    | 54.5    | 112.5   | 114.9   | 205.4   | 388.5    | 267.3   |
|                     | Error (%) |            | 1.69   | 0.68    | 2.50    | 2.09    | 1.50    | 3.16    | 3.96    | 2.07    | 2.45     | 1.73    |
| Counter<br>Sequence | PowerMill | I (uA)     | 13.243 | 34.493  | 38.119  | 38.270  | 50.477  | 6.402   | 191.720 | 20.911  | 351.290  | 68.626  |
|                     |           | Time (Sec) | 690    | 1958    | 1773    | 2163    | 2417    | 479     | 9347    | 1275    | 19385    | 3631    |
|                     | 1-D Table | I (uA)     | 13.971 | 33.468  | 37.153  | 37.456  | 53.483  | 5.923   | 192.283 | 20.703  | 355.533  | 70.526  |
|                     |           | Time (Sec) | 19.7   | 61.9    | 39.4    | 57.2    | 48.6    | 84.9    | 105.1   | 159.3   | 257.6    | 210.5   |
|                     | Error (%) |            | 5.50   | 2.97    | 2.53    | 2.13    | 5.96    | 7.48    | 0.29    | 1.00    | 1.21     | 2.77    |
| LFSR<br>Sequence    | PowerMill | I (uA)     | 71.436 | 167.103 | 118.619 | 182.053 | 169.728 | 286.910 | 374.463 | 669.812 | 5014.680 | 975.568 |
|                     |           | Time (Sec) | 4050   | 9496    | 6786    | 11007   | 9460    | 16789   | 21914   | 38744   | 309071   | 52882   |
|                     | 1-D Table | I (uA)     | 66.274 | 161.111 | 114.371 | 178.741 | 162.510 | 282.068 | 361.522 | 652.722 | 4820.264 | 936.464 |
|                     |           | Time (Sec) | 24.4   | 71.8    | 47.5    | 68.4    | 56.4    | 117.1   | 117.3   | 210.9   | 398.3    | 279.3   |
|                     | Error (%) |            | 7.23   | 3.59    | 3.58    | 1.82    | 4.25    | 1.69    | 3.46    | 2.55    | 3.88     | 4.01    |
| Average Error (%)   |           |            | 4.80   | 2.41    | 2.87    | 2.01    | 3.90    | 4.11    | 2.57    | 1.87    | 2.51     | 2.83    |
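As a quick consistency check of Table 1, the short Python fragment below recomputes the overall average error from the per-circuit averages in the last row, and recomputes one error entry and one run-time ratio from the C432 random-sequence columns. All numbers are copied from Table 1. The helper name `relative_error` is only illustrative.

```python
# Per-circuit average errors (%) copied from the last row of Table 1
average_error = {
    "C432": 4.80, "C499": 2.41, "C880": 2.87, "C1355": 2.01, "C1908": 3.90,
    "C2670": 4.11, "C3540": 2.57, "C5315": 1.87, "C6288": 2.51, "C7552": 2.83,
}
overall = sum(average_error.values()) / len(average_error)


def relative_error(reference, estimate):
    """Estimation error in percent relative to the reference value."""
    return abs(estimate - reference) / reference * 100


# C432, random sequence: PowerMill vs. 1-D table current (uA) and run time (s)
c432_error = relative_error(56.135, 57.083)
c432_speedup = 3072 / 23.4

print(round(overall, 2), round(c432_error, 2), round(c432_speedup))
```

The recomputed overall average matches the 2.99% quoted in the text, and for C432 with the random sequence the table lookup is roughly two orders of magnitude faster than the PowerMill run.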

## **VII. Conclusion**

In this paper, we proposed an efficient IP-level power model with a small lookup table for complex CMOS circuits. The lookup table has only one dimension, which maps the zero-delay charging and discharging capacitance (CDC) to the real power consumption of input pattern pairs, but still has high accuracy. In order to reduce the table size, we collect the pattern pairs with similar CDC values into a group and create only one entry in the lookup table for each group. The dynamic grouping process automatically increases the entries of the lookup table to cover the current CDC distribution of the design during the power characterization process. In order to improve the efficiency of the characterization process, the Monte Carlo approach is used during the estimation of the average power to skip samples that would not noticeably improve the accuracy. The experimental results show that our power model can estimate the average power of complex IP-level designs very efficiently and accurately for various test sequences.

## Acknowledgements

This work was supported in part by the National Science Council under Contract NSC 90-2215-E-009-049.

## References

- [1]. Huzefa Mehta, Robert Michael Owens, and Mary Jane Irwin. "Energy Characterization Based on Clustering," *Proceedings of the 33<sup>rd</sup> Design Automation Conference*, pp. 702-707, 1996.
- [2]. Jiing-Yuan Lin, Wen-Zen Shen, and Jing-Yang Jou. "A Structure-Oriented Power Modeling Technique for Macrocells," *IEEE Transactions on VLSI Systems*, pp. 380-391, Sep. 1999.
- [3]. Subodh Gupta and Farid N. Najm. "Power Modeling for High-Level Power Estimation," *IEEE Transactions on VLSI Systems*, pp. 18-29, Feb. 2000.
- [4]. Chih-Yang Hsu and Wen-Zen Shen. "Vector Compaction for Power Estimation with Grouping and Consecutive Sampling Techniques," *Proceedings of the International Symposium on Circuits and Systems*, vol. II, pp. 472-475, 2002.
- [5]. I. R. Miller, J. E. Freund, and R. Johnson. *Probability and Statistics for Engineers*. Englewood Cliffs, NJ: Prentice-Hall, 1990.