# Channel-Driven Global Routing with Consistent Placement \* (extended abstract for ICCAD '94) Shigetoshi NAKATAKE †and Yoji KAJITANI †‡ <sup>†</sup>School of Information Science, Japan Advanced Institute of Science and Technology Tatsunokuchi, Ishikawa 923-12, Japan <sup>‡</sup>Department of Electrical and Electronic Engineering, Tokyo Institute of Technology Meguro-ku, Tokyo 142, Japan #### abstract A global router with its consistent placer is proposed which aims to control wire-densities of channels. The routing order of nets and their routes are decided according to the channel which is predicted to have the maximum wire-density. The placer distributes the nets evenly with respect to the virtual length (half perimeter of the bounding box). Interesting features included are the interactive dynamic test to decide the form of predicting functions and the admissible region to consider the routing resources in placement stage. Experiments reveal some interesting phenomena that smaller maximum wire-density is attained in spite of comparable total wire-density and that smaller maximum wire-length in spite of larger total wire-length. #### 1 Introduction The purpose of the global router is usually to bound the wire-density of channels, especially in layout design of FPGAs [5],[6]. Global routers proposed so far follow basically the strategy: after independently optimized placement, nets are given priority for routing order randomly or according to some criticality. Nets are routed sequentially considering the current and/or predictive wiredensity of channels. If the result is not satisfactory with respect to 100% routing or length of nets, some of preexisting nets are ripped up and rerouted, changing the priorities. Therefore the quality of results heavily depends on the order of nets to be processed. So far the effect has been treated the one that must be avoided and efforts have been devoted to develop algorithms insensitive to the order, for example in [3],[4], though their problem is different from ours. \*This work is supported in part by CAD21 at JAIST and a Grant in Aid for Scientific Research of the Ministry of Education, Science, and Culture of Japan(B05452209). However a successful global router should be so designed, in contrast to conventional way, to focus directly to the channels to control the densities, and importantly to make effective use of net ordering. We call such a global router channel-driven. The objective of this paper is to show a direction for developing a channel-driven global router (Section 2) together with a placer (Section 3). It is described taking FPGAs as examples. The idea of our algorithm is sketched as follows: First, for each channel, estimate the number of nets whose shortest path would pass there. Second, take a channel whose estimation is maximum and list the nets that attribute this maximum. Finally, route them avoiding those channels and renew the estimation to continue. The formal definition of the estimation above is one of key issues. For the purpose, the *interactive dynamic test* is introduced to evaluate a potential performance of the router. It is reasoned that the resultant wire-densities in our global router depends heavily on the size of the maximum bounding box of all nets. To develop a placer to match this feature, a regional placement-energy is employed. We compare the performance of our global router and placer combined system on randomly generated problems of Xilix FPGA size with the conventional min-cut based or min-total-length based placers followed by a sequential router. We had interesting phenomena by experiments: smaller maximum wiredensity is attained in spite of comparable average wiredensity, and smaller maximum wire-length in spite of larger total wire-length (Section 5). ## 2 The Channel-Driven Global Router Our channel-driven global router is directed by the prediction of the final wire-density which, as the primal objective, we want to bound. However the predic- Fig.1: Channel-Driven Routing Controlled by Congestive Channels on Sliced $\mathcal C$ tion itself is affected by the router. We construct the router based on the assumption that each route takes the shortest. By this assumption, the prediction can be made solely by the placement and routing history. On the other hand, if a net were forced to detour, the prediction would be unreliable and its quality damaged. Our idea is to let the placer be responsible to manage it. Therefore our router is completely characterized by the way how to predict the wire-density and how to use the prediction. The former is fixed by applying the *interactive dynamic test* to the router. ## 2.1 Predicting Wire-Density A net n is the list of terminals to be connected. The set of all nets is denoted by $\mathcal{N}$ . While let $\mathcal{C}$ denote the set of all channels. The routine of a global router is to determine the route of connection of each net in terms of a sequence of channels. We assume for simplicity to describe the idea that each net is two-terminal. For net $n \in \mathcal{N}$ and channel $c \in \mathcal{C}$ , $\alpha(c,n)$ is a positive real function between 0 and 1, called the *attribute*. Let $\mathcal{N}(c)$ be the set of nets whose shortest routes could pass c. The *predictive wire-density* is defined as: $$\Gamma(c) = \sum_{n \in \mathcal{N}(c)} \alpha(c, n)$$ ## 2.2 The Strategy Our strategy of net ordering and net routing is described as follows: ROUTER: channel-driven & predictive step 1 (priority of channels) Sort $\{\Gamma(c), c \in \mathcal{C}\}$ in non-increasing order. Let the result be $(\gamma_1, \gamma_2, \cdots, \gamma_{|\mathcal{C}|})$ . step 2 (partition of channels) Define subset $\mathcal{C}_i$ of $\mathcal{C}$ such that $$C_i = \{c \,|\, \Gamma(c) \geq \gamma_i\}.$$ A channel in $C_i$ is called the ith congestive channel step 3 (partition of nets) Let $\mathcal{N}(C_i)$ be the union of $\mathcal{N}(c)$ over all $c \in C_i$ . Define $$\Delta N_i = \{n \mid n \in \mathcal{N}(C_i) - \mathcal{N}(C_{i-1})\}\$$ where $C_0 = \phi$ . step 4 (priority of nets) While there is any unrouted net, choose one net n according to the priority $(\Delta N_1, \Delta N_2, \cdots, \Delta N_{|C|})$ . Route it by a shortest path using channels with priority according to $(C_{|C|}, C_{|C|-1}, \cdots, C_1)$ . Update $\alpha(c, n)$ as follows: $$\alpha(c,n) = \begin{cases} 1 & \text{if } n \text{ is assigned to pass } c \\ 0 & \text{otherwise} \end{cases}$$ END. Fig.1 is an image explaining the idea. Whole the routing resource is sliced and the first congestive channel selects a net. The net will be assigned a route avoiding the channel within shortest routes. The performance of this global router is observed that if there is no limitation in capacity of channels, a net in $\Delta \mathcal{N}_i$ will take the route in $C_i - C_{i-1}$ reducing the predictive wire-density of some of $C_1, \dots, C_{i-1}$ . Since the route takes currently sparsest area, it has an effect to raise the low prediction and reduce high prediction. In other words, we can have an effect of flattening the next predictive wire-density. Correctness of this reasoning will be checked by experiments. # 2.3 Attributes of Nets Our global router is complete by defining the attributes. It should be defined by the performance behavior. However reasonable candidates will be the following two. - 1. $\alpha_{posmax}$ : $\alpha(c,n) = 1$ for any channel c and net n. $\Gamma(c)$ represents the maximum wire-density at c worst possibly attained, and called the possible maximum wire-density. - 2. $\alpha_{espect}$ : $\alpha(c,n) = h'/h$ where h the number of possible shortest routes of net n and h' is that of possible shortest routes of net n that pass channel c. Note that $h' \leq h$ . $\Gamma(c)$ represents the expectant wire-density at c, and is called the expectant wire-density. Note that h'/h has a simple formula in terms of width and height of the bounding box. For the performance test, it is conventional to apply the router to numerous number of randomly generated problems or a few benchmarks. However, it is not reliable because: Problems, especially public benchmarks, are those that have been fixed for long and algorithms could be designed targeting them. Fig.2: Grid Plane Model 2. Analysis of experimental results is difficult since individual problems are so isolated not to reveal continuous parametric features of the algorithm. We present a new way of experiments, called the interactive dynamic test, which is expected free from the above defects. Let A be ROUTER: channel-driven & predictive where the attribute is $\alpha_{posmas}$ or $\alpha_{espect}$ . Let $P_0$ be a seed problem. Applying A to $P_0$ , suppose we have a feasible solution $S_0$ . Then, problem $P_1$ is generated by the incremental problem generator which works as follows. If there is a pair of terminals $(t_1, t_2) (\notin \mathcal{N})$ between which a shortest route is possible without increasing the maximum wire-density of $S_0$ , create a virtual net $\{t_1, t_2\}$ and route it taking a shortest route. This is called an incremental routing. Continue random incremental routings as possible. Make the net list of the result and relax the routings to obtain a new problem $P_1$ . Apply A to $P_1$ to obtain solution $S_1$ . Generate problem $P_2$ from $S_1$ by incremental problem generator again as above, and continue. Unless the algorithm under test is ideal, the size (number of nets) of solutions $S_0, S_1, \cdots$ will increase. Always we know the upper bound of wire-density which might be sure the maximum by construction. We would say that the algorithm is better if the increase is smaller (asymptotically). Let $A_{posmax}$ and $A_{expect}$ be ROUTER: channel-driven & predictive with $\alpha_{posmax}$ and $\alpha_{expect}$ , respectively. We applied the interactive dynamic test to them using $15 \times 15$ block array model as shown in Fig 2 imaging symmetrical FPGAs where the termi- Fig.3: Interactive Dynamic Test for $\alpha_{expect}$ and $\alpha_{posmax}$ on 350 seeds nals are assumed to locate in the center of blocks. We started with 350 different seed problems (placements) for $P_0$ , They are known to be realized with maximum wire-density 5. The result is shown in Fig.3 where the x-coordinate denotes the number of test cycles and y-coordinate the maximum wire-density of the solutions. Every sequence of $A_{expect}$ overwhelms all sequences of $A_{posmax}$ . Since small increase is better, $\alpha_{expect}$ is adopted. #### 3 The Flat-Distribution Placer As mentioned, the algorithm ROUTER: channel-driven & predictive looses control on bounding wire-density when many nets are forced to have few choices other than to pass certain restricted channels. To avoid for nets to fall in such an ill state, our router requires the terminals of each net be distributed in length (half perimeter of bounding box of the net) as evenly as possible. This conclusion is not only interesting but a little bit strange in contrast to the usual requirement to the placer: the closer the terminals of every net, the better. Furthermore, if the placer provides a placement in which the length of the nets are flat in high value, it would not be expected to lead a good result. However, since the sum of length is closely related to the sum of wire-density, in attaining our target which is to reduce the maximum wire-density, to minimize the sum of length is an underlying request. Thus, a placer shall be so developed as to minimize the deviation of lengths of terminals and sum of lengths. The problem is formulated as: Given a graph, assign the vertices onto the set of blocks on the grid plane as shown in Fig 2. (A vertex represents a block. Terminals of blocks are assumed to lie in the center for the simplicity.) The placer is sketched as follows. #### PLACER: flat-distribution We start with an arbitrary initial placement. A pair interchange of blocks is a unit operation to renew the current placement. The definition of the cost of an operation characterizes the algorithm. Any heuristics is available but our experiment applied the simulated annealing. Definition of Placement-Energy: For each block b in the current placement, define the admissible region R(b). Define the regional placement-energy $\Pi(b, R(b))$ which is the sum of the "values" over all blocks $b_x$ that are connected by a net. The "value" of $b_x$ is a function which is defined by the distance from b and R(b). It is very slowly increasing with the distance from b if $b_x$ is inside the admissible region R(b), and it is steeply increasing with the minimum distance to the boundary of the admissible region if $b_x$ is outside. The value is called the penalty. (An image is shown in Fig.4.) Next, minimize the sum: Energy = $$\sum_{\forall b} \Pi(b, R(b))$$ . We used the simulated annealing with pair interchange. END. In the above description, R(b) and the penalty are parameters. Our example will be described in the next section. ## 4 Experiments Our experiments are on two items: - 1. Wire-density by three total systems (placer + global router) - 2. Effect of choice of various admissible regions #### 4.1 Wire-Density by Total Systems We define three placement and global routing combined systems as follows. **F&P** (Flat and Predict) is what we proposed in this paper, where admissible region R(b) is $5 \times 5$ rectangle with b in the center. The penalty of each block connected with b is as shown in Fig 5(a). Fig. 4: Admissible Region of Block $b_1$ L&S (Length and Sequential) consists of a placer which aims to minimize the total length and a global router which processes nets sequentially from shortest one and its route is chosen to avoid the current congestive channels. C&S (Cut and Sequential) consists of a placer which cuts the area hierarchically based on mincut and assigns the blocks onto dissected subareas. The global router is the same as used in L&S. In order to make each reveal its feature, they are implemented very simply for fairness. The result is shown in Table 1. C&S is very fast but the quality is miserable. Usually C&S is employed to generate an initial solution. Therefore we discard this from comparison. We noticed that: - 1. As for the maximum density, F&P is fairly better (8.2 to 9.4) although the average density is comparable (4.2 to 4.2). - 2. As for the maximum length, F&P is far better (6.0 to 9.4) although the total length is slightly worse (1422.6 to 1406.2). This observation shows that minimizing the total length is not direct to reducing the maximum density nor the maximum length. It is worth to doubt if it is a right way to adopt total length minimization to reduce the maximum density or the maximum length, as have been many systems in FPGA design. Fig.5: Rectangle and Cross Admissible Regions In calculating the regional placement-energy of $b_1$ , the penalty of each connected block $(b_2, \ldots, b_6)$ is the product of the distance from $b_1$ and $\epsilon$ ( $\epsilon << 1$ ) if the block is inside the admissible region of $b_1$ , or the square of the minimium distance to the boundary of admissible region otherwise. ## 4.2 Various Admissible Regions In the above experiment, we adopted rectangle R(b). To see the difference of different admissible regions, we experimented following two cases: - 1. Rectangle admissible region R(b) is the $5 \times 5$ rectangle with b in the center for any b (Fig.5(a)). $\Pi(b, R(b))$ is denoted $\Pi_{rect}$ . - 2. Cross admissible region R(b) is the $7 \times 7$ rectangle minus $3 \times 3$ rectangle corners with b in the center for any b. $\Pi(b, R(b))$ is denoted $\Pi_{cross}$ (Fig.5(b)). The result is shown in Table 2 in which the numbers of nets are given classified in terms of tangent y/x where y and x are the vertical and horizontal distances from the center, respectively. We observed that the distribution is very faithful to the shape of admissible regions. The most remarkable difference is found in positions y/x = 1/1 and 2/2 (underlined in the table) which are inside and outside the admissible regions of $\Pi_{rect}$ and $\Pi_{cross}$ , respectively. The situation can be visualized in the figures given in Fig.6. Since the FPGA routing resource prefers horizontal routings or vertical routings to slant routings, $\Pi_{cross}$ is recommended for effective use of routing resources. It is believed that the idea is effectively applicable to various routing resources in general placement. ## 5 Concluding Remarks It is conventional that the layout system is designed hierarchically to cope with explosion of the size. However, hierarchical approach contains an inheritant weakness that each optimization does not guarantee over-all quality. Therefore a careful consideration on consistency between stages is required. In this paper, we considered the placement and global routing stages to be consistent by providing an idea of channel-driven global router and flat-distribution placer. Another interesting idea in this paper is a proposal of evaluation systems of heuristics in laboratories, in place of industrial benchmarks. The key issue of our interactive dynamic test is to repeat the cycle to afford-solve-create the problems. This idea was born through the discussions with Prof. Majid Sarrafzadeh of Northwestern Univ. But its reliability or usefulness is not certain yet. ## Acknowledgements Authors are thankful to Dr. K.Fujiyoshi of JAIST for his helpful suggestions. ## References - A.Hashimoto and J. Stevens, "Wire routing by optimizing channel assignment within large apertures," Proc. 8th Design Automation Conference, Jun 1971, pp.155-163. - [2] Kirkpatrick, C. Gelett, and M. Vecchi, "Optimization by Simulated Annealing, Science," vol. 220, no. 4598, pp671-680, May 1983. - [3] J.T.Mowchenko and C.S.R.Ma, "A New Global Routing Algorithm for Standard Cell ICs", Proc. of International Symposium on Circuits and Systems, pp.27-30, 1987. - [4] J. Cong and B. Preas, "A New Algorithm for Standard Cell Global Routing," Proc. IEEE International Conference on Computer Aided Design, pp.176-179, Nov. 1988. - [5] S. D. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic, Field-Programmable Gate Arrays, Kluwer Acadmic Publishers, 1992. - [6] S. D. Brown, J. Rose, and Z. G. Vranesic, "A Stochastic Model to Predict the Routability of Field-Programmable Gate Arrays", Proc. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 12, No. 12, pp 1827-1838, 1993. | method | wire- | density | le | CPU-time | | | |--------|---------|---------|--------|----------|-------|--| | | average | maximum | total | maximum | (sec) | | | FLP | 4.2 | 8.2 | 1422.6 | 6.0 | 498.0 | | | LES | 4.2 | 9.4 | 1406.2 | 9.4 | 441.1 | | | CES | 5.8 | 12.0 | 1918.6 | 15.6 | 28.6 | | Table 1: Experimental Result: Average Wire-Density of 5 Examples. number of logic blocks = 225 number of adjacent logic blocks ≈ 5 number of nets = 566 ~ 569 | Lired | | | | | | 11 gross | | | | | | |-------|------|-------|------|-----|-----|----------|------|------|------------------|------|------| | 1/x | U | | - 7 | 3 | 4 | Y/x | U | 1 | 2 | 3 | 4 | | 0 | - | 65.4 | 41.4 | 5.6 | 0.4 | 0 | - | 69.4 | 50.4 | 37.0 | 13.6 | | 1 | 59.8 | 111.0 | 74.8 | 8.6 | 0.2 | 1 | 61.6 | 54.0 | 51.4 | 35.6 | 0.2 | | 2 | 46.2 | 76.6 | 51.4 | 3.6 | - | 2 | 57.4 | 49.2 | 0.4 | 1.0 | - | | 3 | 6.0 | 11.4 | 4.2 | - | - | 3 | 35.8 | 34.8 | $\overline{1.0}$ | | | | 4 | 0.4 | 0.4 | - | - | | 4 | 13.2 | 1.0 | • | - | - | | 5 | - | • | - | - | - | 5 | 0.4 | • | - | - | - | Table 2: Distribution of Net Directions Applied to the Same Problems as in Table 2 Fig.6: Placements by IIrect and IIcress