ABSTRACT ICCAD'94


SESSION 1A MULTI-LEVEL LOGIC OPTIMIZATION

Moderators : Masahiro Fujita, Fujitsu Laboratories of America, San Jose, CA
Albert Wang, Synopsys, Inc., Mountain View, CA

1A.1 Perturb and Simplify: Multi-Level Boolean Network Optimizer

Shih-Chieh Chang, M. Marek-Sadowska, Univ. of California, Santa Barbara, CA

In this paper, we discuss the problem of optimizing a multi-level combinational Boolean network. Our techniques apply a sequence of local perturbations and modifications of the network, guided by automatic test pattern generation (ATPG) based reasoning. In particular, we propose several new ways in which one or more redundant gates or wires can be added to a network. We show how to identify gates which are good candidates for local functionality change. Furthermore, we discuss the problem of adding and removing two wires, neither of which alone is redundant, but which when jointly added/removed do not affect the functionality of the network. We also address the problem of efficient redundancy computation, which allows us to eliminate many unnecessary redundancy tests. We have performed experiments on MCNC benchmarks and compared the results to those of misII[4] and RAMBO[6]. Experimental results are very encouraging.

1A.2 Multi-Level Logic Optimization by Implication Analysis

Wolfgang Kunz, Univ. of Potsdam, Potsdam, Germany, Prem Menon, Univ. of Massachusetts, Amherst, MA

This paper proposes a new approach to multi-level logic optimization based on ATPG (Automatic Test Pattern Generation). Previous ATPG-based methods for logic minimization suffered from the limitation that they were quite restricted in the set of possible circuit transformations. We show that the ATPG-based method presented here allows (in principle) the transformation of a given combinational network C into an arbitrary, structurally different but functionally equivalent combinational network C'. Furthermore, powerful heuristics are presented in order to decide which network manipulations are promising for minimizing the circuit. By identifying indirect implications between signals in the circuit, transformations can be derived which are good candidates for the minimization of the circuit. In particular, it is shown that Recursive Learning can derive good Boolean divisors, justifying the effort to attempt a Boolean division. For 9 out of 10 ISCAS-85 benchmark circuits our tool HANNIBAL obtains smaller circuits than the well-known synthesis system SIS.

1A.3 Incremental Synthesis

Daniel Brand, IBM Research Division, Yorktown Heights, NY, Anthony D. Drumm, IBM AS/400 Division, Rochester, MN
Sandip Kundu, IBM Research Division, Yorktown Heights, NY, Prakash Narain, IBM Microelectronics, Endicott, NY

A small change in the input to logic synthesis may cause a large change in the output implementation. This is undesirable if a designer has some investment in the old implementation and does not want it perturbed more than necessary. We describe a method that solves this problem by reusing gates from the old implementation, and restricting synthesis to the modified portions only.


SESSION 1B MEMORY ISSUES IN HIGH-LEVEL SYNTHESIS

Moderators: Don MacMillen, Synopsys, Inc., Mountain View, CA
Giovanni De Micheli, Stanford Univ., Stanford, CA

1B.1 Definition and Solution of the Memory Packing Problem for Field-Programmable Systems

David Karchmer, Jonathan Rose, Univ. of Toronto, Toronto, Ontario, Canada

This paper defines a new optimization problem that arises in the use of a Field-Programmable System (FPS). An FPS consists of a set of Field-Programmable Gate Arrays and memories, and is used both for emulation of ASICs and for computation. In both cases the application circuits will include a set of memories which may not match the number and aspect ratio of the physical memories available on the FPS. This often requires that the physical memories be time-multiplexed to implement the required memories, using a circuit we call a memory organizer.

We give a precise definition of the packing optimization problem and present an algorithm for its solution. The algorithm has been implemented in a CAD tool that automatically produces a memory organizer circuit ready for synthesis by a commercial FPGA tool set.

1B.2 Integrating Program Transformations in the Memory-Based Synthesis of Image and Video Algorithms

David J. Kolson, Alexandru Nicolau, Nikil Dutt, Univ. of California, Irvine, CA

In this paper we discuss the interaction and integration of two important program transformations in high-level synthesis - Tree Height Reduction and Redundant Memory-access Elimination. Intuitively, these program transformations do not interfere with one another as they optimize different operations in the program graph and different resources in the synthesized system. However, we demonstrate that integration of the two tasks is necessary to better utilize available resources. Our approach involves the use of a "meta-transformation" to guide transformation application as possibilities arise. Results observed on several image and video benchmarks demonstrate that transformation integration increases performance through better resource utilization.

1B.3 DataFlow-Driven Memory Allocation for Multi-Dimensional Signal Processing Systems

Florin Balasa, Francky Catthoor, Hugo De Man, IMEC, Leuven, Belgium

Memory cost is responsible for a large amount of the chip and/or board area of customized video and image processing systems. In this paper, a novel background memory allocation and assignment technique is presented. It is intended for a behavioural algorithm specification, where the procedural ordering of the memory related operations is not yet fully fixed. Instead of the more restricted classical scheduling-based explorations, starting from procedurally interpreted specifications in terms of loops, a novel optimization approach - driven by data-flow analysis - is proposed. Employing the estimated silicon area as a steering cost, this allocation/assignment technique yields one or (optionally) several distributed (multi-port) memory architecture(s) with fully-determined characteristics, complying with a given clock cycle budget for read/write operations. Moreover, our approach can accurately deal with complex multi-dimensional signals by means of a polyhedral data-flow analysis operating with groups of scalars.


SESSION 1C TEST GENERATION

Moderators: Fadi Maamari, AT&T Bell Labs., Princeton, NJ
Chen-Shang Lin, National Taiwan Univ., Taipei, Taiwan, ROC

1C.1 Test Generation for Bridging Faults in CMOS ICs Based on Current Monitoring Versus Signal Propagation

Uwe Gläser, Theodor Vierhaus, Marco Kley, Anja Wiederhold, GMD, Sankt Augustin, Germany

Bridge-type defects play a dominant role in state-of-the-art CMOS technologies. This paper describes a combined functional and overcurrent-based test generation approach for CMOS circuits, which is optionally based on layout information. Comparative results for benchmark circuits are given to demonstrate the feasibility of voltage-based versus IDDQ-based testing.

1C.2 Iterative [Simulation Based Genetics + Deterministic Techniques] = Complete ATPG

Daniel G. Saab, Univ. of Illinois, Urbana, IL, Youssef G. Saab, Univ. of Missouri, Columbia, MO,
Jacob A. Abraham, Univ. of Texas, Austin, TX

Simulation-based test vector generators require much less computer time than deterministic ATPG, but they generate longer test sequences and sometimes achieve lower fault coverage. This is due to divergence in the search process. In this paper, we propose a correction technique for simulation-based ATPG. This technique is based on identifying the diverging state and on computing a fault cluster (faults close to each other). A set of candidate faults from the cluster is targeted with a deterministic ATPG, and the resulting test sequence is used to restart the search process of the simulation-based technique. This process is repeated until all faults are detected or proven to be redundant/untestable. The program implementing this approach has been used to generate tests with very high fault coverage, and runs about 10 times faster than traditional deterministic techniques with very good test quality in terms of test length and fault coverage.

1C.3 Analytical Fault Modeling and Static Test Generation for Analog ICs

Giri Devarayanadurg, Mani Soma, Univ. of Washington, Seattle, WA

Static tests are key to reducing the current high cost of testing analog and mixed-signal ICs. A new DC test generation technique for detecting catastrophic failures in this class of circuits is presented. To include the effect of parameter tolerances during testing, the test generation problem is formulated as a minimax optimization problem and solved iteratively as successive linear programming problems. An analytical fault modeling technique, based on manufacturing defect statistics, is used to derive the fault list for test generation. Using the technique presented here, an efficient static test set for analog and mixed-signal ICs can be constructed, reducing both the test time and the packaging cost.


SESSION 1D PARTITIONING/CLUSTERING

Moderators: Ralph Otten, Delft Univ. of Tech., Delft, The Netherlands
Choon Kyung Kim, Goldstar Electron Co., Ltd., Seoul, Korea

1D.1 Efficient Network Flow Based Min-Cut Balanced Partitioning

Honghua Yang, D.F. Wong, Univ. of Texas, Austin, TX

We consider the problem of bipartitioning a circuit into two balanced components while minimizing the number of crossing nets. Previously, Kernighan and Lin type (K&L) heuristics, the simulated annealing approach, and the spectral method were used to solve the problem. However, network flow techniques were overlooked as a viable approach to min-cut balanced bipartition due to their high complexity. In this paper we propose a balanced bipartition heuristic based on repeated max-flow min-cut techniques, and give an efficient implementation that has the same asymptotic time complexity as that of one max-flow computation. We implemented our heuristic algorithm in a package called FBB. The experimental results demonstrate that FBB outperforms the K&L heuristics and the spectral method in terms of the number of crossing nets, and the efficient implementation makes it possible to partition large circuit instances with reasonable runtime. For example, the average elapsed time for bipartitioning a circuit S35932 of almost 20K gates is less than 20 minutes.
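The core loop of a repeated max-flow min-cut bipartitioner can be sketched in a few lines. The following is a minimal illustration of that idea and not the FBB implementation: it uses networkx for the max-flow computation, counts cut edges rather than using FBB's net-based flow model, and the seed nodes s and t, the edge capacities, and the balance bounds lo/hi are assumptions supplied by the caller.

import networkx as nx

def flow_balanced_bipartition(G, s, t, lo, hi):
    """G: nx.DiGraph with 'capacity' on edges; lo/hi bound the s-side size."""
    G = G.copy()
    size = {v: 1 for v in G}               # a contracted node stands for several originals
    while True:
        cut_value, (S, T) = nx.minimum_cut(G, s, t, capacity="capacity")
        s_side = sum(size[v] for v in S)
        if lo <= s_side <= hi:
            return cut_value, S, T          # balanced enough: accept this cut
        if s_side < lo:                     # grow the source side past the current cut
            grow, into = [v for u, v in G.edges if u in S and v in T and v != t], s
        else:                               # symmetric case: grow the sink side
            grow, into = [u for u, v in G.edges if u in S and v in T and u != s], t
        if not grow:
            return cut_value, S, T          # cannot improve the balance any further
        node = grow[0]
        size[into] += size.pop(node)
        G = nx.contracted_nodes(G, into, node, self_loops=False)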

1D.2 Multi-Way VLSI Circuit Partitioning Based on Dual Net Representation

Jason Cong, Wilburt Labio, Narayanan Shivakumar, Univ. of California, Los Angeles, CA

In this paper, we study the area-balanced multi-way partitioning problem of VLSI circuits based on a new dual netlist representation named the hybrid dual netlist (HDN). Given a netlist, we first compute a K-way partition of the nets based on the HDN representation, and then transform the K-way net partition into a K-way module partitioning solution. The main contribution of our work is the formulation and solution of the K-way module contention (K-MC) problem, which determines the best assignment of the modules in contention to partitions, while maintaining user-specified area requirements, when we transform the net partition into a module partition. Under a natural definition of the binding factor between nets and modules and the preference function between partitions and modules, we show that the K-MC problem can be reduced to a min-cost max-flow problem. We present efficient solutions to the K-MC problem based on network flow computation. Extensive experimental results show that our algorithm consistently outperforms the conventional K-FM partitioning algorithm by a significant margin.

1D.3 A General Framework for Vertex Orderings, with Applications to Netlist Clustering

Charles J. Alpert, Andrew B. Kahng, Univ. of California, Los Angeles, CA

We present a general framework for the construction of vertex orderings for netlist clustering. Our WINDOW algorithm constructs an ordering by iteratively adding the vertex with highest attraction to the existing ordering. Variant choices for the attraction function allow our framework to subsume many graph traversals and clustering objectives from the literature. The DP-RP method of [3] is then applied to optimally split the ordering into a k-way clustering. Our approach is adaptable to user-specified cluster size constraints. Experimental results for clustering and multi-way partitioning are encouraging.
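As a concrete picture of attraction-based ordering, the sketch below appends, at each step, the unordered vertex most strongly attracted to the vertices already chosen. It is our own simplification rather than the WINDOW algorithm: the attraction function here is simply the number of already-ordered neighbours, whereas the paper's framework admits many attraction functions, and the adjacency representation and seed vertex are assumptions.

def attraction_ordering(vertices, adj, seed):
    """adj: dict vertex -> set of neighbouring vertices; seed starts the ordering."""
    ordering = [seed]
    placed = {seed}
    while len(ordering) < len(vertices):
        # attraction of v = number of already-ordered neighbours of v
        best = max((v for v in vertices if v not in placed),
                   key=lambda v: len(adj[v] & placed))
        ordering.append(best)
        placed.add(best)
    return ordering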


SESSION 2A SEQUENTIAL SYNTHESIS FOR LOW POWER

Moderators: Ellen M. Sentovich, Univ. of California, Berkeley, CA
Luciano Lavagno, Politecnico di Torino, Torino, Italy

2A.1 Re-Encoding Sequential Circuits to Reduce Power Dissipation

Gary D. Hachtel, Mariano Hermida, Abelardo Pardo, Massimo Poncino, Fabio Somenzi, Univ. of Colorado, Boulder, CO

We present a fully implicit encoding algorithm for minimization of average power dissipation in sequential circuits, based on the reduction of the average number of bit changes per state transition.

We have studied two novel schemes for this purpose, one based on recursive weighted non-bipartite matching, and one on recursive mincut bi-partitioning. We employ ADDs (Algebraic Decision Diagrams) to compute the transition probabilities, to measure potential area saving, and in the encoding algorithms themselves.

Our experiments show the effectiveness of our method in reducing power dissipation for large sequential designs.
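The quantity that such a re-encoding minimizes, the expected number of code-bit changes per state transition, is easy to state concretely. The toy sketch below is our own illustration, not the paper's implicit ADD-based algorithm: it scores two hypothetical encodings of a small state machine, and the transition probabilities and codes are made-up examples.

def avg_bit_changes(trans_prob, encoding):
    """trans_prob: dict (s, t) -> probability; encoding: dict state -> bit tuple."""
    cost = 0.0
    for (s, t), p in trans_prob.items():
        hamming = sum(a != b for a, b in zip(encoding[s], encoding[t]))
        cost += p * hamming
    return cost

# Example: two encodings of a 3-state machine with a dominant s0<->s1 transition.
probs = {("s0", "s1"): 0.4, ("s1", "s0"): 0.4, ("s1", "s2"): 0.1, ("s2", "s0"): 0.1}
enc_a = {"s0": (0, 0), "s1": (1, 1), "s2": (0, 1)}   # frequent pair differs in 2 bits
enc_b = {"s0": (0, 0), "s1": (0, 1), "s2": (1, 1)}   # frequent pair differs in 1 bit
print(avg_bit_changes(probs, enc_a), avg_bit_changes(probs, enc_b))   # 1.8 vs 1.1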

2A.2 Precomputation-Based Sequential Logic Optimization for Low Power

Mazhar Alidina, José Monteiro, Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA,
Abhijit Ghosh, Mitsubishi Electronics, Sunnyvale, CA, Marios Papaefthymiou, Yale Univ., New Haven, CT

We address the problem of optimizing logic-level sequential circuits for low power. We present a powerful sequential logic optimization method that is based on selectively precomputing the output logic values of the circuit one clock cycle before they are required, and using the precomputed values to reduce internal switching activity in the succeeding clock cycle. We present two different precomputation architectures which exploit this observation.

We present an automatic method of synthesizing pre-computation logic so as to achieve maximal reductions in power dissipation. We present experimental results on various sequential circuits. Up to 75% reductions in average switching activity and power dissipation are possible with marginal increases in circuit area and delay.
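A standard illustration of the precomputation principle, consistent with the abstract but not taken from the paper, is an n-bit comparator: whenever the most significant bits of the operands differ, they alone determine the output, so the registers feeding the remaining bits can be disabled in the following cycle. The small simulation below estimates how often that happens under random inputs; the bit width and cycle count are arbitrary assumptions.

import random

def comparator_with_precompute(n_bits, n_cycles, seed=0):
    rng = random.Random(seed)
    disabled = 0
    for _ in range(n_cycles):
        a = rng.getrandbits(n_bits)
        b = rng.getrandbits(n_bits)
        msb_a, msb_b = a >> (n_bits - 1), b >> (n_bits - 1)
        if msb_a != msb_b:      # precomputation logic already decides a > b
            disabled += 1       # the low-order input registers need not be clocked
    return disabled / n_cycles

print(comparator_with_precompute(8, 10000))   # roughly 0.5: half the cycles save switching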

2A.3 Low Power State Assignment Targeting Two- and Multi-Level Logic Implementations

Chi-ying Tsui, Massoud Pedram, Chih-ang Chen, Alvin Despain, Univ. of Southern California, Los Angeles, CA

The problem of minimizing power consumption during the state encoding of a finite state machine is considered. A new power cost model for state encoding is proposed and encoding techniques that minimize this power cost for two- and multi-level logic implementations are described. These techniques are compared with those which minimize area or the switching activity at the present state bits. Experimental results show significant improvements.


SESSION 2B SYSTEM OPTIMIZATION, PARTITIONING, AND INTEGRATION

Moderators: Wayne Wolf, Princeton Univ., Princeton, NJ
Gaetano Borriello, Univ. of Washington, Seattle, WA

2B.1 Algorithm Selection: A Quantitative Computation-Intensive Optimization Approach

Miodrag Potkonjak, NEC USA, C&C Research Labs., Princeton, NJ, Jan Rabaey, Univ. of California, Berkeley, CA

Given a set of specifications for a targeted application, algorithm selection refers to choosing the most suitable algorithm for a given goal among several functionally equivalent algorithms. We demonstrate the extraordinary potential of algorithm selection for achieving high throughput, low cost, and low power implementations.

We introduce an efficient technique for lower-bound evaluation of throughput and cost during algorithm selection and propose a relaxation-based heuristic for throughput optimization. We also present an algorithm for cost optimization using algorithm selection. The effectiveness of the methodology and algorithms is illustrated using examples.

2B.2 Adaptation of Partitioning and High-Level Synthesis in Hardware/Software Co-Synthesis

Jörg Henkel, Rolf Ernst, Ulrich Holtmann, Thomas Benner, Technische Univ. Braunschweig, Braunschweig, Germany

Previously, we presented the system COSYMA for hardware/software co-synthesis of small embedded controllers [ErHeBe93]. The target system of COSYMA is a core processor with application-specific co-processors. The system speedup for standard programs compared to a single 33MHz RISC processor solution with fast, single-cycle access RAM was typically less than 2, due to restrictions in high-level co-processor synthesis and incorrectly estimated back-end tool performance, such as hardware synthesis, compiler optimization and communication optimization. Meanwhile, a high-level synthesis tool for high-performance coprocessors in co-synthesis has been developed. This paper explains the requirements and the main features of the high-level synthesis system and its integration into COSYMA. The results show a speedup of 10 in most cases. Compared to this speedup, the co-processor size is very small.

2B.3 Synthesis of Concurrent System Interface Modules with Automatic Protocol Conversion Generation

Bill Lin, Steven Vercauteren, IMEC, Leuven, Belgium

We describe a new high-level compiler called Integral for designing system interface modules. The input is a high-level concurrent algorithmic specification that can model complex concurrent control flow, logical and arithmetic computations, abstract communication, and low-level behavior. For abstract communication between two communicating modules that obey different I/O protocols, the necessary protocol conversion behaviors are automatically synthesized using a Petri net theoretic approach. We present a synthesis trajectory that can synthesize the necessary hardware resources, control circuitry, and protocol conversion behaviors for implementing system interface modules.


SESSION 2C BUILT-IN SELF-TEST

Moderators: Robert C. Aitken, Hewlett-Packard Co., Palo Alto, CA
Sy-Yen Kuo, National Taiwan Univ., Taipei, Taiwan, ROC

2C.1 An Efficient Procedure for the Synthesis of Fast Self-Testable Controller Structures

Sybille Hellebrand, Hans-J. Wunderlich, Univ. of Siegen, Siegen, Germany

The BIST implementation of a conventionally synthesized controller in most cases requires the integration of an additional register only for test purposes. This leads to some serious drawbacks concerning the fault coverage, the system speed and the area overhead. A synthesis technique is presented which uses the additional test register also to implement the system function, by supporting self-testable pipeline-like controller structures. It will be shown that if the need for two different registers in the final structure is already taken into account during synthesis, then the overall number of flip-flops can be reduced, and the fault coverage and system speed can be enhanced. The presented algorithm constructs realizations of a given finite state machine specification which can be trivially implemented by a self-testable structure. The efficiency of the procedure is ensured by a very precise characterization of the space of suitable realizations, which avoids the computational overhead of previously published algorithms.

2C.2 Test Pattern Generation Based on Arithmetic Operations

Sanjay Gupta, Janusz Rajski, Jerzy Tyszer, McGill Univ., Montréal, Quebec, Canada

Existing built-in self-test (BIST) strategies require the use of specialized test pattern generation hardware, which introduces significant area overhead and performance degradation. In this paper, we propose a novel method for implementing test pattern generators based on adders widely available in data-path architectures and digital signal processing circuits. Test patterns are generated by continuously accumulating a constant value, and their quality is evaluated in terms of the pseudo-exhaustive state coverage on subspaces of contiguous bits. This new test generation scheme, along with the recently introduced accumulator-based compaction scheme, facilitates a BIST strategy for high-performance datapath architectures that uses the functionality of existing hardware, is entirely integrated with the circuit under test, and results in at-speed testing with no performance degradation or area overhead.
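The generation scheme itself is simple enough to sketch. The following is our own minimal rendering of the abstract's description, not the paper's hardware: patterns come from repeatedly adding a constant modulo 2**n, and quality is scored as the fraction of values observed on a window of contiguous bits. The particular increment, window position, and pattern count are illustrative assumptions.

def accumulator_patterns(n_bits, increment, count, start=0):
    acc, patterns = start, []
    for _ in range(count):
        acc = (acc + increment) % (1 << n_bits)   # one accumulator step per test pattern
        patterns.append(acc)
    return patterns

def subspace_coverage(patterns, lo_bit, width):
    """Fraction of the 2**width values seen on bits [lo_bit, lo_bit + width)."""
    seen = {(p >> lo_bit) & ((1 << width) - 1) for p in patterns}
    return len(seen) / (1 << width)

pats = accumulator_patterns(n_bits=16, increment=0x9E37, count=256)
print(subspace_coverage(pats, lo_bit=4, width=8))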

2C.3 Random Pattern Testable Logic Synthesis

Chen-Huan Chiang, Sandeep Gupta, Univ. of Southern California, Los Angeles, CA

Previous procedures for the synthesis of testable logic guarantee that all faults in the synthesized circuits are detectable. However, the detectability of many faults in these circuits can be very low, leading to poor random pattern testability. A new logic synthesis procedure that synthesizes random pattern testable multilevel circuits is proposed. Experimental results show that the circuits synthesized by the proposed procedure tstfx are significantly more random pattern testable and smaller than those synthesized using its counterpart fast extract (fx) in SIS. The proposed synthesis procedure designs circuits that require only simple random pattern generators in built-in self-test, thereby obviating the need for complex BIST circuitry.


SESSION 2D PLACEMENT

Moderators: Choon Kyung Kim, Goldstar Electron Co. Ltd., Seoul, Korea
Jason Cong, Univ. of California, Los Angeles, CA

2D.1 Compression-Relaxation: A New Approach to Performance Driven Placement for Regular Architectures

Anmol Mathur, C. L. Liu, Univ. of Illinois, Urbana, IL

We present a new iterative algorithm for performance driven placement applicable to regular architectures such as FPGAs. Our algorithm has two phases in each iteration: a compression phase and a relaxation phase. We employ a novel compression strategy based on the longest path tree of a cone for improving the timing performance of a given placement. Compression might cause a feasible placement to become infeasible. The concept of a slack neighborhood graph is introduced and is used in the relaxation phase to transform an infeasible placement to a feasible one using a mincost flow formulation. Our analytical results regarding the bounds on delay increase during relaxation are validated by the rapid convergence of our algorithm on benchmark circuits. We obtain placements that have 13% less critical path delay (on the average) than those generated by the Xilinx automatic place and route tool (apr) on technology mapped MCNC benchmark circuits with significantly less CPU time than apr.

2D.2 A Loosely Coupled Parallel Algorithm for Standard Cell Placement

Wern-Jieh Sun, Carl Sechen, Univ. of Washington, Seattle, WA

We present a loosely coupled parallel algorithm for the placement of standard cell integrated circuits. Our algorithm is a derivative of simulated annealing. The implementation of our algorithm is targeted toward networks of UNIX workstations. This is the very first reported parallel algorithm for standard cell placement which yields as good or better placement results than its serial version. In addition, it is the first parallel placement algorithm reported which offers nearly linear speedup, in terms of the number of processors (workstations) used, over the serial version. Despite using the rather slow local area network as the only means of interprocessor communication, the processor utilization is quite high, up to 98% for 2 processors and 90% for 6 processors. The new parallel algorithm has yielded the best overall results ever reported for the set of MCNC standard cell benchmark circuits.

2D.3 Delay and Area Optimization for Compact Placement by Gate Resizing and Relocation

Weitong Chuang, AT&T Bell Labs., Murray Hill, NJ, Ibrahim N. Hajj, Univ. of Illinois, Urbana, IL

In this paper, we first present an efficient algorithm for the gate sizing problem. Then we propose an algorithm which performs delay and area optimization for a given compact placement by resizing and relocating cells in the circuit layout. Since the gate sizing procedure is embedded within the placement adjustment process, interconnect capacitance information is included in the gate size selection process. As a result, the algorithm is able to obtain superior solutions.


SESSION 3A FPGA SYNTHESIS AND ARCHITECTURE

Moderators: Jonathan Rose, Univ. of Toronto, Toronto, Ontario, Canada
Herve Touati, Digital Equipment Corp., Rueil-Malmaison, France

3A.1 Edge-Map: Optimal Performance Driven Technology Mapping for Iterative LUT Based FPGA Designs

Honghua Yang, D.F. Wong, Univ. of Texas, Austin, TX

We consider the problem of performance driven lookup-table (LUT) based technology mapping for FPGAs using a general delay model. In the general delay model, each interconnection edge has a weight representing the delay of the interconnection. This model is particularly useful when combined with an iterative re-technology mapping process where the actual delays of the placed and routed circuit are fed back to the technology mapping phase to improve the mapping based on the more realistic delay estimation. Well-known technology mappers such as FlowMap and Chortle-d only minimize the number of levels in the technology mapped circuit and hence are not suitable for such an iterative re-technology mapping process. Recently, Mathur and Liu [ML94] studied the performance driven technology mapping problem using the general delay model and presented an effective heuristic algorithm for the problem. In this paper, we present an efficient technology mapping algorithm that achieves provably optimal delay in the technology mapped circuit using the general delay model. Our algorithm is a non-trivial generalization of FlowMap. A key problem in our algorithm is to compute a K-feasible network cut such that the circuit delay on every cut edge is upper-bounded by a specific value. We implemented our algorithm in a LUT based FPGA technology mapping package called Edge-Map, and tested Edge-Map on a set of benchmark circuits.

3A.2 Maple: A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for Field-Programmable Gate Arrays

Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki, Waseda Univ., Tokyo, Japan

Technology mapping algorithms for LUT (Look-Up Table) based FPGAs have been proposed to transform a Boolean network into logic blocks. However, since those algorithms take no layout information into account, they do not always lead to excellent results. In this paper, a simultaneous technology mapping, placement and global routing algorithm for FPGAs, Maple, is presented. Maple is an extended version of a simultaneous placement and global routing algorithm for FPGAs, which is based on recursive partitioning of layout regions and block sets. Maple inherits this basic process and executes the technology mapping simultaneously in each recursive step. Therefore, the mapping can be done with the placement and global routing information. Experimental results for some benchmark circuits demonstrate its efficiency and effectiveness.

3A.3 Universal Logic Gate for FPGA Design

Chih-chang Lin, M. Marek-Sadowska, Duane Gatlin, Univ. of California, Santa Barbara, CA

In this paper the problem of selecting an appropriate programmable cell structure for FPGA architecture design is addressed. The cells studied here can be configured to the desired functionality by applying input permutation, negation, bridging or constant assignment, or output negation. A general methodology to determine the logic description of such cells, which are capable of being configured to a given set of functions, is described.

Experimental results suggest that the new cell behaves as well as the Actel 2 cell in terms of logic power but requires substantially less area and wiring overhead.


SESSION 3B CONCURRENCY MODELING AND ESTIMATION

Moderators: Robert Walker, Rensselaer Polytechnic Inst., Troy, NY
Kazutoshi Wakabayashi, Stanford Univ., Stanford, CA

3B.1 Condition Graphs for High-Quality Behavioral Synthesis

Hsiao-Ping Juan, Viraphol Chaiyakul, Daniel D. Gajski, Univ. of California, Irvine, CA

Identifying mutual exclusiveness between operators during behavioral synthesis is important in order to reduce the required number of control steps or hardware resources. To improve the quality of the synthesis result, we propose a representation, the Condition Graph, and an algorithm for identification of mutually exclusive operators. Previous research efforts have concentrated on identifying mutual exclusiveness by examining language constructs such as IF-THEN-ELSE statements; thus, their results depend heavily on description styles. The proposed approach produces results independent of description styles and identifies more mutually exclusive operators than previous approaches. The Condition Graph and the proposed algorithm can be used in any scheduling or binding algorithm. Experimental results on several benchmarks have shown the efficiency of the proposed representation and algorithm.

3B.2 Dynamic Scheduling and Synchronization Synthesis of Concurrent Digital Systems under System-Level Constraints

Claudionor Nunes Coelho Jr., Giovanni De Micheli, Stanford Univ., Stanford, CA

We present in this paper a novel control synthesis technique for system-level specifications that are better described as a set of concurrent synchronous descriptions, their synchronizations and constraints. The proposed synthesis technique considers the degrees of freedom introduced by the concurrent models and by the environment in order to satisfy the design constraints.

Synthesis is divided into two phases. In the first phase, the original specification is translated into an algebraic system, for which complex control-flow constraints and quantifiers of the design are determined. This algebraic system is then analyzed and the design space of the specification is represented by a finite-state machine, from which a set of Boolean formulas is generated and manipulated in order to obtain a solution. This method contrasts with usual high-level synthesis methods in that it can handle arbitrarily complex control-flow structures, concurrency and synchronization by allowing the scheduling of the operations to change dynamically over time.

3B.3 Comprehensive Lower Bound Estimation from Behavioral Descriptions

Seong Y. Ohm, Fadi J. Kurdahi, Nikil Dutt, Univ. of California, Irvine, CA

In this paper, we present a comprehensive technique for lower bound estimation (LBE) of resources from behavioral descriptions. Previous work has focused on LBE techniques that use very simple cost models which primarily focus on the functional unit resources. Our cost model accounts for storage resources in addition to functional resources. Our timing model uses a finer granularity that permits the modeling of functional unit, register and interconnect delays. We tested our LBE technique for both functional unit and storage requirements on several high-level synthesis benchmarks and observed near-optimal results.


SESSION 3C TIMING MODELING AND SIMULATION

Moderators: Andrew T. Yang, Univ. of Washington, Seattle, WA
Lawrence T. Pillage, Univ. of Texas, Austin, TX

3C.1 Fast and Accurate Timing Simulation with Regionwise Quadratic Models of MOS I-V Characteristics

A. Dharchoudhury, Sung Mo Kang, Univ. of Illinois, Urbana, IL
K.H. Kim, S.H. Lee, Samsung Electronics Co., Kyung-Ki-Do, Korea

This paper presents a technique called regionwise quadratic (RWQ) modeling that allows highly accurate MOS models, as well as measured I-V data, to be used in fast timing simulation. This technique significantly increases the accuracy of fast timing simulation while maintaining efficiency by permitting analytical solutions of node equations. A fast timing simulator using these RWQ models has been implemented. Several examples of RWQ modeling are provided, and comparisons of simulation results with SPICE3 are shown to demonstrate accuracy and efficiency. Speedups of two to three orders of magnitude for circuits containing up to 2000 transistors are observed.
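The essence of regionwise modeling, fitting a low-order polynomial per voltage region so that each region's node equations stay analytically solvable, can be illustrated with ordinary least-squares fits. The sketch below is a rough simplification of that idea and not the paper's RWQ formulation: it fits independent quadratics between user-chosen breakpoints with numpy.polyfit, and the breakpoints and data format are assumptions.

import numpy as np

def fit_rwq(v, i, breakpoints):
    """v, i: numpy arrays of measured voltage/current; returns per-region quadratics."""
    models = []
    bounds = [v[0]] + list(breakpoints) + [v[-1]]
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = (v >= lo) & (v <= hi)
        coeffs = np.polyfit(v[mask], i[mask], deg=2)     # i ~ a*v**2 + b*v + c in this region
        models.append((lo, hi, coeffs))
    return models

def eval_rwq(models, v_query):
    for lo, hi, coeffs in models:
        if lo <= v_query <= hi:
            return np.polyval(coeffs, v_query)
    raise ValueError("voltage outside the modeled range")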

3C.2 VLSI Timing Simulation with Selective Dynamic Regionization

Meng-Lin Yu, Bryan D. Ackland, AT&T Bell Labs., Holmdel, NJ

Accurate timing simulations are crucial to the design of MOS VLSI circuits, but can take prohibitively large amounts of time. This paper describes dynamic regionization techniques applied to an event-based simulator for MOS timing simulation that have proven to be more efficient than, and as accurate as, the static regionization method. The MOS network is first statically partitioned into groups of strongly coupled nodes called regions. Big regions are then incrementally and dynamically partitioned into and replaced by subregions. Subregions are treated just like normal regions in the event-based simulation process. This simulator has been used to verify the timing and functionality of several large VLSI chips. Performance is 3 to 7 times faster than a static regionization method.

3C.3 A New Efficient Approach to Statistical Delay Modeling of CMOS Digital Combinational Circuits

S. A. Aftab, Motorola Strategic Systems Tech., Tempe, AZ, M.A. Styblinski, Texas A&M Univ., College Station, TX

This paper presents one of the first attempts to statistically characterize signal delays of basic CMOS digital combinational circuits using the transistor level approach. Hybrid analytical/iterative delay expressions in terms of the transistor geometries and technological process variations are created for basic building blocks. Local delays of blocks along specific signal paths are combined together for the analysis of complex combinational VLSI circuits. The speed of analysis is increased by 2 to 4 orders of magnitude relative to SPICE, with about 5-10% accuracy. The proposed approach shows good accuracy in modeling the influence of the "noise" parameters on circuit delay relative to direct SPICE-based Monte Carlo analysis. Examples of statistical delay characterization are shown. The important impact of the proposed approach is that statistical evaluation and optimization of delays in much larger VLSI circuits will become possible.


SESSION 3D CLOCK AND ROUTING ALGORITHMS FOR HIGH PERFORMANCE SYSTEMS

Moderators: Michael Jackson, Motorola, Austin, TX
Ren-Song Tsay, ArcSys, Inc., Sunnyvale, CA

3D.1 Simultaneous Driver and Wire Sizing for Performance and Power Optimization

Jason Cong, Cheng-Kok Koh, Univ. of California, Los Angeles, CA

In this paper, we study the simultaneous driver and wire sizing (SDWS) problem under two objective functions: (i) delay minimization only, or (ii) combined delay and power dissipation minimization. We present general formulations of the SDWS problem under these two objectives based on the distributed Elmore delay model with consideration of both capacitive power dissipation and short-circuit power dissipation. We show several interesting properties of the optimal SDWS solutions under the two objectives, including an important result (Theorem 3) which reveals the relationship between driver sizing and optimal wire sizing. These results lead to polynomial time algorithms for computing the lower and upper bounds of optimal SDWS solutions under the two objectives, and efficient algorithms for computing optimal SDWS solutions under the two objectives. We have implemented these algorithms and compared them with existing design methods for driver sizing only or independent driver and wire sizing. Accurate SPICE simulation shows that our methods reduce the delay by 11%-47% and power dissipation by 26%-63% compared with existing design methods.
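For readers unfamiliar with the cost function involved, the distributed Elmore delay of a sized driver feeding a wire broken into individually sized segments is easy to evaluate directly. The sketch below is only a back-of-the-envelope evaluation of that delay, not the authors' optimization algorithm, and the unit resistance and capacitance constants are invented placeholders.

def elmore_delay(driver_size, seg_lengths, seg_widths, c_load,
                 r_drv_unit=2000.0, r_sq=0.05, c_area=2e-16, c_fringe=5e-17):
    """Distributed Elmore delay of driver -> sized wire segments -> load (units illustrative)."""
    r_driver = r_drv_unit / driver_size
    seg_r = [r_sq * L / w for L, w in zip(seg_lengths, seg_widths)]
    seg_c = [c_area * L * w + c_fringe * L for L, w in zip(seg_lengths, seg_widths)]
    downstream = c_load + sum(seg_c)
    delay = r_driver * downstream              # the driver sees all downstream capacitance
    for r, c in zip(seg_r, seg_c):
        delay += r * (downstream - c / 2.0)    # each segment drives half of itself plus what lies beyond
        downstream -= c
    return delay

print(elmore_delay(driver_size=8.0, seg_lengths=[200, 200], seg_widths=[2.0, 1.0], c_load=30e-15))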

3D.2 Low-Cost Single Layer Clock Trees with Exact Zero Elmore Delay Skew

Andrew B. Kahng, C.-W Albert Tsao, Univ. of California, Los Angeles, CA

We give the first single-layer clock tree construction with exact zero skew according to the Elmore delay model. The previous Linear-Planar-DME method [11] guarantees a planar solution under the linear delay model. In this paper, we use a variant of the Linear-Planar-DME connection topology to construct a low-cost zero skew tree (ZST) according to the Elmore delay model. While a linear-delay ZST is trivially converted to an Elmore-delay ZST by "detouring" wires, the key idea is to defer this detouring as much as possible to reduce tree cost. Costs of our planar ZST solutions are comparable to those of the best previous non-planar ZST solutions, and substantially improve over previous planar clock routing methods.

3D.3 Clock-Period Constrained Minimal Buffer Insertion in Clock Trees

Gustavo E. Téllez, Majid Sarrafzadeh, Northwestern Univ., Evanston, IL

In this paper we investigate the problem of computing a lower bound on the number of buffers required, given a maximum clock frequency and a predefined clock tree. Using generalized properties of published CMOS timing models, we formulate a novel non-linear and a simplified linear buffer insertion problem. We solve the latter optimally with an O(n) algorithm. The basic formulation and algorithm are extended to include a skew upper bound constraint. Using these algorithms, we propose further algorithmic extensions that allow area and phase delay tradeoffs. Our results are verified using SPICE3e2 simulations with MCNC MOSIS 2.0µm models and parameters. Experiments show our buffer insertion algorithms can be used effectively for high-speed clock designs.
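To make the flavor of such a linear-time computation concrete, the sketch below walks a clock tree bottom-up and inserts a buffer whenever the accumulated downstream capacitance would exceed what a single buffer may drive within its stage budget. This is a deliberately simplified greedy heuristic of our own, not the paper's optimal formulation; the tree encoding, the capacitance limit, and the zero-input-load assumption for buffers are all assumptions.

def min_buffers(tree, root, sink_cap, wire_cap, cap_limit):
    """tree: dict node -> list of children; leaves carry sink_cap, edges carry wire_cap."""
    buffers = 0

    def visit(node):
        nonlocal buffers
        children = tree.get(node, [])
        if not children:
            return sink_cap[node]           # a clock sink contributes only its pin capacitance
        load = 0.0
        for child in children:
            load += wire_cap[(node, child)] + visit(child)
        if load > cap_limit:                # shield the upstream tree behind a buffer here
            buffers += 1
            load = 0.0                      # buffer input load approximated as zero
        return load

    visit(root)
    return buffers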


SESSION 4A RETIMING AND SEQUENTIAL TECHNOLOGY MAPPING

Moderators: Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA
Ellen M. Sentovich, Univ. of California, Berkeley, CA

4A.1 Efficient Implementation of Retiming

Narendra Shenoy, Richard Rudell, Synopsys, Inc., Mountain View, CA

Retiming is a technique for optimizing sequential circuits. It repositions the registers in a circuit, leaving the combinational cells untouched. The objective of retiming is to find a circuit with the minimum number of registers for a specified clock period. More than ten years have elapsed since Leiserson and Saxe first presented a theoretical formulation to solve this problem for single-clock edge-triggered sequential circuits. Their proposed algorithms have polynomial complexity; however, naive implementations of these algorithms exhibit O(n^3) time complexity and O(n^2) space complexity when applied to digital circuits with n combinational cells. This renders retiming ineffective for circuits with more than 500 combinational cells. This paper addresses the implementation issues required to exploit the sparsity of circuit graphs to allow min-period retiming and constrained min-area retiming to be applied to circuits with as many as 10,000 combinational cells. We believe this is the first paper to address these issues and the first to report retiming results for large circuits.
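The underlying Leiserson-Saxe relations are compact: a retiming assigns a lag r(v) to every cell, the register count on edge u->v becomes w(e) + r(v) - r(u), and the clock period is the longest zero-register combinational path. The sketch below checks those relations for a candidate retiming; it is a plain restatement of the textbook model, assuming the zero-weight subgraph is acyclic, and not the efficient sparse implementation the paper describes.

import functools

def retime_edges(edges, r):
    """edges: list of (u, v, w) with w registers on u->v; r: dict of cell lags."""
    return [(u, v, w + r[v] - r[u]) for u, v, w in edges]

def is_legal(retimed):
    return all(w >= 0 for _, _, w in retimed)      # no edge may end up with negative registers

def clock_period(retimed, delay):
    """Longest purely combinational (zero-register) path; delay: dict cell -> delay."""
    zero_succ = {}
    for u, v, w in retimed:
        if w == 0:
            zero_succ.setdefault(u, []).append(v)

    @functools.lru_cache(maxsize=None)
    def arrival(v):   # longest zero-register path starting at v, including v itself
        return delay[v] + max((arrival(s) for s in zero_succ.get(v, [])), default=0.0)

    return max(arrival(v) for v in delay)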

4A.2 Retiming with Non-Zero Clock Skew, Variable Register, and Interconnect Delay

Tolga Soyata, Eby G. Friedman, Univ. of Rochester, Rochester, NY

A retiming algorithm is presented which includes the effects of variable register, clock distribution, and interconnect delay. These delay components are incorporated into retiming by assigning Register Electrical Characteristics (RECs) to each edge in the graph representation of the synchronous circuit. A matrix, called the Sequential Adjacency Matrix (SAM), is presented that contains all path delays. Timing constraints for each data path are derived from this matrix. Vertex lags are assigned ranges rather than single values as in standard retiming algorithms. The approach used in the proposed algorithm is to initialize these ranges with unbounded values and continuously tighten them using localized timing constraints until an optimal solution is obtained. The algorithm is demonstrated on modified MCNC benchmark circuits, and both increased clock frequencies and elimination of all race conditions are observed.

4A.3 Optimal Latch Mapping and Retiming Within a Tree

Joel Grodstein, Eric Lehman, Heather Harkness, Bill Grundmann, Herve Touati, Digital Equipment Corp., Hudson, MA

We propose a technology mapping algorithm that takes existing structural technology-mapping algorithms based on dynamic programming [1,3,4] and extends them to retime pipelined circuits. If the circuit to be mapped has a tree structure, our algorithm generates an optimal solution compatible with that structure. The algorithm takes into account gate delays and capacitive loads as latches are moved across the logic. It also supports latches with embedded logic: i.e., cells that combine a D latch with a combinational gate at little extra cost in latch delay.


SESSION 4B ISSUES IN DISCRETE SIMULATION

Moderators: Steve Tjiang, Synopsys, Inc., Mountain View, CA
Mary Bailey, Univ. of Arizona, Tucson, AZ

4B.1 Simulation of Digital Circuits in the Presence of Uncertainty

Mark Linderman, Miriam Leeser, Cornell Univ., Ithaca, NY

Current extended value set dynamic timing analyzers are not sophisticated enough to detect the subtle timing relationships upon which timing-critical systems depend, and exhaustive simulation achieves very accurate results but at tremendous computational cost. MTV is a simulator that strikes a balance between accuracy and efficiency.

MTV is more accurate than other extended value set simulators because it respects the ordering of events. It is more efficient than exhaustive simulators because it efficiently simulates overlapping events and requires only a single waveform to represent a signal. Features of MTV include: elimination of common ambiguity, symbolic delays, correlated delays, and sophisticated algorithms to detect ordered events. This paper concludes with simulation results from the ISCAS85 benchmark suite.

4B.2 Fast Transient Power and Noise Estimation for CMOS VLSI Circuits

Wolfgang T. Eisenmann, Motorola GmbH, Munich, Germany
Helmut E. Graeb, Technical Univ. of Munich, Munich, Germany

Today's digital design systems are running out of steam when it comes to meeting the challenges presented by simultaneous switching, power consumption and reliability constraints emerging in VLSI circuits. In this paper, a new technique to accurately estimate the transient behavior of large CMOS cell-based circuits in a reasonable amount of time is presented. Gate-level simulation and a consistent modeling methodology are employed to compute the time-domain waveforms for signal voltages, supply currents, power consumption and ΔI noise on power lines. This can be done for circuit blocks and complete designs by our new tool POWTIM, which adds SPICE-like capabilities to digital design systems.

4B.3 The Inversion Algorithm for Digital Simulation

Peter Maurer, Univ. of South Florida, Tampa, FL

The Inversion Algorithm is an event-driven algorithm whose performance rivals or exceeds that of Levelized Compiled Code simulation, even at activity rates of 50% or more. The Inversion Algorithm has several unique features, the most remarkable of which is the size of the run-time code. The basic algorithm can be implemented using no more than a page of run-time code, although in practice it is more efficient to provide several different variations of the basic algorithm. The run-time code is independent of the circuit under test, so the algorithm can be implemented either as a compiled code or an interpreted simulator with little variation in performance. Because of the small size of the run-time code, the run-time portions of the Inversion Algorithm can be implemented in assembly language for peak efficiency, and still be retargeted for new platforms with little effort.


SESSION 4C TECHNOLOGY CAD

Moderators: Sung Mo Kang, Univ. of Illinois, Urbana, IL
Don Scharfetter, Intel Corp., Santa Clara, CA

4C.1 Unified Complete Mosfet Model for Analysis of Digital and Analog Circuits

M. Miura-Mattausch, U. Feldmann, A. Rahm, M. Bollu, D. Savignac, Siemens AG, Munich, Germany

In this paper, we describe the complete MOSFET model developed for circuit simulation. The model describes all transistor characteristics as functions of surface potentials, which are calculated iteratively at each applied voltage under the charge-sheet approximation. The key idea of this development is to put as much physics as possible into the equations describing the surface potentials. Since the model includes both the drift and the diffusion contributions, a single equation is valid from the subthreshold to the saturation regions. The unified treatment of our model allows all transistor characteristics to be calculated without any nonphysical fitting parameters. Additionally the calculation time is drastically reduced in comparison with a conventional piece-wise model.

4C.2 A Precorrected-FFT Method for Capacitance Extraction of Complicated 3-D Structures

J.R. Phillips, J. White, Massachusetts Inst. of Tech., Cambridge, MA

In this paper we present a new approach to three-dimensional capacitance extraction based on a pre-corrected FFT scheme. The approach is compared to the now commonly used multipole-accelerated algorithms for a variety of structures, and the new method is shown to have substantial performance and memory advantages.

4C.3 Measurement and Modeling of MOS Transistor Current Mismatch in Analog IC's

Eric Felt, Amit Narayan, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA

This paper presents a new methodology for measuring MOS transistor current mismatch and a new transistor current mismatch model. The new methodology is based on extracting the mismatch information from a fully functional circuit rather than on probing individual devices; this extraction leads to more efficient and more accurate mismatch measurement. The new model characterizes the total mismatch as a sum of two components, one systematic and the other random. For our process, we attribute nearly half of the mismatch to the systematic component, which we model as a linear gradient across the die. Furthermore, we present a new model for the random component of the mismatch which is 60% more accurate, on average, than existing models.


SESSION 4D MINIMIZING CLOCK SKEW

Moderators: Andrew B. Kahng, Univ. of California, Los Angeles, CA
Gary Yeap, Motorola, Inc., Temple, AZ

4D.1 Skew Sensitivity Minimization of Buffered Clock Tree

Jae Chung, Chung-Kuan Cheng, Univ. of California, La Jolla, CA

Given a clock tree topology and a library of buffers, we propose an efficient skew sensitivity minimization algorithm using a dynamic programming approach. Our algorithm finds the optimum buffer sizes, their insertion levels in the clock tree, and optimum wire widths to minimize the skew sensitivity under manufacturing variations. Careful fine tuning by shifting buffer locations at the last stage preserves the minimum skew sensitivity property and reduces the interconnect length. For a given clock tree of n points and a library of s different buffer sizes, the run time of the presented algorithm is O(s^2 log^3 n).

Experimental results show a significant reduction of clock skews ranging from 87 times to 144 times compared to the clock skews before applying the proposed algorithm. We also observe a further reduction of the propagation delay of clock signals as a result of applying the proposed skew sensitivity algorithm.

4D.2 Process-Variation-Tolerant Clock Skew Minimization

Shen Lin, C.K. Wong, IBM Corp., Yorktown Heights, NY

In this paper, we propose a novel hierarchical multiple-merge zero skew clock routing algorithm. The routing results produced by our approach have zero skew in the nominal case and minimal skew increase in the presence of worst-case process variations. In order to construct such a clock routing, we formulate the linear placement with maximum spread problem and provide an O(n·min{n, P}·log n·log P) algorithm for optimally solving this problem, where n is the number of cells to be placed and P is the maximum spread. Experimental results show that our algorithm can indeed reduce the skew effectively under various manufacturing variations.

4D.3 A Specified Delay Accomplishing Clock Router Using Multiple Layers

Mitsuho Seki, Kenji Inoue, Hitachi, Ltd., Ibaraki-ken, Japan
Kazuo Kato, Kouki Tsurusaki, Shinichi Fukasawa, Hitoshi Sasaki, Mutsuhito Aizawa, Hitachi, Ltd., Tokyo, Japan

Clock routing that minimizes the clock skew is essential for high-performance LSIs. Our clock routing method: (1) realizes the specified delay to each input terminal and provides zero skew; (2) uses multiple routing layers for pin-to-pin routing; and (3) considers the delay arising from the resistance of through-holes. Experimental results show that the delay is within 1% error compared to the specified delay and that the skew can be controlled to within picoseconds.


SESSION 5A ESTIMATION TECHNIQUES FOR POWER CONSUMPTION

Moderators: Sujit Dey, C&C Research Labs., NEC USA, Inc., Princeton, NJ
Michel Berkelaar, IBM Corp., Yorktown Heights, NY

5A.1 Switching Activity Analysis Considering Spatiotemporal Correlations

Radu Marculescu, Diana Marculescu, Massoud Pedram, Univ. of Southern California, Los Angeles, CA

This work presents techniques for computing the switching activities of all circuit nodes under pseudorandom or biased input sequences, assuming a zero delay mode of operation. Complex spatiotemporal correlations among the circuit inputs and internal nodes are considered by using a lag-one Markov Chain model. Evaluations of the model and a comparative analysis presented for benchmark circuits demonstrate the accuracy and the practicality of the method. The results presented in this paper are useful in power estimation and low power design.
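The effect of the temporal (lag-one) part of the correlation model is easy to see on a single signal. The short example below, our own illustration rather than anything from the paper, compares the switching activity implied by given 0->1 and 1->0 transition probabilities against the estimate obtained by assuming successive clock cycles are independent; the numbers are made up.

def activity_markov(p01, p10):
    p1 = p01 / (p01 + p10)          # stationary probability of the signal being 1
    p0 = 1.0 - p1
    return p0 * p01 + p1 * p10      # expected transitions per clock cycle

def activity_independent(p1):
    return 2.0 * p1 * (1.0 - p1)    # estimate if consecutive cycles were uncorrelated

p01, p10 = 0.1, 0.1                 # a "sticky" signal: it rarely changes value
p1 = p01 / (p01 + p10)
print(activity_markov(p01, p10), activity_independent(p1))   # 0.1 vs 0.5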

5A.2 Estimation of Circuit Activity Considering Signal Correlations and Simultaneous Switching

Tan-Li Chou, Kaushik Roy, Purdue Univ., West Lafayette, IN, Sharat Prasad, Texas Instruments Inc., Dallas, TX

This paper presents accurate estimation of signal activity at the internal nodes of CMOS combinational logic circuits. The methodology is based on a stochastic model of logic signals and takes correlations and simultaneous switching of signals at logic gate inputs into consideration. In combinational logic synthesis, in order to minimize spurious transitions due to finite propagation delays, it is crucial to balance all signal paths and to reduce the logic depth [4]. As a result of balancing delays through different paths, the inputs to logic gates may switch at approximately the same time. We have developed and implemented a technique to calculate the signal probability and switching activity of CMOS combinational logic circuits. Experimental results show that if simultaneous switching is not considered, the switching activities of the internal nodes can be off by more than 100% compared to simulation-based techniques. In contrast, our technique is on average within 2% of logic simulation results.

5A.3 A Cell-Based Power Estimation in CMOS Combinational Circuits

Jiing-Yuan Lin, T.C. Liu, Wen-Zen Shen, National Chiao Tung Univ., Hsin-chu, Taiwan, ROC

In this paper we present a power dissipation model that considers the charging/discharging of capacitance at the gate output node as well as at internal nodes, and the capacitance feedthrough effect. Based on the model, a Cell-Based Power Estimation (CBPE) method is developed to estimate the power dissipation in CMOS combinational circuits. In our technique, we first construct a modified state transition graph called STGPE to model the power consumption behavior of a logic gate. Then, according to the input signal probabilities and transition densities of the logic gate, we apply an efficient method to estimate the expected activity number of each edge in the STGPE. Finally, the energy consumption of a logic gate is calculated by summing the energy consumptions of each edge in the STGPE. For a set of benchmark circuits, experimental results show that the power dissipation estimated by CBPE is on average within 10% of exact SPICE simulation, while the CPU time is more than two orders of magnitude faster.
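For context, the baseline that cell-based models like CBPE refine is the familiar node-level estimate that sums 0.5·C·Vdd²·D(x)·f over circuit nodes, which ignores internal-node charging and the feedthrough effect. The sketch below evaluates only that baseline; it is not the CBPE method, and the capacitances, transition densities, supply voltage, and clock frequency are invented example values.

def average_power(node_cap, transition_density, vdd, f_clk):
    """node_cap, transition_density: dicts keyed by node; densities are transitions per cycle."""
    power = 0.0
    for node, cap in node_cap.items():
        power += 0.5 * cap * vdd * vdd * transition_density[node] * f_clk
    return power

caps = {"n1": 20e-15, "n2": 35e-15, "out": 60e-15}     # farads
dens = {"n1": 0.30, "n2": 0.12, "out": 0.25}           # transitions per clock cycle
print(average_power(caps, dens, vdd=3.3, f_clk=50e6))  # watts (a few microwatts here)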


SESSION 5B RESOURCE BINDING

Moderators: Reinaldo Bergamaschi, IBM Corp., Yorktown Heights, NY
Yukihiro Nakamura, NTT Knowledge Systems Lab., Yokosuka-Shi, Japan

5B.1 Design Exploration for High-Performance Pipelines

Smita Bakshi, Daniel D. Gajski, Univ. of California, Irvine, CA

Exploration plays an important role in the design of high-performance pipelines. We propose an exploration strategy for varying three design parameters by using a performance-constrained component selection and pipelining algorithm on different "architectures". The architecture is specified manually by using a mix of behavioral and structural constructs, while the component selection and pipelining are performed automatically using our algorithms. Results on two industrial-strength DSP systems indicate the effectiveness of our strategy in exploring a large design space within a matter of seconds.

5B.2 Simultaneous Functional-Unit Binding and Floorplanning

Yung-ming Fang, D.F. Wong, Univ. of Texas, Austin, TX

As device feature size decreases, interconnection delay becomes the dominating factor of system performance. Thus it is important that accurate physical information is used during high level synthesis. In this paper, we consider the problem of simultaneously performing functional-unit binding and floorplanning. Experimental results indicate that our approach to combine binding and floorplanning is superior to the traditional approach of separating the two tasks.

5B.3 Module Selection and Data Format Conversion for Cost-Optimal DSP Synthesis

Kazuhito Ito, Lori E. Lucke, Keshab K. Parhi, Univ. of Minnesota, Minneapolis, MN

In high level synthesis each node of a synchronous data-flow graph (DFG) is scheduled to a specific time and allocated to a processor. In this paper we present new integer linear programming (ILP) models which generate a blocked schedule for a DFG with implicit retiming, pipelining, and unfolding while performing module selection and data format conversion. A blocked schedule is a schedule which overlaps multiple iterations of the DFG to guarantee a minimum number of processors. Component modules are selected from a library of processors to minimize cost. Furthermore, we include data format converters between processors of different data formats. In addition, we minimize the unfolding factor of the blocked schedule.


SESSION 5C DELAY AND ANALOG TESTING

Moderators: Daniel G. Saab, Univ. of Illinois, Urbana, IL
Sandip Kundu, IBM Corp., Yorktown Heights, NY

5C.1 On Testing Delay Faults in Macro-Based Combinational Circuits

Irith Pomeranz, Sudhakar Reddy, Univ. of Iowa, Iowa City, IA

We consider the problem of testing for delay faults in macro-based circuits. Macro-based circuits are obtained as a result of technology mapping. Gate-level fault models cannot be used for such circuits, since the implementation of a macro may not have an accurate gate-level counterpart, or the macro implementation may not be known. Two delay fault models are proposed for macro-based circuits. The first model is analogous to the gate-level gross delay fault model. The second model is analogous to the gate-level path delay fault model. We provide fault simulation procedures, and present experimental results.

5C.2 RAFT: A Novel Program for Rapid-Fire Test and Diagnosis of Digital Logic for Marginal Delays and Delay Faults

Abhijit Chatterjee, Georgia Inst. of Tech., Atlanta, GA, Jacob A. Abraham, Univ. of Texas, Austin, TX

The problem of delay fault testing and detection of chips with marginal performance has become even more critical than before due to advancing clock speeds. In this paper, a methodology for the detection of marginal digital circuits and the diagnosis of gate delay failures is developed. A new test application methodology is proposed in which test vectors may be applied to digital combinational circuits at intervals smaller than the critical path delay of the circuit, and signal waveform analysis is used to interpret the test results. The resulting tests are called RApid Fire Tests (RAFT) and allow classification of circuits from "good" to "bad" along a continuous scale.

5C.3 A Comprehensive Fault Macromodel for OPAMPS

Chen-Yang Pan, Kwang-Ting Cheng, Univ. of California, Santa Barbara, CA
Sandeep Gupta, Univ. of Southern California, Los Angeles, CA

In this paper, a comprehensive macromodel for transistor level faults in an operational amplifier is developed. With the observation that faulty behavior at the output may result from interfacing errors in addition to the faulty component, parameters associated with input and output characteristics are incorporated. Test generation and fault classification are addressed for stand-alone opamps. A high fault coverage is achieved by the proposed testing strategy. Transistor level short/bridging faults are analyzed and classified into catastrophic faults and parametric faults. Based on the macromodels for parametric faults, fault simulation is performed for an active filter. We found that many parametric faults in the active filter cannot be detected by traditional functional testing. A DFT scheme along with a current testing strategy to improve fault coverage is proposed.


SESSION 5D ROUTING FOR FPGAS

Moderators: Bryan Preas, Xerox Parc, Palo Alto, CA
Dwight D. Hill, Synopsys, Inc., Mountain View, CA

5D.1 A Channel-Driven Global Routing with Consistent Placement

Shigetoshi Nakatake, Yoji Kajitani, JAIST, Ishikawa, Japan

A global router with its consistent placer is proposed which aims to control the wire-densities of channels. The routing order of nets and their routes are decided according to the channel which is predicted to have the maximum wire-density. The placer distributes the nets evenly with respect to the virtual length (half perimeter of the bounding box). Interesting features include the interactive dynamic test to decide the form of the predicting functions and the admissible region to consider the routing resources in the placement stage. Experiments reveal some interesting phenomena: a smaller maximum wire-density is attained in spite of a comparable total wire-density, and a smaller maximum wire-length in spite of a larger total wire-length.

5D.2 A New Global Routing Algorithm for FPGAs

Yao-Wen Chang, S. Thakur, K. Zhu, D.F. Wong, Univ. of Texas, Austin, TX

As in traditional ASIC technologies, FPGA routing usually consists of two steps: global routing and detailed routing. Unlike existing FPGA detailed routers, which can take full advantage of the special structures of the programmable routing resources, FPGA global routing algorithms still greatly resemble their counterparts in the traditional ASIC technologies. In particular, the routing congestion information of a switch block is essentially still measured by the numbers of available rows and columns in the switch block. Since the internal architecture of a switch block decides what can route through the block, the traditional measure of routing capacity is no longer accurate. In this paper, we present an accurate measure of switch block routing capacity. Our new measure considers the exact positions of the switches inside a switch block. Experiments with a global router based on these ideas show an average improvement of 38% in the channel width required to route some benchmark circuits using a popular switch block, compared with an algorithm based on the traditional methods for congestion control.

5D.3 On NP-Completeness of Regular 2-D FPGA Routing Architectures and a Novel Solution

Yu-Liang Wu, Douglas Chang, Univ. of California, Santa Barbara, CA

Several industrial FPGA routing architectures have been shown to have no efficient routing algorithms (unless P=NP) [3,4]. Here, we further investigate if the intractability of the routing problem on a regular 2-D FPGA routing architecture can be alleviated by adding routing switches. We show that on this routing architecture, even with a substantial increase in switching flexibility, a polynomial time, predictable routing algorithm is still not likely to exist, and there is no constant ratio bound of the detailed over global routing channel densities.

We also show that a perfect routing is unachievable on this architecture even with near complete (maximum) switching flexibility. We also discuss a new greedy routing architecture that possesses predictable and other desired routing properties, yet requires fewer routing resources than regular architectures. This theoretical result may suggest an alternative approach in routing architecture designs.


SESSION 6A OPTIMIZATION FOR LOW POWER

Moderators: Michel Berkelaar, IBM Corp., Yorktown Heights, NY
Jonathan Rose, Univ. of Toronto, Toronto, Ontario, Canada

6A.1 A Symbolic Method to Reduce Power Consumption of Circuits Containing False Paths

R. Iris Bahar, Gary D. Hachtel, Enrico Macii, Fabio Somenzi, Univ. of Colorado, Boulder, CO

Power dissipation in technology mapped circuits can be reduced by performing gate re-sizing. Recently we have proposed a symbolic procedure which exploits the compactness of the ADD data structure to accurately calculate the arrival times at each node of a circuit for any primary input vector. In this paper we extend our timing analysis tool to the symbolic calculation of required times and slacks, and we use this information to identify gates of the circuit that can be re-sized. The nice feature of our approach is that it takes into account the presence of false paths naturally. As shown by the experimental results, circuits re-synthesized with the technique we present in this paper are guaranteed to be at least as fast as the original implementations, but smaller and substantially less power-consuming.

6A.2 Multi-Level Network Optimization Targeting Low Power

Sasan Iman, Massoud Pedram , Univ. of Southern California, Los Angeles, CA

This paper describes a procedure for minimizing the power consumption in a Boolean network under the zero delay model. Power is minimized by modifying the function of each intermediate node in the network such that the power consumption of the node is decreased without increasing the power consumption of the other nodes in the network. A formal analysis of how changes in the switching activity of an intermediate node affect the switching activity of other nodes in the network is given first. Using this analysis, a procedure for calculating the set of compatible power don't cares for each node in the network is presented. Finally, it is shown how these don't cares are used to optimize the network for low power. These techniques have been implemented and results show an average of 10% improvement in total power consumption of the network compared to the results generated by the conventional network optimization techniques.
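
As an illustration of the zero-delay switching-activity computation that such power optimizations build on (a hedged sketch, not taken from the paper; the tiny network, node names and probabilities below are hypothetical):

    # Hedged sketch: estimating zero-delay switching activity in a small Boolean
    # network, assuming spatially independent signals. Under the zero-delay model
    # a node with static-1 probability p toggles with probability 2*p*(1-p).
    # The example network and probabilities are illustrative, not from the paper.

    def and_prob(pa, pb):  # P(a AND b) under independence
        return pa * pb

    def or_prob(pa, pb):   # P(a OR b) under independence
        return pa + pb - pa * pb

    def not_prob(pa):      # P(NOT a)
        return 1.0 - pa

    def switching_activity(p_one):
        """Expected toggle rate of a node under the zero-delay model."""
        return 2.0 * p_one * (1.0 - p_one)

    # Tiny example network: f = (a AND b) OR (NOT c), inputs equiprobable.
    p = {'a': 0.5, 'b': 0.5, 'c': 0.5}
    p['n1'] = and_prob(p['a'], p['b'])      # internal node n1 = a AND b
    p['n2'] = not_prob(p['c'])              # internal node n2 = NOT c
    p['f']  = or_prob(p['n1'], p['n2'])     # output f = n1 OR n2

    for node in ('n1', 'n2', 'f'):
        print(node, round(p[node], 3), 'activity =', round(switching_activity(p[node]), 3))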

6A.3 LP Based Cell Selection with Constraints of Timing, Area and Power Consumption

Yutaka Tamiya, Yusuke Matsunaga, Fujitsu Labs. Ltd., Kawasaki, Japan
Masahiro Fujita, Fujitsu Laboratories of America, San Jose, CA

This paper presents a new LP based optimal cell selection method. Optimal cell selection is a useful tool for the final tuning of LSI designs. It replaces the drivabilities of cells, adjusting them to timing, area, and power constraints. Using the latest and earliest arrival times, it can handle both setup and hold time constraints. We also construct an efficient initial basis, which speeds up a simplex LP solver by a factor of 5 without any relaxations or approximations. Experimental results show that the method reduces the clock cycle of a manually designed 13k-transistor chip by 17% without any increase in area.


SESSION 6B EMBEDDED SOFTWARE

Moderators: Gaetano Boriello, Univ. of Washington, Seattle, WA
Wayne Wolf, Princeton Univ., Princeton, NJ

6B.1 Power Analysis of Embedded Software: A First Step Towards Software Power Minimization

Vivek Tiwari, Sharad Malik, Andrew Wolfe, Princeton Univ., Princeton, NJ

Embedded computer systems are characterized by the presence of a dedicated processor and the software that runs on it. Power constraints are increasingly becoming the critical component of the design specification of these systems. At present, however, power analysis tools can only be applied at the lower levels of the design - the circuit or gate level. It is either impractical or impossible to use the lower level tools to estimate the power cost of the software component of the system. This paper describes the first systematic attempt to model this power cost. A power analysis technique is developed that has been applied to two commercial microprocessors - Intel 486DX2 and Fujitsu SPARClite 934. This technique can be employed to evaluate the power cost of embedded software and also be used to search the design space in software power optimization.
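
A minimal sketch of the kind of instruction-level energy model such an analysis produces, assuming a base cost per instruction plus an inter-instruction overhead for each adjacent pair; the opcodes and cost numbers below are hypothetical, not measurements from the paper:

    # Hedged sketch of an instruction-level power/energy estimate: base cost per
    # instruction plus an inter-instruction (circuit-state) overhead for each
    # adjacent pair. All opcodes and cost numbers below are hypothetical.

    BASE_COST = {'MOV': 1.0, 'ADD': 1.2, 'MUL': 2.5, 'LD': 1.8}      # e.g. mA per cycle
    PAIR_OVERHEAD = {('MOV', 'ADD'): 0.1, ('ADD', 'MUL'): 0.3,
                     ('MUL', 'LD'): 0.2}                             # switching overhead per pair

    def program_cost(trace):
        """Sum base costs plus pairwise overheads over an instruction trace."""
        cost = sum(BASE_COST[op] for op in trace)
        for prev, cur in zip(trace, trace[1:]):
            cost += PAIR_OVERHEAD.get((prev, cur), PAIR_OVERHEAD.get((cur, prev), 0.0))
        return cost

    print(program_cost(['MOV', 'ADD', 'MUL', 'LD']))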

6B.2 Generating Instruction Sets and Microarchitectures from Applications

Ing-Jer Huang, Alvin Despain, Univ. of Southern California, Los Angeles, CA

The design of an application-specific instruction set processor (ASIP) system includes at least three interdependent tasks: microarchitecture design, instruction set design, and instruction set mapping for the application. We present a method that unifies these three design problems with a single formulation: a modified scheduling/allocation problem with an integrated instruction formation process. Micro-operations (MOPs) representing the application are scheduled into time steps. Instructions are formed and hardware resources are allocated during the scheduling process. The assembly code for the given application is obtained automatically at the end of the scheduling process. This approach considers MOP parallelism, instruction field encoding, delayed load/store/branch, conditional execution of MOPs and the retargetability to various architecture templates. Experiments are presented to show the power and limitations of our approach. Performance improvement over our previous work [4] is significant.

6B.3 Register Assignment Through Resource Classification for ASIP Microcode Generation

Clifford Liem, Trevor May, Pierre Paulin, Bell-Northern Research Inc., Ottawa, Ontario, Canada

Application Specific Instruction-Set Processors (ASIPs) offer designers the ability for high-speed data and control processing with the added flexibility needed for late design specifications, accommodation of design errors, and product evolution. However, code generation for ASIPs is a complex problem and new techniques are needed for its success. The register assignment task can be a critical phase, since often in ASIPs the number and functionality of available registers are limited, as the designer has opted for simplicity, speed, and low area. Intelligent use of register files is critical to the program execution time, program memory usage and data memory usage. This paper describes a methodology utilizing register classes as a basis for assignment for a particular style of ASIP architectures. The approach gives preference to special purpose registers, which are the scarce resources. This naturally leads to the objectives of high speed and low program memory usage. The approach has been implemented in a system called CodeSyn [1] and used on custom ASIP architectures.


SESSION 6C PADE BASED CIRCUIT AND INTERCONNECT ANALYSIS

Moderators: Albert Ruehli, IBM Corp., Yorktown Heights, NY
Sung Mo Kang, Univ. of Illinois, Urbana, IL

6C.1 Efficient Small-Signal Circuit Analysis and Sensitivity Computations with the PVL Algorithm

Roland W. Freund, Peter Feldmann, AT&T Bell Labs., Murray Hill, NJ

We describe the application of the PVL algorithm to the small-signal analysis of circuits, including sensitivity computations. The PVL algorithm is based on the efficient computation of the Pade approximation of the network transfer function via the Lanczos process. The numerical stability of the algorithm permits the accurate computation of the Pade approximation over any given frequency range. We extend the algorithm to compute sensitivities of network transfer functions, their poles, and zeros, with respect to arbitrary circuit parameters, with minimal additional computational cost, and we present numerical examples.
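
A hedged sketch of the moment-matching idea behind such Pade-based analysis: moments of H(s) = l^T (G + sC)^{-1} b are computed explicitly and a small Pade approximant is fitted. PVL itself obtains the approximant implicitly and stably through the Lanczos process; the 2-node RC ladder and element values below are hypothetical:

    # Hedged sketch: explicit moment matching for a small-signal transfer function
    # H(s) = l^T (G + sC)^{-1} b. PVL builds the Pade approximant implicitly via
    # the Lanczos process; this sketch computes moments directly and solves a tiny
    # Pade system, just to show what is being approximated. Circuit values are
    # hypothetical.

    import numpy as np

    R1, R2, C1, C2 = 1e3, 2e3, 1e-9, 2e-9
    G = np.array([[1/R1 + 1/R2, -1/R2], [-1/R2, 1/R2]])
    C = np.diag([C1, C2])
    b = np.array([1.0, 0.0])          # unit current injected at node 1
    l = np.array([0.0, 1.0])          # observe the voltage at node 2

    r = np.linalg.solve(G, b)
    A = -np.linalg.solve(G, C)        # A = -G^{-1} C
    m = [l @ np.linalg.matrix_power(A, k) @ r for k in range(4)]   # moments m0..m3

    # Pade [1/2]: H(s) ~ (a0 + a1*s) / (1 + b1*s + b2*s^2), matching m0..m3
    den = np.linalg.solve(np.array([[m[1], m[0]], [m[2], m[1]]]),
                          np.array([-m[2], -m[3]]))
    b1, b2 = den
    a0, a1 = m[0], m[1] + b1 * m[0]

    s = 2j * np.pi * 1e5              # compare exact and Pade responses at 100 kHz
    exact = l @ np.linalg.solve(G + s * C, b)
    pade = (a0 + a1 * s) / (1 + b1 * s + b2 * s**2)
    print(abs(exact), abs(pade))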

6C.2 Capturing Time-Of-Flight Delay for Transient Analysis Based on Scattering Parameter Macromodel

Haifang Liao, Wayne W.M. Dai, Univ. of California, Santa Cruz, CA

The delay associated with transmission line networks consists of the exponentially charging time and a pure propagation delay. This propagation delay, the so-called time-of-flight delay, is particularly evident in long lines. When the time-of-flight is comparable to the input rise-time, it is difficult to capture the time-of-flight with a finite sum of exponentials. Therefore the time-of-flight must be captured explicitly from the transfer function of the circuit. In this paper, we give a precise definition of the time-of-flight together with some basic properties, and present an efficient method to capture the time-of-flight for general interconnect networks. Based on our scattering parameter macromodel, we can easily capture the time-of-flight during the network reduction while using a lower-order model to evaluate the charging delay. By capturing the time-of-flight delay, the accuracy of system responses can be greatly improved without significantly increasing computing time.

6C.3 RC Interconnect Synthesis - A Moment Fitting Approach

Noel Menezes, Satyamurthy Pullela, Florentin Dartu, Lawrence T. Pillage, Univ. of Texas, Austin, TX

Presently, delays due to the physical interconnect between logic gates account for large portions of the overall path delays. For this reason, synthesis of the logic gate fanout structure is of paramount importance during performance optimization. This paper presents a methodology for on-chip RC interconnect synthesis. Moment sensitivities are used to vary the wire widths of the branches in an RC interconnect tree to achieve performance targets. In this paper, signal slopes and delays at critical fanout nodes are the targets, and the impact on total metal area is considered. A procedure for computing the exact moment sensitivities in an RC tree is described.
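
As a hedged illustration of the moment computations such a flow relies on, the sketch below evaluates the first moment (Elmore delay) of an RC tree; the tree topology and element values are hypothetical, and the paper's moment-sensitivity machinery is not reproduced:

    # Hedged sketch: first moment (Elmore delay) of an RC tree. The delay to a
    # node accumulates, along the path from the driver, each branch resistance
    # times the total capacitance downstream of that branch. Wire-width sizing
    # changes the branch R and C values; sensitivities of such moments drive
    # interconnect synthesis. The example tree and element values are hypothetical.

    class Node:
        def __init__(self, name, r_up, cap, children=()):
            self.name, self.r_up, self.cap, self.children = name, r_up, cap, list(children)

    def subtree_cap(n):
        """Total capacitance in the subtree rooted at n."""
        return n.cap + sum(subtree_cap(c) for c in n.children)

    def elmore(n, delay_to_parent=0.0, out=None):
        """Elmore delay from the driver to every node of the tree."""
        if out is None:
            out = {}
        d = delay_to_parent + n.r_up * subtree_cap(n)   # drop across the branch feeding n
        out[n.name] = d
        for c in n.children:
            elmore(c, d, out)
        return out

    # Driver -- n1 --+-- n2 (sink)
    #                +-- n3 (sink)
    tree = Node('n1', r_up=10.0, cap=0.1,
                children=[Node('n2', r_up=20.0, cap=0.2), Node('n3', r_up=15.0, cap=0.3)])
    print(elmore(tree))   # delays in consistent R*C units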


SESSION 6D FLOORPLANNING

Moderators: Takashi Kambe, SHARP Co., Nara, Japan
Ralph Otten, Delft Univ. of Tech., Delft, The Netherlands

6D.1 Adaptive Cut Line Selection in Min-Cut Placement for Large Scale Sea-of-Gates Arrays

Kazuhiro Takahashi, Mitsubishi Electric Corp., Itami Hyogo, Japan, Kazuo Nakajima, Univ. of Maryland, College Park, MD
Masayuki Terai, Koji Sato, Mitsubishi Electric Corp., Itami Hyogo, Japan

We present a new min-cut based placement algorithm for large scale sea-of-gates arrays. In the past all such algorithms used a fixed cut line sequence that is determined before min-cut partitioning is performed. In our approach, we adaptively select a next partitioning pattern based on the current parameter value; we then perform the corresponding min-cut partitionings and measure a new parameter value. We repeat this process until all cut lines are processed. As a parameter, we introduce a new global objective function based on wire congestions on cut lines. We establish a close relation between this function and cut line sequences. This relation is used to develop an innovative method of adaptively determining a cut line sequence so as to minimize this global function. With this adaptive selection of cut lines along with a new cluster-based min-cut partitioning technique, our algorithm can produce, in a short time and at a low cost, final placement results that achieve the 100% completion of wiring on chips of fixed sizes. This has led to its successful production use, having generated more than 400 CMOS sea-of-gates array chips.

6D.2 Folding a Stack of Equal Width Components

Venkat Thanvantri, Sartaj Sahni, Univ. of Florida, Gainesville, FL

We consider two versions of the problem of folding a stack of equal width components. In both versions, when a stack is folded, a routing penalty is incurred at the fold. In one version, the height of the folded layout is given and we are to minimize width. In the other, the width of the folded layout is given and its height is to be minimized.

6D.3 Area Minimization for Hierarchical Floorplans

Peichen Pan, Univ. of Illinois, Urbana, IL, Weiping Shi, Univ. of North Texas, Denton, TX, C.L. Liu, Univ. of Illinois, Urbana, IL

Two results are presented in this paper. First we settle the open problem on the complexity of the area minimization problem for hierarchical floorplans by showing it to be NP-complete. We then present a pseudo-polynomial area minimization algorithm for hierarchical floorplans of order-5. The algorithm is based on a new algorithm for determining the set of nonredundant realizations of a wheel. The new algorithm for wheels has time cost O(k^2 log k) and space cost O(k^2) if each of the (five) blocks in a wheel has at most k realizations - a reduction by a factor of k in both costs in comparison with previous algorithms. The area minimization algorithm was implemented. Our experimental results show that the algorithm is indeed very fast.
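
A hedged sketch of the underlying realization-combination idea (for a simple horizontal slice of two blocks, not the paper's order-5 wheel algorithm); the block shapes below are hypothetical:

    # Hedged sketch: combining realization (width, height) lists of two blocks in
    # a horizontal slice and pruning dominated realizations -- the "nonredundant
    # realization" idea that the paper refines for wheels. Not the paper's
    # O(k^2 log k) wheel algorithm; block shapes are hypothetical.

    def prune(realizations):
        """Keep only nonredundant (w, h): no other realization is <= in both dims."""
        reals = sorted(set(realizations))                 # ascending width
        kept, best_h = [], float('inf')
        for w, h in reals:
            if h < best_h:                                # strictly better height than all narrower ones
                kept.append((w, h))
                best_h = h
        return kept

    def combine_horizontal(a, b):
        """Blocks side by side: widths add, heights take the max."""
        return prune([(wa + wb, max(ha, hb)) for wa, ha in a for wb, hb in b])

    A = [(2, 6), (3, 4), (6, 2)]
    B = [(1, 5), (2, 3), (5, 1)]
    for w, h in combine_horizontal(A, B):
        print(w, h, 'area =', w * h)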


SESSION 7A FORMAL VERIFICATION

Moderators: David Dill, Stanford Univ., Stanford, CA
Olivier Coudert, Digital Equipment Corp., Rueil-Malmaison, France

7A.1 Multi-Level Synthesis for Safe Replaceability

Carl Pixley, Motorola, Inc., Austin, TX, Vigyan Singhai, Adnan Aziz, Robert K. Brayton, Univ. of California, Berkeley, CA

We describe the condition that a sequential digital design is a safe replacement for an existing design without making any assumptions about a known initial state of the design or about its environment. We formulate a safe replacement condition which guarantees that if an original design is replaced by a new design, the interacting environment cannot detect the change by observing the input-output behavior of the new design; conversely, if a replacement design does not satisfy our condition an environment can potentially detect the replacement (in this sense the replacement is potentially unsafe). Our condition allows simplification of the state transition diagram of an original design. We use the safe replacement condition to derive a sequential resynthesis method for area reduction of gate-level designs. We have implemented our resynthesis algorithm and we report experimental results.

7A.2 Iterative Algorithms for Formal Verification of Embedded Real-Time Systems

Felice Balarin, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA

Most embedded real-time systems consist of many concurrent components operating at significantly different speeds. Thus, an algorithm for formal verification of such systems must efficiently deal with a large number of states and large ratios of timing constants. We present such an algorithm based on timed automata, a model where a finite state system is augmented with time measuring devices called timers. We also present a semi-decision procedure for an extended model where timers can be decremented. This extension allows describing behaviors that are not expressible by timed automata, for example interrupts in a real-time operating system.

7A.3 Incremental Formal Design Verification

Gitanjali M. Swamy, Robert K. Brayton, Univ. of California, Berkeley, CA

Language containment is a method for design verification that involves checking if the behavior of the system to be verified is a subset of the behavior of the specifications (properties or requirements), which it has to meet. If this check fails, language containment returns a subset of `fair' states involved in behavior that the system exhibits but the specification does not. Current techniques for language containment do not take advantage of the fact that the process of design is incremental; namely that the designer repeatedly modifies and re-verifies his/her design. This results in unnecessary and cumbersome computation. We present a method, which successively modifies the latest result of verification each time the design is modified. Our incremental algorithm translates changes made by the designer to an addition or subtraction of edges, states or constraints (on acceptable behavior) from the transition behavior or specification of the problem. Next, these changes are used to update the set of `fair' states previously computed. This incremental algorithm is superior to the current techniques for language containment; a conclusion supported by the experimental results presented in this paper.


SESSION 7B TIMING OPTIMIZATION BY GATE SIZING

Moderators: Patrick McGeer, Cadence Berkeley Labs., Berkeley, CA
Hidetoshi Onodera, Kyoto Univ., Kyoto, Japan

7B.1 Optimization of Critical Paths in Circuits with Level-Sensitive Latches

Timothy M. Burks, IBM Corp., Austin, TX, Karem Sakallah, Univ. of Michigan, Ann Arbor, MI

A simple extension of the critical path method is presented which allows more accurate optimization of circuits with level-sensitive latches. The extended formulation provides a sufficient set of constraints to ensure that, when all slacks are non-negative, the corresponding circuit will be free of late signal timing problems. Cycle stealing is directly permitted by the formulation. However, moderate restrictions may be necessary to ensure that the timing constraint graph is acyclic. Forcing the constraint graph to be acyclic allows a broad range of existing optimization algorithms to be easily extended to better optimize circuits with level-sensitive latches. We describe the extension of two such algorithms, both of which attempt to solve the problem of selecting parts from a library to minimize area subject to a cycle time constraint.
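
For reference, a minimal sketch of the plain critical path method that the paper extends: arrival times, required times and slacks on a purely combinational timing graph. The latch constraints and cycle stealing of the paper are not modeled, and the graph and delays below are hypothetical:

    # Hedged sketch of the basic critical path method: forward pass for latest
    # arrival times, backward pass for earliest required times, slack = required
    # minus arrival. The timing graph, delays and clock period are hypothetical.

    from collections import defaultdict

    edges = {                      # gate -> list of (fanout gate, pin-to-pin delay)
        'in': [('g1', 2.0), ('g2', 3.0)],
        'g1': [('g3', 2.0)],
        'g2': [('g3', 1.0)],
        'g3': [('out', 1.0)],
        'out': [],
    }
    order = ['in', 'g1', 'g2', 'g3', 'out']       # a topological order of the graph
    clock = 8.0

    arrival = defaultdict(float)
    for u in order:                                # forward pass: latest arrival
        for v, d in edges[u]:
            arrival[v] = max(arrival[v], arrival[u] + d)

    required = {u: clock for u in order}
    for u in reversed(order):                      # backward pass: earliest required
        for v, d in edges[u]:
            required[u] = min(required[u], required[v] - d)

    for u in order:
        print(u, 'arrival', arrival[u], 'required', required[u], 'slack', required[u] - arrival[u])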

7B.2 Computing the Entire Active Area/Power Consumption Versus Delay Trade-Off Curve for Gate Sizing with a Piecewise Linear Simulator

Michel Berkelaar, IBM Corp., Yorktown Heights, NY, Pim H.W. Buurman, Jochen A.G. Jess, Eindhoven Univ. of Tech., Eindhoven, The Netherlands

The gate sizing problem is the problem of finding load drive capabilities for all gates in a given Boolean network such that a given delay limit is kept, and the necessary cost in terms of active area usage and/or power consumption is minimal. This paper describes a way to obtain the entire cost versus delay trade-off curve of a combinational logic circuit in an efficient way. Every point on the resulting curve is the global optimum of the corresponding gate sizing problem. The problem is solved by mapping it onto piecewise linear models in such a way that a piecewise linear (circuit) simulator can do the job. It is shown that this setup is very efficient, and can produce trade-off curves for large circuits (thousands of gates) in a few minutes. Benchmark results for the entire set of MCNC '91 two-level examples are given.
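
A hedged, brute-force sketch of what such a cost-versus-delay trade-off curve is, for a toy chain of gates with discrete drive strengths; the paper's piecewise linear formulation and simulator are not reproduced, and all values below are hypothetical:

    # Hedged sketch: brute-force enumeration of the area-versus-delay trade-off
    # curve for a toy chain of gates, each with a few discrete drive strengths.
    # The paper obtains every global optimum with a piecewise-linear simulator;
    # this only illustrates what the curve looks like. Values are hypothetical.

    from itertools import product

    # (area, intrinsic delay, drive resistance) for each available size of a gate
    sizes = [(1.0, 0.5, 4.0), (2.0, 0.4, 2.0), (4.0, 0.35, 1.0)]
    loads = [0.2, 0.3, 0.5]          # load capacitance seen by each of 3 chained gates

    points = []
    for choice in product(range(len(sizes)), repeat=len(loads)):
        area = sum(sizes[s][0] for s in choice)
        delay = sum(sizes[s][1] + sizes[s][2] * c for s, c in zip(choice, loads))
        points.append((delay, area))

    points.sort()
    curve, best_area = [], float('inf')
    for delay, area in points:       # keep only the Pareto-optimal (delay, area) points
        if area < best_area:
            curve.append((delay, area))
            best_area = area
    print(curve)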

7B.3 Dynamical Identification of Critical Paths for Iterative Gate Sizing

How-Rern Lin, Ting-Ting Hwang, Tsing Hua Univ., Hsin-chu, Taiwan, ROC

Since only sensitizable paths contribute to the delay of a circuit, false paths must be excluded when optimizing the delay of the circuit. Identifying false paths only once is not sufficient, since during the iterative optimization process false paths may become sensitizable, and sensitizable paths may become false. In this paper, we examine the cases in which a false path becomes sensitizable and a sensitizable path becomes false. Based on these conditions, we adopt a so-called loose sensitization criterion [ChD91] to develop an algorithm for the dynamic identification of sensitizable paths. By combining gate sizing and dynamic identification of sensitizable paths, an efficient performance optimization tool is developed. Results on a set of circuits from the ISCAS benchmark set demonstrate that our tool is indeed very effective in reducing circuit delay with fewer gates sized compared with other methods.


SESSION 7C ANALOG and MIXED-SIGNAL DFT

Moderators: Mani Soma, Univ. of Washington, Seattle, WA
Abhijit Chatterjee, Georgia Inst. of Tech., Atlanta, GA

7C.1 Built-In Self-Test and Fault Diagnosis of Fully Differential Analogue Circuits

Salvador Mir, Vladimir Kolarik, Marcelo Lubaszewski, Christian Nielsen, Bernard Courtois, TIMA/INPG, Grenoble, France

An approach to the test and diagnosis of fully differential analogue circuits is described in this paper. The test approach is based on off-line monitoring of the inputs of the operational amplifiers in the circuit via an analogue BIST observer. The analogue BIST can detect both hard and soft faults. Diagnosis resolution is improved by also monitoring the outputs of the operational amplifiers. Faulty components can then be located and the actual defective value of a faulty passive component determined.

7C.2 A New Built-In-Self-Test Approach for Digital-To-Analog and Analog-To-Digital Converters

Karim Arabi, Bozena Kaminska, Janusz Rseszut, École Polytechnique de Montréal, Montréal, Quebec, Canada

This paper proposes a test approach and circuitry suitable for built-in self-test (BIST) of digital-to-analog (D/A) and analog-to-digital (A/D) converters. Offset, gain, linearity and differential linearity errors are tested without using test equipment. The proposed BIST structure decreases the test cost and test time. The BIST circuitry has been designed for D/A and A/D converters using a 1.2 µm CMOS technology. With only a minor modification, the test structure is able to localize failures. The small area overhead (AOH) and the simplicity and efficiency of the proposed BIST architecture make it promising for manufacturing.

7C.3 Fault Detection and Input Stimulus Determination for the Testing of Analog Integrated Circuits Based on Power-Supply Current Monitoring

Georges Gielen, Zhihua Wang, W. Sansen, Katholieke Univ. Leuven, Heverlee, Belgium

A new method for the testing and fault detection of analog integrated circuits is presented. Time-domain testing followed by spectral analysis of the power-supply current is used to detect both DC and AC faults. Spectral analysis is applied since the tolerances on the circuit parameters make a direct comparison of waveforms impossible. For the fault detection a probabilistic decision rule is proposed based on a multivariate statistical analysis. Since no extra testing pin is needed and the on-line calculation effort is small, the method can be used for wafer-probe testing as well as final production testing. In addition, a methodology for the selection of the input stimulus is presented that improves the testability. Examples demonstrate the efficiency and the effectiveness of the algorithms.


SESSION 7D MODELING THE DESIGN PROCESS

Moderators: Jay Brockman, Univ. of Notre Dame, Notre Dame, IN
Michaela Guiney, Hewlett Packard, Palo Alto, CA

7D.1 An Enhanced Flow Model for Constraint Handling in Hierarchical Multi-View Design Environments

Pieter van der Wolf, Olav ten Bosch, Alfred van der Hoeven, Delft Univ. of Tech., Delft, The Netherlands

In this paper we present an enhanced design flow model that increases the capabilities of a CAD framework to support design activities on hierarchical multi-view design descriptions. This flow model offers new constructs for the configuration of complex design constraints in terms of conditions on the hierarchical multi-view structure of a design. The design flow management system enforces these constraints and uses them to inform the designer more effectively about the validity of verification results and the executability of tools. This helps to make the design process less error prone and to improve productivity. Our solution is original in that we introduce the notions of design hierarchy and equivalence in a design flow model. We thereby bridge a gap between the areas of data management and design flow management. Strong points of our solution are its simplicity and the seamless integration with existing flow management concepts.

7D.2 On Modeling Top-Down VLSI Design

Bernd Schürmann, Joachim Altmeyer, Martin Schütze, Univ. of Kaiserslautern, Kaiserslautern, Germany

We present an improved data model that reflects the whole VLSI design process including bottom-up and top-down design phases. The kernel of the model is a static version concept that describes the convergence of a design. The design history, which constitutes the semantics of most other version concepts, is modeled explicitly by additional object classes (entity types) rather than by the version graph itself. Top-down steps are modeled by splitting a design object into requirements and realizations. The composition hierarchy is expressed by a simple but powerful configuration method. Design data of iterative refinement processes are managed efficiently by storing incremental data only.

7D.3 A Formal Basis for Design Process Planning and Management

Margarida F. Jacome, Univ. of Texas, Austin, TX, Stephen W. Director, Carnegie Mellon Univ., Pittsburgh, PA

In this paper we present a design formalism that allows for a complete and general characterization of design disciplines and for a unified representation of arbitrarily complex design processes. This formalism has been used as the basis for the development of several prototype CAD meta-tools that offer effective design process planning and management services.


SESSION 8A EMBEDDED TUTORIAL

Moderator: Hugo De Man, IMEC, Leuven, Belgium

8A.1 Design of Heterogeneous IC's for Mobile and Personal Communication Systems

Gert Goossens, Ivo Bolsens, Bill Lin, Francky Catthoor, IMEC, Leuven, Belgium

Mobile and personal communication systems form key market areas for the electronics industry of the nineties. Stringent requirements in terms of flexibility, performance and power dissipation, are driving the development of integrated circuits into the direction of heterogeneous single-chip solutions. New IC architectures are emerging which contain the core of a powerful programmable processor, complemented with dedicated hardware, memory and interface structures. In this tutorial we will discuss the real-life design of a heterogeneous IC for an industrial telecom application: a reconfigurable mobile terminal for satellite communication. Based on this practical design experience, we will subsequently discuss a methodology for the design of heterogeneous ICs. Design steps that will be addressed include: system specification and refinement, data path and communication synthesis, and code generation for embedded processor cores.


SESSION 8B EMBEDDED TUTORIAL

Moderators: Massoud Pedram, Univ. of Southern California, Los Angeles, CA

8B.1 Embedded Systems Design for Low Energy Consumption

Michael Schuette, John R. Barr, Motorola, Inc., Rolling Meadows, IL

This tutorial covers the circuit fundamentals of CMOS circuits which contribute to the consumption of energy in portable products, as well as guidelines for the design of systems in order to reduce energy consumption and prolong battery life. Circuit fundamentals will include a definition of terms, basic circuit elements, laws of operation, and basic circuit theory applied to energy consumption. We will then present three major principles of energy reduction: reducing the number of transitions, reducing the amount of switched capacitance, and reducing the operating voltage. Finally, we present several guidelines, based on these three principles, that can be applied during the system design process.


SESSION 9A SYNTHESIS OF ASYNCHRONOUS CIRCUITS

Moderators: Luciano Lavagno, Politecnico di Torino, Torino, Italy
David Dill, Stanford Univ., Stanford, CA

9A.1 Synthesis of Hazard-Free Multi-Level Logic Implementations Under Multiple-Input Changes From Binary Decision Diagrams

Bill Lin, IMEC Lab., Leuven, Belgium, Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA

We describe a new method for directly synthesizing a hazard-free multilevel logic implementation from a given logic specification. The method is based on free/ordered Binary Decision Diagrams (BDD's), and is naturally applicable to multiple-output logic functions. Given an incompletely-specified (multiple-output) Boolean function, the method produces a multilevel logic network that is hazard-free for a specified set of multiple-input changes. We assume an arbitrary (unbounded) gate and wire delay model under a pure delay (PD) assumption, we permit multiple-input changes, and we consider both static and dynamic hazards. This problem is generally regarded as a difficult problem and it has important applications in the field of asynchronous design. The method has been automated and applied to a number of examples. The results we have obtained are very promising.

9A.2 Performance-Driven Synthesis of Asynchronous Controllers

Kenneth Y. Yun, Univ. of California at San Diego, La Jolla, CA, David Dill, Stanford Univ., Stanford, CA
Bill Lin, IMEC Lab., Leuven, Belgium, Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA

We examine the implications of a new hazard-free combinational logic synthesis method [8], which generates multiplexor trees from binary decision diagrams (BDDs) - representations of logic functions factored recursively with respect to input variables - on extended burst-mode asynchronous synthesis. First, the use of the BDD-based synthesis reduces the constraints on state minimization and assignment, which reduces the number of additional state variables required in many cases. Second, in cases where conditional signals are sampled, it eliminates the need for state variable changes preceding output changes, which reduces overall input to output latency. Third, selection variables can easily be ordered to minimize the latency on a user-specified path, which is important for optimizing the performance of systems that use asynchronous components. We present extensive evaluations showing that, with only minimal optimization, the BDD-based synthesis gives comparable results in area with our previous exact two-level synthesis method. We also give a detailed example of the specified path optimization.

9A.3 Decomposition Methods for Library Binding of Speed-Independent Asynchronous Designs

Polly Siegel, Giovanni De Micheli, Stanford Univ., Stanford, CA

We describe methods for decomposing gates within a speed-independent asynchronous design. The decomposition step is an essential part of the library binding process, and is used both to increase the granularity of the design for higher quality mapping and to ensure that the design can be implemented. We present algorithms for simple hazard-free gate decomposition, and show results which indicate that we can decompose most of the gates in our benchmark set by this simple method. We then extend these algorithms to work for those cases in which no simple decomposition exists.


SESSION 9B DIAGNOSIS AND VERIFICATION

Moderators: Theodor Vierhaus, GMD, Sankt Augustin, Germany
Kwang-Ting Cheng, Univ. of California, Santa Barbara, CA

9B.1 On Error Correction in Macro-Based Circuits

Irith Pomeranz, Sudhakar Reddy, Univ. of Iowa, Iowa City, IA

We consider the problem of correcting errors in a macro-based circuit. Our formulation of the problem allows the correction of errors that arise both in the context of design error correction, before the circuit is realized, and in the context where a physical circuit needs to be corrected. Two error classes are defined, namely, component errors and line errors. Both single and multiple errors are considered. Accurate correction procedures are given for single errors. Heuristics are given for correcting multiple errors. Experimental results are given to demonstrate the correction procedures presented.

9B.2 Fault Dictionary Compaction by the Elimination of Output Sequences

Vamsi Boppana, W. Kent Fuchs, Univ. of Illinois, Urbana, IL

Fault dictionary compaction has been accomplished in the past by removing responses on individual output pins for specific test vectors. In contrast to the previous work, we present techniques for eliminating entire sequences of outputs and for efficiently storing the remaining output sequences. Experimental results on the ISCAS 85 and ISCAS 89 benchmark circuits show that the sizes of dictionaries proposed are substantially smaller than the full fault dictionary, while the dictionaries retain most or all of the diagnostic capability of the full fault dictionary.
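
A hedged sketch of the basic idea of eliminating entire output sequences while preserving the pairwise fault distinguishability of the full dictionary; the faults, outputs and responses below are hypothetical, and the paper's selection and storage scheme is considerably richer:

    # Hedged sketch: greedy elimination of whole output sequences from a fault
    # dictionary while preserving the pairwise distinguishability provided by the
    # full dictionary. Faults, output pins and response strings are hypothetical.

    responses = {                     # fault -> {output pin: response sequence}
        'f1': {'o1': '0101', 'o2': '0011', 'o3': '1111'},
        'f2': {'o1': '0101', 'o2': '0111', 'o3': '1111'},
        'f3': {'o1': '1101', 'o2': '0011', 'o3': '1110'},
    }

    def distinguishable_pairs(keep):
        """Fault pairs that differ on at least one kept output sequence."""
        pairs, faults = set(), sorted(responses)
        for i, a in enumerate(faults):
            for b in faults[i + 1:]:
                if any(responses[a][o] != responses[b][o] for o in keep):
                    pairs.add((a, b))
        return pairs

    outputs = sorted(next(iter(responses.values())))
    full = distinguishable_pairs(outputs)
    keep = list(outputs)
    for o in outputs:                 # try dropping each output sequence in turn
        trial = [x for x in keep if x != o]
        if distinguishable_pairs(trial) == full:
            keep = trial
    print('kept output sequences:', keep)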

9B.3 Automatic Test Program Generation for Pipelined Processors

Hiroaki Iwashita, Satoshi Kowatari, Tsuneo Nakata, Fumiyasu Hirose, Fujitsu Labs. Ltd., Kawasaki, Japan

Simulation-based verification has both advantages and disadvantages compared with formal verification. Our goal is to find a practical way to verify actual microprocessors. This paper presents an efficient test program generation method for simulation-based verification using techniques developed for formal verification. Our test program generator enumerates all reachable states of a processor pipeline and generates instruction sequences for every reachable test case. The program covers complicated test cases that are difficult to cover with random instructions and impossible to cover with conventional test program generation methods. Our test program generator also works for larger microprocessor designs than formal verifiers can handle.
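
A hedged sketch of breadth-first enumeration of reachable pipeline states on a toy two-stage pipeline model, recording a shortest instruction sequence into each state; real pipeline models (stalls, bypasses, exceptions) are far richer, and the instruction classes below are hypothetical:

    # Hedged sketch: BFS over the reachable states of a toy two-stage pipeline,
    # recording a shortest instruction sequence that drives the pipeline into
    # each state. Instruction classes and the transition rule are hypothetical.

    from collections import deque

    CLASSES = ('ALU', 'LOAD', 'BRANCH')

    def step(state, instr):
        """Issue one instruction: it enters stage 1, stage 1 advances to stage 2."""
        stage1, stage2 = state
        return (instr, stage1)

    start = ('NOP', 'NOP')
    reach = {start: []}                     # state -> shortest instruction sequence
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for instr in CLASSES:
            nxt = step(state, instr)
            if nxt not in reach:
                reach[nxt] = reach[state] + [instr]
                queue.append(nxt)

    for state, seq in sorted(reach.items()):
        print(state, '<-', ' '.join(seq) if seq else '(reset)')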


SESSION 9C ANALOG CIRCUIT SYNTHESIS AND VERIFICATION

Moderators: Peter Feldmann, AT&T Bell Labs., Murray Hill, NJ
Kurt J. Antreich, Technical Univ. of Munich, Munich, Germany

9C.1 Synthesis of Manufacturable Analog Circuits

Tamal Mukherjee, L. Richard Carley, Rob A. Rutenbar, Carnegie Mellon Univ., Pittsburgh, PA

We describe a synthesis system that takes operating range constraints and inter- and intra-circuit parametric manufacturing variations into account while designing a sized and biased analog circuit. Previous approaches to CAD for analog circuit synthesis have concentrated on nominal analog circuit design, and subsequent optimization of these circuits for statistical fluctuations and operating point ranges. Our approach simultaneously synthesizes and optimizes for operating and manufacturing variations by mapping the circuit design problem into an Infinite Programming problem and solving it using an annealing within annealing formulation. We present circuits designed by this integrated synthesis system, and show that they indeed meet their operating range and parametric manufacturing constraints.

9C.2 A Statistical Optimization-Based Approach for Automated Sizing of Analog Cells

F. Medeiro, F.V. Fernández, R. Domínguez-Castro, A. Rodríguez-Vázquez, Centro Nacional de Microelectrónica, Sevilla, Spain

This paper presents a CAD tool for automated sizing of analog cells using statistical optimization in a simulation-based approach. A nonlinear penalty-like approach is proposed to define a cost function from the performance specifications. Also, a group of heuristics is proposed to increase the probability of reaching the global minimum as well as to reduce CPU time during the optimization process. The proposed tool sizes complex analog cells starting from scratch, within reasonable CPU times (approximately 1 hour for a fully differential opamp with 51 transistors), requiring no designer interaction, and using accurate transistor models to support the design choices. Tool operation and feasibility are demonstrated via experimental measurements from a working CMOS prototype of a folded-cascode amplifier.

9C.3 Time-Domain Non-Monte Carlo Noise Simulation for Nonlinear Dynamic Circuits with Arbitrary Excitations

Alper Demir, Edward Liu, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA

A new, time-domain, non-Monte Carlo method for computer simulation of electrical noise in nonlinear dynamic circuits with arbitrary excitations is presented. This time-domain noise simulation method is based on the results from the theory of stochastic differential equations. The noise simulation method is general in the sense that any nonlinear dynamic circuit with any kind of excitation, which can be simulated by the transient analysis routine in a circuit simulator, can be simulated by our noise simulator in time-domain to produce the noise variances and covariances of circuit variables as a function of time, provided that noise models for the devices in the circuit are available. Noise correlations between circuit variables at different time points can also be calculated. Previous work on computer simulation of noise in integrated circuits is reviewed with comparisons to our method. Shot, thermal and flicker noise models for integrated-circuit devices, in the context of our time-domain noise simulation method, are described. The implementation of this noise simulation method in a circuit simulator (SPICE) is described. Two examples of noise simulation (a CMOS ring-oscillator and a BJT active mixer) are given.


SESSION 9D EFFICIENT ROUTING ALGORITHMS

Moderators: Majid Sarrafzadeh, Northwestern Univ., Evanston, IL
Yoji Kajitani, JAIST, Ishikawa, Japan

9D.1 Improving Over-The-Cell Channel Routing in Standard Cell Design

Xiaolin Liu, Ioannis G. Tollis, Univ. of Texas, Richardson, TX

The first stage of over-the-cell routing in the horizontally connected vertically connected (HCVC) model is formulated as follows: Given two rows of terminals, find a planar routing to connect a subset of nets (with weights) on each row of terminals using a fixed number of tracks to maximize the total weight. This problem is called the two row fixed height planar routing (TFPR) problem [CPL93]. The complexity of the TFPR problem was unknown up to now. An approximation algorithm for the TFPR problem was presented in [CPL93]. In this paper we present an O(n^2 * h^2) time algorithm to solve the TFPR problem optimally, where n is the number of terminals and h is the height of the standard cells. Our algorithm can be used to improve the performance of several over-the-cell channel routers including the ones in [CPL93] and [HSS93].

9D.2 Minimum Crosstalk Switchbox Routing

Tong Gao, C.L. Liu, Univ. of Illinois, Urbana, IL

As technology advances, interconnection wires are placed in closer proximity. Consequently, reduction of crosstalks between interconnection wires becomes an important consideration in VLSI design. In this paper, we study the gridded switchbox routing problems with the objectives of satisfying crosstalk constraints and minimizing the total crosstalk in the nets. We propose a new approach to the problems which utilizes existing switchbox routing algorithms and improves upon the routing results by re-assigning the horizontal and vertical wire segments to rows and columns, respectively, in an iterative fashion. This approach can also be applied to the channel routing problem with crosstalk constraints. A novel mixed ILP formulation and effective procedures for reducing the number of variables and constraints in the mixed ILP formulation are then presented. The experimental results are encouraging.

9D.3 Techniques for Crosstalk Avoidance in the Physical Design of High-Performance Digital Systems

Desmond Kirkpatrick, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA

Interconnect performance does not scale well into deep submicron dimensions, and the rising number of analog effects erodes the digital abstraction necessary for high levels of integration. In particular, crosstalk is an analog phenomenon of increasing relevance. To cope with the increasingly analog nature of high-performance digital system design, we propose using a constraint-driven methodology. In this paper we describe new constraint generation ideas incorporating digital sensitivity. In constraint-driven synthesis, we show that a fundamental subproblem of crosstalk channel routing, coupling-constrained graph levelization (CCL), is NP-complete, and develop a novel heuristic algorithm. To demonstrate the viability of our methodology, we introduce a gridless crosstalk-avoiding channel router as an example of a robust and truly constraint-driven synthesis tool.


SESSION 10A BDDs and Applications

Moderators: Herve Touati, Digital Equipment Corp., Rueil-Malmaison, France
Albert Wang, Synopsys, Inc., Mountain View, CA

10A.1 Efficient Breadth-First Manipulation of Binary Decision Diagrams

Pranav Ashar, Matthew Cheong, NEC USA, Princeton, NJ

We propose new techniques for efficient breadth-first iterative manipulation of ROBDDs. Breadth-first iterative ROBDD manipulation can potentially reduce the total elapsed time by multiple orders of magnitude compared to the conventional depth-first recursive algorithms when the memory requirement exceeds the available physical memory. However, the breadth-first manipulation algorithms proposed so far [5] have had a large enough overhead associated with them to make them impractical. Our techniques are geared towards minimizing the overhead without sacrificing the speed up potential. Experimental results indicate considerable success in that regard.

10A.2 Symmetry Detection and Dynamic Variable Ordering of Decision Diagrams

Shipra Panda, Fabio Somenzi, Univ. of Colorado, Boulder, CO, Bernard F. Plessier, Motorola, Inc., Austin, TX

Knowing that some variables are symmetric in a function has numerous applications; in particular, it can help produce better variable orders for Binary Decision Diagrams (BDDs) and related data structures (e.g., Algebraic Decision Diagrams). It has been conjectured that there always exists an optimum order for a BDD wherein symmetric variables are contiguous. We propose a new algorithm for the detection of symmetries, based on dynamic reordering, and we study its interaction with the reordering algorithm itself. We show that combining sifting with an efficient symmetry check for contiguous variables results in the fastest symmetry detection algorithm reported to date and produces better variable orders for many BDDs. The overhead on the sifting algorithm is negligible.
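
A hedged sketch of the underlying symmetry test: two variables are symmetric in f exactly when the mixed cofactors agree, i.e. f with (x_i, x_j) = (0, 1) equals f with (x_i, x_j) = (1, 0). The paper performs this check efficiently on BDDs of contiguous variables during sifting; the sketch below uses a plain truth-table check on a hypothetical function:

    # Hedged sketch: (positive) symmetry check by comparing the mixed cofactors
    # f|x_i=0,x_j=1 and f|x_i=1,x_j=0 over all assignments of the other inputs.
    # The example function is hypothetical; BDD-based checks are far cheaper.

    from itertools import product

    def symmetric(f, n, i, j):
        """True iff swapping x_i and x_j never changes the value of f."""
        for bits in product((0, 1), repeat=n):
            a = list(bits); a[i], a[j] = 0, 1
            b = list(bits); b[i], b[j] = 1, 0
            if f(*a) != f(*b):
                return False
        return True

    # Example: f = (x0 AND x1) OR x2 -- x0 and x1 are symmetric, x0 and x2 are not.
    f = lambda x0, x1, x2: (x0 and x1) or x2
    print(symmetric(f, 3, 0, 1))   # True
    print(symmetric(f, 3, 0, 2))   # False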

10A.3 A Redesign Technique for Combinational Circuits Based on Reconnections of Gates

Yuji Kukimoto, Univ. of California, Berkeley, CA, Masahiro Fujita, Fujitsu Laboratories of America, San Jose, CA
Robert K. Brayton, Univ. of California, Berkeley, CA

In this paper, we consider a redesign technique applicable to combinational circuits implemented with gate-array or standard-cell technology, where we rectify an existing circuit only by reconnecting gates on the circuit with all the gate types unchanged. This constraint allows us to reuse the original placement as is, thereby speeding up the total time needed for a redesign. We formulate this problem as a Boolean-constraint problem and give a BDD-based algorithm to check the feasibility of redesign.


SESSION 10B DESIGN FOR TEST AND CONCURRENT ERROR DETECTION

Moderators: Sudhir Kadkade, Mentor Graphics Corp., Wilsonville, OR
Yervant Zorian, AT&T Bell Labs., Princeton, NJ

10B.1 Non-Scan Design-For-Testability of RT-Level Data Paths

Sujit Dey, Miodrag Potkonjak, C&C Research Labs., NEC USA, INC., Princeton, NJ

This paper presents a non-scan design-for-testability technique applicable to register-transfer (RT) level data path circuits, which are usually very hard to test due to the presence of complex loop structures. We develop a new testability measure, and utilize the RT-level structure of the data path, for cost-effective re-design of the circuit to make it easily testable, without having to either scan any flip-flop or break loops directly. The non-scan DFT technique was applied to several data path circuits. Experimental results demonstrate the feasibility of producing non-scan testable data paths, which can be tested at-speed. The hardware overhead and the test application time required for the non-scan designs are significantly lower than for the corresponding partial scan designs.

10B.2 Selecting Partial Scan Flip-Flops for Circuit Partitioning

Toshinobu Ono, NEC Corp., Kawasaki, Japan

This paper presents a new method of selecting scan flip-flops (FFs) in partial scan designs of sequential circuits. Scan FFs are chosen so that the whole circuit can be partitioned into many small subcircuits which can be dealt with separately by a test pattern generator. This permits easy automatic test pattern generation for arbitrarily large sequential circuits. Algorithms for selecting scan FFs to allow such partitioning and for scheduling tests for the subcircuits are given. Experimental results show that the proposed method makes it possible to generate test patterns for extra-large sequential circuits which previous approaches cannot deal with.

10B.3 Logic Synthesis Techniques for Reduced Area Implementation of Multilevel Circuits with Concurrent Error Detection

Nur A. Touba, Edward J. McCluskey, Stanford Univ., Stanford, CA

This paper presents new logic synthesis techniques for generating multilevel circuits with concurrent error detection based on a parity-check code scheme that can detect all errors caused by single stuck-at faults. These synthesis techniques fully automate the design process and allow for a better quality result than previous methods thereby reducing the cost of concurrent error detection. An algorithm is described for selecting a good parity-check code for encoding the outputs of a circuit. Once the code has been chosen, a new procedure called structure-constrained logic optimization is used to minimize the area of the circuit as much as possible while still using a circuit structure that ensures that single stuck-at faults cannot produce undetected errors. The implementation that is generated is path fault secure and when augmented by a checker forms a self-checking circuit. Results indicate that self-checking multilevel circuits can be generated which require significantly less area than using duplication.


SESSION 10C ANALOG MACROMODELING AND TEST

Moderators: Rob A. Rutenbar, Carnegie Mellon Univ., Pittsburgh, PA
Seijiro Moriyama, Toshiba Corp., Yokohama, Japan

10C.1 Macromodeling of Analog Circuits for Hierarchical Circuit Design

Jianfeng Shao, Ramesh Harjani, Univ. of Minnesota, Minneapolis, MN

Hierarchy plays a significant role in the design of digital and analog circuits. At each level of the hierarchy it becomes essential to evaluate if a sub-block design is feasible and if so which design style is the best candidate for the particular problem. This paper proposes a general methodology for evaluating the feasibility and the performance of sub-blocks at all levels of the hierarchy. A modified simplicial approximation technique is used to generate the feasibility macromodel and a layered volume-slicing methodology with radial basis functions is used to generate the performance macro-model. However, due to lack of space, only details of the performance macromodeling techniques are included. Macromodels are developed and verified for analog blocks at three different levels of hierarchy (current mirror, opamp, and A/D converter).

10C.2 Approximate Symbolic Analysis of Large Analog Integrated Circuits

Qicheng Yu, Carl Sechen, Univ. of Washington, Seattle, WA

This paper describes a unified approach to the approximate symbolic analysis of large linearized analog circuits. It combines two new approximation-during-computation strategies with a variation of the classical two-graph tree enumeration method. The first strategy is to generate common trees of the two-graphs, and therefore the product terms in the symbolic network function, in decreasing order of magnitude. The second approximation strategy is the sensitivity-based simplification of two-graphs, which excludes from the two-graphs many circuit elements that have little effect on the network function being derived. Our approach is therefore able to symbolically analyze much larger analog integrated circuits than previously reported, using complete small signal models for the semiconductor devices. We show accurate yet reasonably sized symbolic network functions for integrated circuits with up to 39 transistors, whereas previous approaches were limited to less than 15.

10C.3 Testing of Analog Systems Using Behavioral Models and Optimal Experimental Design Techniques

Eric Felt, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA

This paper describes a new CAD algorithm which performs automatic test pattern generation (ATPG) for a general class of analog systems, namely those circuits which can be efficiently modeled as an additive combination of user-defined basis functions. The algorithm is based on the statistical technique of I-optimal experimental design, in which test vectors are chosen to be maximally independent so that circuit performance will be characterized as accurately as possible in the presence of measurement noise and model inaccuracies. This technique allows analog systems to be characterized more accurately and more efficiently, thereby significantly reducing system test time and hence total manufacturing cost.
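
A hedged sketch of the statistical idea: given user-defined basis functions, test stimuli are selected greedily to minimize the average prediction variance of the fitted model over a region of interest. This is a generic illustration, not the paper's algorithm; the basis, candidate stimuli and starting design below are hypothetical:

    # Hedged sketch of I-optimal test-point selection: greedily add stimuli that
    # minimize the average prediction variance of a least-squares model built
    # from user-defined basis functions. Basis and candidates are hypothetical.

    import numpy as np

    basis = lambda x: np.array([1.0, x, x**2])          # model y = c0 + c1*x + c2*x^2
    candidates = np.linspace(-1.0, 1.0, 21)             # admissible test stimuli
    region = np.vstack([basis(x) for x in np.linspace(-1.0, 1.0, 101)])

    def avg_pred_variance(design):
        """Average prediction variance of the fit over the region of interest."""
        X = np.vstack([basis(x) for x in design])
        info_inv = np.linalg.pinv(X.T @ X)               # pseudo-inverse guards against rank deficiency
        return float(np.mean(np.sum((region @ info_inv) * region, axis=1)))

    design = [-1.0, 0.0, 1.0]                            # small nonsingular starting design
    for _ in range(3):                                   # add three more points greedily
        best = min(candidates, key=lambda x: avg_pred_variance(design + [x]))
        design.append(float(best))
    print('selected stimuli:', design)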


SESSION 10D ROUTABILITY

Moderators: Jason Cong, Univ. of California, Los Angeles, CA
Takashi Kambe, SHARP Co., Nara, Japan

10D.1 Layer Assignment for High-Performance Multi-Chip Modules

Kai-Yuan Chao, D.F. Wong, Univ. of Texas, Austin, TX

In this paper, we present a layer assignment method for high-performance multi-chip module environments. In contrast with treating global routing and layer assignment separately, our method assigns nets to layers while considering preferable global routing topologies simultaneously. We take transmission line effects into account to avoid noise in high-speed circuit packages. The problem is formulated as a quadratic Boolean programming problem and an algorithm is presented to solve the problem after linearization. Our method is applied to a set of benchmark circuits to demonstrate the effectiveness.

10D.2 The Reproducing Placement Problem with Applications

Wei-Liang Lin, Majid Sarrafzadeh, Northwestern Univ., Evanston, IL, C.K. Wong,

We study a new placement problem: the reproducing placement problem (RPP). In each phase a module (or gate) is decomposed into two (or more) simpler modules. The goal is to find a "good" placement in each phase. The problem, being iterative in nature, requires an iterative algorithm. The problem finds applications in several gate-level placement problems, e.g., in layout-driven logic synthesis.

We introduce the notion of minimum floating Steiner trees (MFST). We employ an MFST algorithm as a central step in solving the RPP. A Hanan-like theorem is established for the MFST problem and two approximation algorithms are proposed. Experiments on commonly employed benchmarks verify the effectiveness of the proposed technique.

10D.3 RISA: Accurate and Efficient Placement Routability Modeling

Chih-liang Eric Cheng, Cadence Design Systems, Inc., San Jose, CA

The prevalence of netlist synthesis tools raises great concern about the routability of cell placements created with state-of-the-art placement techniques. In this paper, an accurate and efficient placement routability modeling technique is proposed and incorporated into the prevailing simulated annealing approach. This accurate and efficient modeling is based on a supply-versus-demand analysis of routing resources over an array of regions on a chip. Vertical and horizontal routability are analyzed separately due to the bias of routing resources in multiple-metal-layer ASIC designs. A special technique for net bounding box partitioning is also proposed and is critical to the accuracy of this modeling in the presence of mega cells, which tend to cause local routing congestion. By incorporating this efficient modeling into the cost function of simulated annealing, experiments conducted on small to large industrial designs indicate that placement routability evaluated with a global router is greatly improved as a result of the proposed accurate modeling.
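
A hedged sketch of a supply-versus-demand routability estimate over an array of bins, with each net's bounding box depositing expected horizontal and vertical wire into the bins it overlaps; the grid, nets and capacities below are hypothetical, and the paper's treatment of mega cells and bounding-box partitioning is not modeled:

    # Hedged sketch: per-bin routing demand from net bounding boxes (wire spread
    # uniformly over the box) compared with per-bin track supply. All numbers,
    # including the deliberately small capacities, are illustrative only.

    NX, NY = 4, 4                      # bins in x and y
    H_CAP, V_CAP = 0.5, 0.5            # horizontal/vertical track supply per bin (toy values)

    h_dem = [[0.0] * NX for _ in range(NY)]
    v_dem = [[0.0] * NX for _ in range(NY)]

    nets = [((0, 0), (3, 2)), ((1, 1), (2, 3)), ((0, 3), (3, 3))]   # bounding boxes in bin coords

    for (x0, y0), (x1, y1) in nets:
        w, h = x1 - x0 + 1, y1 - y0 + 1
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                h_dem[y][x] += (w - 1) / float(w * h)   # expected horizontal wire in this bin
                v_dem[y][x] += (h - 1) / float(w * h)   # expected vertical wire in this bin

    overflow = [(x, y) for y in range(NY) for x in range(NX)
                if h_dem[y][x] > H_CAP or v_dem[y][x] > V_CAP]
    print('congested bins:', overflow)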


SESSION 11A SEQUENTIAL OPTIMIZATION

Moderators: Fabio Somenzi, Univ. of Colorado, Boulder, CO
Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA

11A.1 A New Approach for Factorizing FSM's

C. Rama Mohan, Cadence Design Systems Pvt. Ltd., Noida, India, P.P. Chakrabarti, Indian Inst. of Tech., Kharagpur, India

Exact factors, as defined in [2], if present in an FSM can result in the most effective factorization. However, it has been found that most FSM's are not exactly factorizable. In this paper, we suggest a method of making FSM's exactly factorizable through minor changes in the next-state space while maintaining the functionality of the FSM. We have also developed a new combined state assignment algorithm for the state encoding of Factored and Factoring FSM's. Experimental results on MCNC benchmark examples, after running MISII on the Original FSM, the Factored FSM and the Factoring FSM, have shown a reduction of 40% in the worst-case signal delay through the circuit in a multilevel implementation. The total number of literals, on average, is the same after factorization as that obtained by running MISII on the original FSM. For two-level implementation, our method has been able to factorize benchmark FSM's with a 14% average increase in overall area, while the areas of the combinational components of the Factored and Factoring FSM's have been found to be significantly less than the area of the combinational component of the original FSM.

11A.2 Boolean Constrained Encoding: A New Formulation and a Case Study

N.L.V. Calazans, PUCRS, Porto Alegre, RS, Brazil

This paper provides a new, generalized approach to the problem of encoding information as vectors of binary digits. We furnish a formal definition of the Boolean constrained encoding problem and show that this definition encompasses many particular encoding problems found in VLSI design at various levels of description abstraction. Our approach can capture equivalence and/or compatibility classes in the original symbol set to encode by allowing symbol codes to be cubes of a Boolean space instead of the usual minterms. In addition, we introduce a unified framework for representing encoding constraints that is more general than previous efforts. The framework is based upon a new definition of the pseudo-dichotomy concept and is adequate to guide the solution of encoding problems through the satisfaction of constraints extracted from the original problem statement. An encoding problem case study is presented: the state assignment of synchronous finite state machines with simultaneous consideration of state minimization. A practical comparison with well-established approaches that solve this problem in two separate steps shows that our solution is competitive with other published results. However, the case study is primarily intended to show the feasibility of the Boolean constrained encoding formulation.

11A.3 Optimization of Hierarchical Designs Using Partitioning and Resynthesis

Heinz-Josef Eikerling, Ralf Hunstock, Univ. of Paderborn, Paderborn, Germany
Raul Camposano, Synopsys, Inc., Mountain View, CA

This paper explores the influence of optimization along the boundary between hierarchically described components. A novel technique called repartitioning combines partitioning and sequential resynthesis of the design under various quality measures. It is applied to various digital circuits consisting of a controller and a datapath. The outcome of this effort is a versatile, parametrizable resynthesis tool that preserves the hierarchy. Depending on the cost measures, an average improvement of between 5% and 15% was obtained.


SESSION 11B FAULT SIMULATION

Moderators: Prab Varma, CrossCheck Technology, Inc., San Jose, CA
Tom Niermann, Sunrise Test Systems, Sunnyvale, CA

11B.1 HyHOPE: A Fast Fault Simulator with Efficient Simulation of Hypertrophic Faults

Chen-Pin Kung, Chen-Shang Lin, National Taiwan Univ., Taiwan, ROC

In sequential circuit fault simulation, hypertrophic faults, which result from lengthened initialization sequences in the faulty circuits, usually produce a large number of fault events during simulation and require excessive gate evaluations. These faults degrade the performance of fault simulators that attempt to simulate them exactly. In this paper, an exact simulation algorithm is developed to identify the hypertrophic faults and to minimize their effects during fault simulation. The simulator HyHOPE, based on this algorithm, achieves an average speedup ratio of 1.57 over HOPE 1.1 for the ISCAS89 benchmark circuits. Furthermore, the results indicate that the performance of HyHOPE is close to that of an approximate simulator in which faults are simply dropped when they become potentially detected.

11B.2 Fast Timing Simulation of Transient Faults in Digital Circuits

A. Dharchoudhury, Sung Mo Kang, Hungse Cha, Janak H. Patel, Univ. of Illinois, Urbana, IL

Transient fault simulation is an important verification activity for circuits used in critical applications, since such faults account for over 80% of all system failures. This paper presents a timing-level transient fault simulator that bridges the gap between electrical and gate-level transient fault simulators. A generic MOS circuit primitive and analytical solutions of the node differential equations are used to perform transistor-level simulation with accurate MOSFET models. The transient fault is modeled by a piecewise-quadratic injected current waveform; this retains the electrical nature of the transient fault and provides SPICE-like accuracy. Detailed comparisons with SPICE3 show the accuracy of this technique, and speedups of two orders of magnitude are observed for circuits containing up to 2000 transistors. Latched error distributions of the benchmark circuits are also provided.
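As a rough illustration of the fault model (the paper's exact waveform and parameters are not given in the abstract), a piecewise-quadratic injected current pulse might be written as:

    def injected_current(t, i_peak, t_rise, t_fall):
        # Piecewise-quadratic pulse: quadratic rise to i_peak, then quadratic fall.
        if t < 0.0:
            return 0.0
        if t < t_rise:
            return i_peak * (t / t_rise) ** 2
        if t < t_rise + t_fall:
            u = (t - t_rise) / t_fall
            return i_peak * (1.0 - u) ** 2
        return 0.0

    # The pulse is injected at the struck node and the node differential equations
    # are integrated with this additional current source.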

11B.3 A Fast and Memory-Efficient Diagnostic Fault Simulation for Sequential Circuits

Jer Min Jou, Shung-Chih Chen, National Cheng Kung Univ., Tainan, Taiwan, ROC

In this paper, a fast and memory-efficient diagnostic fault simulator for sequential circuits is proposed. A two-level optimization technique is developed and used to improve the processing speed. At the high level, an efficient list that stores, for each fault, the faults that are so far indistinguishable from it is maintained during simulation, so the number of diagnostic comparisons is minimized. At the low level, a bit-parallel comparison is developed to speed up the comparison process. As a result, the various diagnostic measure reports for a given test set can be generated very quickly. In addition, the simulator is extended to diagnose single stuck-at device faults. Experimental results show that this diagnostic simulator achieves a significant speedup compared to previous methods.
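The low-level optimization can be pictured as follows (a sketch with an assumed signature representation, not the paper's data structures): if each fault's captured output responses over a block of time frames are packed into an integer bit-vector, two faults are compared with a single XOR rather than bit by bit, and the indistinguishability list of a fault is pruned accordingly:

    def distinguishable(sig_a, sig_b):
        # sig_a, sig_b: integers whose bits are the captured output values.
        return (sig_a ^ sig_b) != 0

    def prune_indistinguishable(fault, candidates, signatures):
        # Keep only the faults still indistinguishable from 'fault'.
        return [f for f in candidates
                if not distinguishable(signatures[fault], signatures[f])]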


SESSION 11C TIMING ANALYSIS

Moderators: Tom Szymanski, AT&T Bell Labs., Murray Hill, NJ
Sharad Malik, Princeton Univ., Princeton, NJ

11C.1 Timing Uncertainty Analysis for Time-of-Flight Systems

John R. Feehrer, Harry F. Jordan, Univ. of Colorado, Boulder, CO

Time-of-flight synchronization is a new digital design methodology that eliminates all latching devices, allowing higher clock rates than alternative timing schemes. Synchronization is accomplished by precisely balancing connection delays. Many effective pipeline stages are created by pipelining combinational logic, similar in concept to wave pipelining but differing in several respects. Due to the unique flow-through nature of the circuits and the need for pulse-mode operation, time-of-flight design exposes interesting new areas for CAD timing analysis. This paper discusses how static propagation delay uncertainty limits the clock period for time-of-flight circuits built with opto-electronic devices. We present algorithms for placing a minimum set of clock gates to restore timing in feedback loops that implement memory and for propagating delay uncertainty through a circuit graph. A mixed integer program determining the minimum feasible clock period subject to pulse width and arrival time constraints is discussed. The algorithms are implemented in XHatch, a time-of-flight CAD package.

11C.2 Provably Correct High-Level Timing Analysis Without Path Sensitization

S. Bhattacharya, Duke Univ., Durham, NC, Sujit Dey, C&C Research Labs., NEC USA, Inc., Princeton, NJ
Franc Brglez, North Carolina State Univ., Raleigh, NC

This paper addresses the problem of true delay estimation during high-level design. Existing delay estimation techniques either estimate the topological delay of the circuit, which may be pessimistic, or use gate-level timing analysis to calculate the true delay, which may be prohibitively expensive.

We show that the paths in the implementation of a behavioral specification can be partitioned into two sets, SP and UP. While the paths in SP can affect the delay of the circuit, the paths in UP cannot. Consequently, the true delay of the resulting circuit can be computed by just measuring the topological delay of the paths in SP, eliminating the need for the computationally intensive process of path sensitization. Experimental results show that high-level true delay estimation can be done very fast, even when gate-level true delay estimation becomes computationally infeasible. The high-level delay estimates are verified by comparison with delay estimates obtained by gate-level timing analysis on the actual implementation.
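Once the partition into SP and UP is available, computing the delay reduces to a topological longest-path computation restricted to SP. A minimal sketch (assumed edge and delay representation, not the paper's actual data structures):

    from collections import defaultdict

    def topological_delay(edges, delay):
        # edges: list of (u, v) pairs belonging to SP; delay: dict node -> gate delay.
        succ, indeg = defaultdict(list), defaultdict(int)
        nodes = set(delay)
        for u, v in edges:
            succ[u].append(v)
            indeg[v] += 1
        arrival = {n: delay[n] for n in nodes}
        ready = [n for n in nodes if indeg[n] == 0]
        while ready:
            u = ready.pop()
            for v in succ[u]:
                arrival[v] = max(arrival[v], arrival[u] + delay[v])
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        return max(arrival.values())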

11C.3 A Timing Analysis Algorithm for Circuits with Level-Sensitive Latches

Jin-fuw Lee, Donald T. Tang, C.K. Wong, IBM Corp., Yorktown Heights, NY

For a logic design with level-sensitive latches, we need to validate the timing of signal paths that may flush through several latches. We have developed efficient algorithms based on a modified shortest- and longest-path method. The computational complexity of our algorithm is generally better than that of known algorithms in the literature. The implementation (CYCLOPSS) has been applied to an industrial chip to verify its clock schedules.


SESSION 11D NEW CHALLENGES FOR DESIGN DATA MANAGEMENT: RE-USE AND HDL'S

Moderators: Margarida F. Jacome, Carnegie Mellon Univ., Pittsburgh, PA
Pieter van der Wolf, Delft Univ. of Tech., Delft, The Netherlands

11D.1 An Object-Oriented Cell Library Manager

Naresh K. Sehgal, Intel Corp., Santa Clara, CA, C.Y. Roger Chen, Syracuse Univ., Syracuse, NY, John M. Acken, Intel Corp., Santa Clara, CA

New techniques are proposed to obtain better estimates and optimizations at higher levels of design abstraction, which are then used for library cell selection. A single object-oriented database repository is used during all phases of VLSI design to enhance the early design estimates. Compared with a relational database using sorted tables of attribute values, the proposed object-oriented cell library manager reduces the search time for an appropriate cell, with m constraints among n cells, from O(nm) to O(m log n). The proposed method also reduces design cycle time by reducing the number of iterations caused by mismatched performance estimates made in the earlier phases of a design.
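The complexity claim can be illustrated with a small sketch (hypothetical index layout, not the actual library manager): if each attribute keeps its values sorted alongside the corresponding cell names, one constraint is answered by a binary search over the n cells, so m constraints cost O(m log n) probes rather than an O(nm) scan:

    import bisect

    def cells_meeting(values, cells, bound):
        # values: sorted attribute values; cells: cell names in the same order.
        pos = bisect.bisect_right(values, bound)   # O(log n) probe
        return set(cells[:pos])                    # collecting matches shown for clarity

    def select_cells(index, constraints):
        # index: attr -> (sorted values, cells); constraints: attr -> upper bound.
        result = None
        for attr, bound in constraints.items():    # m constraints
            values, cells = index[attr]
            matches = cells_meeting(values, cells, bound)
            result = matches if result is None else result & matches
        return result if result is not None else set()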

11D.2 Reuse of Design Objects in CAD Frameworks

Joachim Altmeyer, Stefan Ohnsorge, Bernd Schürmann, Univ. of Kaiserslautern, Kaiserslautern, Germany

The reuse of well-tested and optimized design objects is an important means of decreasing design times, increasing design quality, and improving the predictability of designs. Reuse ranges from selecting cells from a library to adapting already designed objects. In this paper, we present a new model for reusing design objects in CAD frameworks. Based on experience in other disciplines, mainly software engineering and case-based reasoning, we have developed a feature-based model to describe design objects and their similarities. Our model considers generic modules as well as multi-functional units. We discuss the relationships of the model to the design process and to the configuration hierarchy of complex design objects. We have evaluated our model with the prototype system RODEO.
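A feature-based similarity measure of the kind described can be sketched as follows (an assumed representation for illustration, not RODEO's actual model): design objects are described by feature/value pairs, and candidates for reuse are ranked by a weighted match against the required features:

    def similarity(required, candidate, weights):
        # required, candidate: dicts feature -> value; weights: feature -> weight.
        score = total = 0.0
        for feature, wanted in required.items():
            w = weights.get(feature, 1.0)
            total += w
            if candidate.get(feature) == wanted:
                score += w
        return score / total if total else 0.0

    def best_reuse_candidates(required, library, weights, k=3):
        # library: list of dicts with a "features" entry; returns top-k matches.
        return sorted(library,
                      key=lambda obj: similarity(required, obj["features"], weights),
                      reverse=True)[:k]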

11D.3 Towards Support for Design Description Languages in EDA Frameworks

Olav Schetler, Susanne Heymann, GMD, Sankt Augustin, Germany

We report on a new framework service for design tool encapsulation, based on an information model for design management. The new service uses generated language processors that perform import and export of design files to and from a design management database, with support for nested syntax specifications and extension language scripts. Our prototype design environment is based on the Nelsis CAD Framework and several tools from the Synopsys high-level synthesis and simulation tool suite.