*Best Paper Award Candidate
During the past 12 months, there has been much discussion of the long-term health, and even the viability, of the EDA industry. Many electronics manufacturers believe that the EDA industry, in its current state, will be incapable of delivering all of the technologies and solutions needed during the next ten years. Is the industry in a position to meet the needs of the electronics industry? What is the proper model that will enable the EDA industry to return to profitability and health and, therefore, be capable of delivering what the market demands? How do EDA suppliers intend to address this perceived problem? What is their vision for the future of EDA technology? What is their view on the role that their consulting service organizations will play in the future?
This panel, which consists of executives from across the industry, will address these and other issues. Specifically, panelists will discuss their vision of how the EDA industry must evolve during the next five years, as well as their views on the future direction of design technology. The format of the panel will consist of a full 1.5 hours of open discussion and debate.
The overall design methodology of the 64-bit UltraSPARC processor is described, covering the spectrum from high-level simulation and verification through final layout.
2.1 A Fast and Flexible Performance Simulator for Micro-Architecture Trade-off Analysis on UltraSPARC(TM)-I
Marc Tremblay, Guillermo Maturana, Atsushi Inoue, Les Kohn
Over one hundred micro-architecture features were analyzed and simulated in order to determine if they should be included in UltraSPARC-I. A fast and flexible performance simulator was developed in order to model these features. In this paper, we describe UPS (UltraSPARC-I Performance Simulator), and show how it was used to do trade-off analysis.
2.2 System Design Methodology of UltraSPARC(TM)-I
Lawrence Yang, David Gao, Jamshid Mostoufi, Raju Joshi, Paul Loewenstein
The increasing complexity of microprocessor-based systems puts pressure on a product's time-to-market. We describe the methodology used in designing the system interface of UltraSPARC-I. This methodology allowed us to define the system interface architecture, verify its functionality, perform timing and noise analysis, and carry out the principal board design in parallel. This concurrency permitted rapid implementation of the microprocessor and the system.
2.3 UltraSPARC(TM)-I Emulation
James Gateley, Miriam Blatt, Dennis Chen, Scott Cooke, Piyush Desai,
Manjunath Doreswamy, Mark Elgood, Gary Feierbach, Tim Goldsbury, Dale
Greenley, Raju Joshi, Mike Khosraviani, Robert Kwong, Manish Motwani,
Chitresh Narasimhaiah, Sam J. Nicolino Jr., Tooru Ozeki, Gary Peterson,
Chris Salzmann, Nasser Shayesteh, Jeffrey Whitman, Pak Wong
The next generation UltraSPARC-I CPU represents a significant step forward in processor performance at the cost of increased design complexity. Added complexity increases the risks in achieving functionally correct first silicon. Existing design verification techniques were supplemented by applying emulation to obtain an early look at functionality. Discussed are the goals, methods and results of the UltraSPARC-I emulation.
2.4 CAD Methodology for the Design of the UltraSPARC(TM)-I Microprocessor at Sun Microsystems Inc.
A. Cao, A. Adalal, J. Bauman, P. Delisle, P. Dedood, P. Donehue,
M. Dell'Oca Khouja, T. Doan, M. Doreswamy, P. Ferolito, O. Geva, D. Greenhill,
S. Gopaladhine, J. Irwin, L. Lev, J. MacDonald, M. Ma, S. Mitra, P. Patel,
A. Prabhu, R. Puranik, S. Rozanski, N. Ross, P. Saggurti, S. Simovich,
R. Sunder, B. Sur, W. Vercruysse, M. Wong, P. Yip, R. Yu, J. Zhou, G. Zyner
The overall CAD methodology for the design of the UltraSPARC-I microprocessor at Sun is described in this paper. Topics discussed include CAD flow strategy, tool development and integration strategy, and design infrastructure. The importance of a concurrent design style, a modular CAD flow environment, incremental design verification, and early design quality checking is strongly emphasized.
Power reduction in register-transfer level and architectural synthesis can be achieved through different techniques that are presented in this session.
3.1 Computing the Maximum Power Cycles of a Sequential Circuit
Srilatha Manne, Abelardo Pardo, R. Iris Bahar, Gary D. Hachtel, Fabio
Somenzi, Enrico Macii, Massimo Poncino
This paper studies the problem of estimating worst-case power dissipation in a sequential circuit. We approach this problem by finding the maximum average weight cycles in a weighted directed graph. In order to handle practical-sized examples, we use symbolic methods, based on Algebraic Decision Diagrams (ADDs), to compute the maximum average weight cycles as well as the number of gate transitions in the circuit, which is needed to construct the weighted directed graph.
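The abstract leaves the cycle computation itself implicit. As a point of reference only (this is the standard explicit formulation, not the authors' symbolic ADD-based method), Karp's dynamic program finds the maximum mean-weight cycle of an edge-weighted digraph; a minimal sketch, with illustrative names:

    def max_mean_cycle(n, edges):
        """Karp's algorithm. edges: list of (u, v, w) triples, vertices 0..n-1.
        Returns the maximum mean weight over all directed cycles."""
        NEG = float("-inf")
        # D[k][v] = maximum weight of any k-edge walk ending at v
        D = [[NEG] * n for _ in range(n + 1)]
        D[0] = [0.0] * n                      # walks may start at any vertex
        for k in range(1, n + 1):
            for u, v, w in edges:
                if D[k - 1][u] > NEG and D[k - 1][u] + w > D[k][v]:
                    D[k][v] = D[k - 1][u] + w
        best = NEG
        for v in range(n):
            if D[n][v] == NEG:
                continue
            # Karp: max over v of the min over k of (D_n(v) - D_k(v)) / (n - k)
            best = max(best, min((D[n][v] - D[k][v]) / (n - k)
                                 for k in range(n) if D[k][v] > NEG))
        return best

For example, max_mean_cycle(2, [(0, 1, 3.0), (1, 0, 1.0)]) returns 2.0, the mean weight of the only cycle.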
3.2 Register Allocation and Binding for Low Power
Jui-Ming Chang, Massoud Pedram
This paper describes a technique for calculating the switching activity of a set of registers shared by different data values. Based on the assumption that the joint pdf (probability density function) of the primary input random variables is known or that a sufficiently large number of input vectors has been given, the register assignment problem for minimum power consumption is formulated as a minimum cost clique covering of an appropriately defined compatibility graph (which is shown to be transitively orientable). The problem is then solved optimally (in polynomial time) using a max-cost flow algorithm. Experimental results confirm the viability and usefulness of the approach in minimizing power consumption during the register assignment phase of the behavioral synthesis process.
3.3 Memory Segmentation to Exploit Sleep Mode Operation
Amir H. Farrahi, Gustavo E. Téllez, Majid Sarrafzadeh
Exploiting sleep mode operation to minimize average power consumption is of great importance. In this paper, we formulate the memory segmentation/partitioning problem to exploit sleep mode operation and show that the problem is NP-complete. We present polynomial-time algorithms for special classes of the problem and discuss some generalizations. Preliminary experiments show the effectiveness of the algorithms and the applicability of the approach: the experimental data confirm that a careful partitioning allows up to 40% more sleep time, which can be exploited to minimize average power consumption. Directions for further research in this area are presented.
3.4 Power-Profiler: Optimizing ASICs Power Consumption at the Behavioral Level
Raul San Martin, John P. Knight
This paper presents a methodology and tool (Power-Profiler) for the optimization of average and peak power consumption in the behavioral synthesis of ASICs. It considers lowering operating voltages, disabling the clock of components not in use, and architectural trade-offs, while also keeping silicon area at reasonable sizes. By attacking the power problem from the behavioral level, it can exploit an application’s inherent parallelism to meet the desired performance and compensate for slower and less power-hungry operators.
This session focuses on the impact of the target technology and layout on logic synthesis. The first paper extends Boolean matching to incompletely specified Boolean functions. The next three address single and multiple output function decomposition for LUT-based devices. The last paper combines synthesis with layout optimization.
4.1 Boolean Matching for Incompletely Specified Functions*
Kuo-Hua Wang, TingTing Hwang
Boolean matching checks the equivalence of two functions under input permutation and input/output phase assignment. In this paper, we address the Boolean matching problem for incompletely specified functions. We formulate the search for an input-variable mapping between two target functions as a logic equation using a multiple-valued function. Based on this equation, a Boolean matching algorithm is proposed. Delay and power dissipation can also be taken into consideration when this method is used for technology mapping. Experimental results on a set of benchmarks show that our algorithm is very effective in solving the Boolean matching problem for incompletely specified functions.
4.2 Functional Multiple-Output Decomposition: Theory and an Implicit Algorithm*
Bernd Wurth, Klaus Eckl, Kurt Antreich
We present theory and a novel, implicit algorithm for functional disjoint decomposition of multiple-output functions. While a Boolean function usually has a huge number of decomposition functions, we show that not all of them are useful for multiple-output decomposition. We therefore introduce the concept of preferable decomposition functions, which are sufficient for optimal multiple-output decomposition. We describe how to implicitly compute all preferable decomposition functions of a single-output function, and how to identify all common preferable decomposition functions of a multiple-output function. Due to the implicit computation in all steps, the algorithm is very efficient. Applied to FPGA synthesis, the method combines the typically separate steps of common subfunction extraction and technology mapping. Experimental results show significant reductions in area.
4.3 A Method for Finding Good Ashenhurst Decompositions and Its Application to FPGA Synthesis
Ted Stanion, Carl Sechen
In this paper, we present an algorithm for finding a good Ashenhurst decomposition of a switching function. Most current methods for performing this type of decomposition are based on the Roth-Karp algorithm. The algorithm presented here is instead based on finding an optimal cut in a BDD. It differs from previous decomposition algorithms in that the cut determines the size and composition of the bound set and the free set, whereas other methods examine all possible bound sets of an arbitrary size. We have applied this method to decomposing functions into sets of smaller functions, a required step when implementing a function using a lookup-table (LUT) based FPGA. The results compare very favorably to existing implementations of Roth-Karp decomposition methods.
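For reference, the (simple, disjoint) decomposition in question splits the inputs $X$ into a bound set $B$ and a free set $F$:

$$f(X) = g\bigl(h(B),\, F\bigr), \qquad B \cup F = X,\ \ B \cap F = \emptyset,$$

with a single-output function $h$ of the bound-set variables. Such a decomposition exists exactly when the number of distinct cofactors of $f$ with respect to the bound-set variables (the column multiplicity of the decomposition chart) is at most two; a cut through a BDD ordered with $B$ above $F$ exposes this count directly.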
4.4 Lambda Set Selection in Roth-Karp Decomposition for LUT-Based FPGA Technology Mapping
Wen-Zen Shen, Juinn-Dar Huang, Shih-Min Chao
Roth-Karp decomposition is a classical decomposition method. Because it can reduce the number of input variables of a function, it has become one of the most popular techniques used in LUT-based FPGA technology mapping. However, the lambda set selection problem, which can dramatically affect the decomposition quality in Roth-Karp decomposition, has not been formally addressed before. In this paper, we propose a new heuristic-based algorithm to solve this problem. The experimental results show that our algorithm efficiently produces better decomposition quality than algorithms that do not use a lambda set selection strategy.
4.5 Minimizing the Routing Cost during Logic Extraction
Hirendu Vaishnav, Massoud Pedram
This paper describes techniques for reducing the routing cost during logic extraction. Two routing cost functions derived from the global structure of a Boolean network are analyzed, and the effectiveness of each cost function is compared against the conventional literal-savings cost function. Experimental results obtained with these routing cost functions are presented and discussed in detail.
The ever increasing complexity of EDA tools, design flows, and methodologies provides a wealth of difficult problems for designers of EDA frameworks. This session offers new ideas for framework organization, design capabilities, and tool integration. The first paper describes techniques for automatically verifying a partial design against formal requirements. The second paper defines a strategy for integrating schedule management into flow management. The final two papers offer practical methods for automatically generating tool integration code from simple high-level specifications.
5.1 Requirements-Based Design Evaluation
Stephen T. Frezza, Steven P. Levitan, Panos K. Chrysanthis
This paper presents a methodology for automating the evaluation of partial designs using black-box testing techniques. This methodology generates black-box evaluation tests using a novel semantic graph data model to maintain the relationships between the related design and requirements data. We demonstrate the utility of this technique by using the relationship information to automatically generate and run functionality tests of partial designs against the related requirements.
5.2 Incorporating Design Schedule Management into a Flow Management System
Eric W. Johnson, Jay B. Brockman
In this paper we present an approach to incorporate design schedule management services into a flow management system. The basis of our approach is to derive a design schedule from the simulation of a flow execution. Actual flow execution can then be tracked against the proposed schedule via design metadata. We verify our approach by implementing design scheduling into the Hercules Workflow Manager.
5.3 Generating ECAD Framework Code from Abstract Models*
Joachim Altmeyer, Bernd Schürmann, Martin Schütze
This paper describes a novel approach for generating the code of ECAD frameworks automatically from abstract models. Instead of providing the framework components of a design system in the form of a code library, we favor using such a library at the modeling level. The framework components, common to all design tools, are described by easily understandable base models. For each design tool, the base models are customized individually to meet the tool-specific requirements. The customized models are then input to a code generator that automatically produces tool-specific, optimized source code. A generic modeling system and several code generators are implemented in a software generation environment called MOOSE.
5.4 Tool Integration and Construction Using Generated Graph-Based Design Representations
Ansgar Bredenfeld, Raul Camposano
In this paper we present a new data model for tool integration that extends existing data models by an abstract graph model. We provide a framework service based on this model which generates implementations of graph-based design representations from a conceptual schema. In two case studies we use them for rapid construction of a high-level synthesis tool and for easy fine-grained integration to an existing simulator.
Companies are being forced to rethink how to radically improve design productivity. Company presidents are issuing corporate-wide edicts such as “10x design process productivity improvement by 1998.” Re-engineering the design process is one of the alternatives that some companies have tried. They have architected and implemented integrated multi-vendor design environments. They have installed automated parts library and workflow management systems. They have re-organized into cross-functional teams, and trained their engineers on new design methodologies and design automation best practices.
The journey of managing significant process change has been a long and arduous one down a road full of potholes. Murphy's law, cultural inertia, politics, inexperience, underestimating, underfunding, lack of top management commitment, and more have conspired to grind projects to a halt.
This panel is a collection of grizzled veterans who will recount their personal experiences of attempting to manage process change.
The performance of digital designs depends heavily on the assignment of operations to clock cycles and phases. Three different optimal assignment techniques are described in this session.
7.1 Scheduling Using Behavioral Templates*
Tai Ly, David Knapp, Ron Miller, Don MacMillen
This paper presents the idea of behavioral templates in scheduling. A behavioral template locks several operations into a relative schedule with respect to one another. This simple construct proves powerful in addressing: (1) timing constraints, (2) sequential operation modeling, (3) pre-chaining of certain operations, and (4) hierarchical scheduling. We present design examples from industry to demonstrate the importance of these issues in scheduling.
7.2 Rephasing: A Transformation Technique for the Manipulation of Timing Constraints*
Miodrag Potkonjak, Mani Srivastava
We introduce a transformation, named rephasing, that manipulates the timing parameters in control-dataflow graphs. Traditionally, high-level synthesis systems for DSP have either assumed that the relative times, called phases, at which corresponding samples are available at input and delay nodes are all zero, or have assigned values to them automatically as part of the scheduling step when software pipelining is applied simultaneously. Rephasing, by contrast, manipulates the values of these phases as a transformation before scheduling. The advantage of this approach is that phases can be chosen to optimize the algorithm for metrics such as area and power. Moreover, rephasing can be combined with other transformations. We have developed techniques for using rephasing to optimize several design metrics. The experimental results show significant improvements.
7.3 Optimal ILP-Based Approach for Throughput Optimization Using Simultaneous Algorithm/Architecture Matching and Retiming
Y.G. DeCastelo-Vide-e-Souza, M. Potkonjak, Alice C. Parker
System-level design and behavior transformations have rapidly established themselves as the design steps with the greatest impact on the final performance metrics of a design, throughput and latency. In this paper we develop a formal ILP-based approach for throughput and latency optimization when algorithm-architecture matching, retiming, and pipelining are considered simultaneously. The effectiveness of the approach is demonstrated on several real-life examples.
Papers in this session explore synthesis techniques to reduce the number of paths to be tested, other algorithmic approaches towards considering fewer paths and diagnosis of sequential circuits.
8.1 Fast Identification of Robust Dependent Path Delay Faults
U. Sparmann, D. Luxenburger, K.-T. Cheng, S.M. Reddy
Recently, it has been shown in [1] and [2] that in order to verify the correct timing of a manufactured circuit not all of its paths need to be considered for delay testing. In this paper, a theory is developed which puts the work of these papers into a common framework, thus allowing for a better understanding of their relation. In addition, we consider the computational problem of identifying large sets of such not-necessary-to-test paths. Since the approach of [1] can only be applied for small scale circuits, we develop a new algorithm which trades quality of the result against computation time, and allows handling of large circuits with tens of millions of paths. Experimental results show that enormous improvements in running time are only paid for by a small decrease in quality.
8.2 On Synthesis-for-Testability of Combinational Logic Circuits
Irith Pomeranz, Sudhakar M. Reddy
We propose a synthesis method that modifies a given circuit to reduce the number of gates and the number of paths in the circuit. The synthesis procedure is based on replacing subcircuits of the given circuit by structures called comparison units. Comparison units are fully testable for stuck-at faults and for path delay faults. In addition, they have small numbers of paths and gates. These properties make them effective building blocks for synthesis of testable circuits. Experimental results demonstrate reductions in the number of gates and paths and increased path delay fault testability. The random pattern testability for stuck-at faults remains unchanged.
8.3 Rapid Diagnostic Fault Simulation of Stuck-at Faults in Sequential Circuits Using Compact Lists
Srikanth Venkataraman, Ismed Hartanto, W. Kent Fuchs, Elizabeth M. Rudnick, Sreejit Chakravarty, Janak H. Patel
This paper describes a diagnostic fault simulator for stuck-at faults in sequential circuits that is both time and space efficient. The simulator represents indistinguishable classes of faults as memory-efficient lists. The use of lists reduces the number of output response comparisons between faults and hence speeds up the simulation process. The lists also make it easy to drop faults when they are fully distinguished from other faults. Experimental results on the ISCAS89 circuits show that the simulator runs significantly faster than an earlier work based on distinguishability matrices and is faster and more memory efficient than a recent method based on lists of indistinguishable faults. The paper provides the first reports on pessimistic and optimistic diagnostic measures for all faults of the large ISCAS circuits.
These papers relate to discrete-event simulation. The tutorial surveys the current state of parallel discrete-event simulation. The second paper is a new algorithm for parallel discrete-event simulation for VHDL. The final paper uses compiler optimization techniques to increase the performance of event-driven simulation.
9.1 Tutorial: Parallel Logic Simulation of VLSI Systems
Roger D. Chamberlain
Design verification via simulation is an important component in the development of digital systems. However, with continuing increases in the capabilities of VLSI systems, the simulation task has become a significant bottleneck in the design process. As a result, researchers are attempting to exploit parallel processing techniques to improve the performance of VLSI logic simulation. This tutorial describes the current state-of-the-art in parallel logic simulation, including parallel simulation techniques, factors that impact simulation performance, performance results to date, and the directions currently being pursued by the research community.
9.2 Asynchronous, Distributed Event Driven Simulation Algorithm for Execution of VHDL on Parallel Processors
Peter A. Walker, Sumit Ghosh
This paper describes a new Asynchronous, Parallel, Event-Driven Simulation algorithm with inconsistent event Preemption (P(2)EDAS). P(2)EDAS represents a significant advancement in distributed conservative digital circuit simulation algorithms in that it permits the use of any number of non-zero propagation delays for every path between the input and output of every hardware entity. P(2)EDAS permits accurate, concurrent, asynchronous, and efficient, i.e., deadlock-free and null-message-free, execution of sequential and combinational digital designs on parallel processors. It is a conservative algorithm in that only those output transitions, if any, that are guaranteed correct are asserted at the output of a model following its execution. In addition, preemption of inconsistent events is allowed. P(2)EDAS extends to any simulator based on a high-level hardware description language. This paper presents a detailed description of the algorithm.
9.3 A General Method for Compiling Event-Driven Simulations
Robert S. French, Monica S. Lam, Jeremy R. Levitt, Kunle Olukotun
We present a new approach to event-driven simulation that does not use a centralized run-time event queue, yet is capable of handling arbitrary models, including those with unclocked feedback and nonunit delay. The elimination of the event queue significantly reduces run-time overhead, resulting in faster simulation. We have implemented our algorithm in a prototype Verilog simulator called VeriSUIF. Using this simulator we demonstrate improved performance vs. a commercial simulator on a small set of programs.
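For contrast, the centralized kernel that compiled simulation eliminates looks roughly like the toy sketch below. This is a generic illustration, not VeriSUIF or the commercial simulator; all names and the data layout are hypothetical:

    import heapq

    def event_driven_kernel(initial_events, fanout, evaluate):
        """Toy event-driven kernel. initial_events: (time, net, value) triples;
        fanout[net]: gates driven by net;
        evaluate(gate, values) -> (delay, output_net, output_value)."""
        queue = list(initial_events)
        heapq.heapify(queue)                        # the run-time event queue
        values = {}
        while queue:
            time, net, value = heapq.heappop(queue)  # queue overhead per event
            if values.get(net) == value:
                continue                             # no transition, no work
            values[net] = value
            for gate in fanout.get(net, ()):         # fan the transition out
                delay, out, new = evaluate(gate, values)
                heapq.heappush(queue, (time + delay, out, new))
        return values

Every simulated transition pays for a heap push and pop here; a compiled approach instead fixes the evaluation order at compile time, which is the overhead the paper reports removing.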
Mobile and portable information systems are pushing electronics and the development process, and in their wake are new requirements that drive low-power design. This is, however, by no means the only driving force behind the need for a dramatic reduction of power dissipation in digital ICs. There is also strong demand on producers of high-end products to reduce power consumption. Power minimization is thus vital for different reasons in different applications.
Design knowledge and experience in minimizing power while optimizing performance are very limited. Low-power design is in its infancy, and the same can be said for design tools. The opportunities for ultra-low-power tools and methodologies that achieve dramatic reductions in power and push the design entry point higher are all around us. However, the need for tools that broaden the low-power design envelope is not being articulated very strongly by the user community.
This panel seeks to expose the solutions that designers and EDA vendors have, to present proper vehicles for transferring low-power design techniques and methodologies to the user community, and to provide a forum for exploring what is still needed. What design paradigms make sense, what levels of accuracy and speed are acceptable to users, and what tools have the highest payoff are part of what this panel is all about.
Storage allocation plays an important role in synthesis of digital circuits and systems. Presentations in this session address the optimal assignment of data to registers and memory elements.
11.1 A Transformation-Based Approach for Storage Optimization
Wei-Kai Cheng, Youn-Long Lin
High-level synthesis (HLS) has been successfully targeted towards the digital signal processing (DSP) domain. Both application-specific integrated circuits (ASICs) and application-specific instruction-set processors (ASIPs) have frequently been designed using the HLS approach. Since most ASIP and DSP processors provide multiple addressing modes and, in addition to the classical constraints on the number of function units, registers, and buses, impose many resource usage rules, special consideration needs to be paid to the optimizing code generation problem. In this paper we propose three transformation techniques, data management, data ordering, and transformational retiming, for storage optimization during code generation. With these transformations, some scheduling bottlenecks are eliminated, redundant instructions are removed, and multiple operations are mapped onto a single one. The proposed transformations have been implemented in a software system called Theda:MS. A set of benchmark programs has been used to evaluate the effectiveness of Theda:MS. Measurement on the synthesized code targeted towards the TI-TMS320C40 DSP processor shows that the proposed approach is indeed very effective.
11.2 Register Minimization beyond Sharing among Variables
Tsung-Yi Wu, Youn-Long Lin
Traditionally, it is assumed that every variable in the input HDL (Hardware Description Language) behavioral description needs to be held in a register; a register can be shared by multiple variables if they have mutually disjoint lifetime intervals. This approach is effective for signal-flow-like computations such as various DSP algorithms. However, it is not the best for the synthesis of control-dominated circuits, which usually have variables/signals of different bit-widths as well as very long lifetimes. To go beyond register minimization by lifetime-analysis-based sharing, we propose holding some variables in state registers, signal nets, or unclocked sequential networks. We have implemented the proposed method in a software program called VReg. Experimental results demonstrate that VReg minimizes the number of registers more effectively than the lifetime-analysis-based approach does. Better register minimization also leads to designs that are both smaller and faster.
Key Words: High-Level Synthesis; Control-Dominated Circuit; Storage Synthesis.
11.3 Constrained Register Allocation in Bus Architectures
Elof Frank, Salil Raje, Majid Sarrafzadeh
A partitioned memory architecture with bus interconnect, in its most general form, consists of several functional units, each with an associated memory accessible via local interconnect, and global buses that communicate data values from one functional unit to another. As can be expected, the times at which values are communicated affect the size of the local memories and the number of buses needed. We address the problem of scheduling communications in a bus architecture under memory constraints. We present a network flow formulation for the problem and obtain an exact algorithm to schedule the communications such that the constraint on the number of registers in each functional unit is satisfied. As an increasing number of architectures use multiple memories in addition to (or instead of) one central RAM, this work is especially relevant. Several authors have studied this problem in related architectures, yet all use heuristic approaches to schedule the communications; our technique is the first exact solution to the problem. In addition, our graph-theoretic formulation provides clearer insight into the problem.
Sequential automatic test pattern generation is a challenging problem. The papers in this session investigate the effect of retiming on testability, combination of ad hoc and deterministic approaches for faster test generation and synthesis techniques for higher fault coverage.
12.1 On Test Set Preservation of Retimed Circuits*
Aiman El-Maleh, Thomas Marchok, Janusz Rajski, Wojciech Maly
Recently, it has been shown that retiming has a very strong impact on the run time of sequential, structural automatic test pattern generators (ATPGs), as well as on the levels of fault coverage and fault efficiency attained. In this paper, we show that retiming preserves testability with respect to a single stuck-at fault test set by adding a prefix sequence of a pre-determined number of arbitrary input vectors. Experimental results show that high fault coverages can be achieved on high-performance circuits optimized by retiming with much less CPU time (a reduction of two orders of magnitude in several instances) than if ATPG is attempted directly on those circuits.
12.2 Combining Deterministic and Genetic Approaches for Sequential Circuit Test Generation
Elizabeth M. Rudnick, Janak H. Patel
A hybrid sequential circuit test generator is described which combines deterministic algorithms for fault excitation and propagation with genetic algorithms for state justification. Deterministic procedures for state justification are used if the genetic approach is unsuccessful, to allow for identification of untestable faults and to improve the fault coverage. High fault coverages were obtained for the ISCAS89 benchmark circuits and several additional circuits, and in many cases the results are better than those for purely deterministic approaches.
12.3 Partial Scan with Pre-selected Scan Signals
Peichen Pan, C. L. Liu
A partial scan approach proposed recently selects scan signals without considering the availability of the flip-flops (FFs). Such an approach can greatly reduce the number of scan signals, since maximum freedom is allowed in scan signal selection. To actually scan the selected signals, however, we must make them FF-driving signals. In this paper, we study the problem of modifying and retiming a circuit to make a pre-selected set of scan signals FF-driving signals while preserving the set of cycles being broken. We present a new approach for solving this problem and, based on it, design an efficient algorithm. Unlike a previous algorithm, which inherently has no control over the area overhead incurred during the modification, our algorithm explicitly minimizes the area overhead. The algorithm has been implemented, and encouraging results were obtained.
The opening paper of this session gives a characterization of the correspondence between spectral partitioning and graph partitioning; a new heuristic for multi-way partitioning is also given. Two papers then address multi-way partitioning for minimum delay and area, and the use of replication cuts for improvement of delay and cycle time. In the fourth paper, pin-to-pin delay analysis improves the well-known Timberwolf layout package. The concluding paper offers a measure of the scaling behavior of layout heuristics, and suggests that there is room for improvement over current leading methods.
13.1 Spectral Partitioning: The More Eigenvectors, the Better*
Charles J. Alpert, So-Zen Yao
A spectral partitioning method uses the eigenvectors of a graph's adjacency or Laplacian matrix to construct a geometric representation (e.g., a linear ordering) which is then heuristically partitioned. We map each graph vertex to a vector in d-dimensional space, where d is the number of eigenvectors, such that these vectors constitute an instance of the vector partitioning problem. When all the eigenvectors are used, graph partitioning exactly reduces to vector partitioning. This result motivates a simple ordering heuristic that can be used to yield high-quality 2-way and multi-way partitionings. Our experiments suggest the vector partitioning perspective opens the door to new and effective heuristics.
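A minimal sketch of the classical embedding step being generalized, assuming a symmetric adjacency matrix (the paper's contribution, using many eigenvectors and a vector-partitioning formulation, goes beyond this d = 1 bisection):

    import numpy as np

    def spectral_embedding(adj, d):
        """Map each vertex to a d-dimensional vector built from the d
        smallest nontrivial Laplacian eigenvectors."""
        laplacian = np.diag(adj.sum(axis=1)) - adj
        eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
        return eigvecs[:, 1:d + 1]                     # drop the all-ones vector

    def spectral_bisection(adj):
        """Classical 2-way split on the Fiedler vector (the d = 1 case)."""
        fiedler = spectral_embedding(adj, 1)[:, 0]
        return fiedler >= np.median(fiedler)           # boolean part assignment

Using all n - 1 nontrivial eigenvectors instead of one is where, per the paper, graph partitioning reduces exactly to vector partitioning.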
13.2 Multi-way Partitioning for Minimum Delay for Look-Up Table Based FPGAs
Prashant Sawkar, Donald Thomas
In this paper we present a set-cover-based approach (SCP) to multi-way partitioning for minimum delay for Look-Up Table based FPGAs. SCP minimizes the number of chip-crossings on each circuit path with minimum logic duplication cost to achieve implementations with minimum delay and a minimum number of chips. The overall complexity of SCP is O(V^2). Experimental results demonstrate that SCP produces partitions that on average have 14% fewer chips, 28% fewer pins, and 93% fewer chip-crossings on each circuit path compared to ANN, a simulated-annealing-based implementation of classical multi-way partitioning. SCP achieves this performance and compact packing at the cost of duplicating 13% of the logic on average. Additionally, comparison with a lower bound shows that SCP produces near-optimal solutions.
13.3 Performance-Driven Partitioning Using a Replication Graph Approach
Lung-Tien Liu, Ming-Ter Kuo, Chung-Kuan Cheng, Te C. Hu
An efficient algorithm is proposed to tackle the performance-driven partitioning problem using retiming and replication. We devise a replication graph to model the composite effect of replication and retiming. With the replication graph, we formulate the problem as an integer linear programming problem. A heuristic algorithm is derived to solve the problem by exploring the dual program of its linear programming relaxation.
13.4 Timing Driven Placement for Large Standard Cell Circuits
William Swartz, Carl Sechen
We present an algorithm for accurately controlling delays during the placement of large standard cell integrated circuits. Previous approaches to timing-driven placement could not handle circuits containing 20,000 or more cells and yielded placement qualities well short of the state of the art. Our timing optimization algorithm has been added to a placement algorithm that has yielded the best results ever reported on the full set of MCNC benchmark circuits, including a circuit containing more than 100,000 cells. A novel pin-pair algorithm controls delay without the need for user path specification. The timing algorithm is generally applicable to hierarchical, iterative placement methods. Using this algorithm, we present results for the only MCNC standard cell benchmark circuits (fract, struct, and avq.small) for which timing information is available. We decreased the delay of the longest path of circuit fract by 36% at an area cost of only 2.5%. For circuit struct, the delay of the longest path was decreased by 50% at an area cost of 6%. Finally, for the large (22,000-cell) circuit avq.small, the longest path delay was decreased by 28% at an area cost of 6%, while only doubling the execution time. This is the first report of timing-driven placement results for any MCNC benchmark circuit.
13.5 Quantified Suboptimality of VLSI Layout Heuristics
Lars W. Hagen, Dennis J.-H. Huang, Andrew B. Kahng
We show how to quantify the suboptimality of heuristic algorithms for NP-hard problems arising in VLSI layout. Our approach is based on the notion of constructing new scaled instances from an initial problem instance. From the given problem instance, we essentially construct doubled, tripled, etc. instances which have optimum solution costs at most twice, three times, etc. that of the original instance. By executing the heuristic on these scaled instances, and then comparing the growth of solution cost with the growth of instance size, we can measure the scaling suboptimality of the heuristic. We give experimentally determined scaling behavior of several placement and partitioning heuristics; these results suggest that significant improvement remains possible over current state-of-the-art methods.
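In the spirit of this construction (the authors' exact definitions are in the paper), if $H(I)$ denotes the heuristic's cost on instance $I$ and $I_2$ is a doubled instance with $\mathrm{OPT}(I_2) \le 2\,\mathrm{OPT}(I)$, then

$$\rho_2 = \frac{H(I_2)}{2\,H(I)} \le \frac{H(I_2)}{2\,\mathrm{OPT}(I)} \le \frac{H(I_2)}{\mathrm{OPT}(I_2)},$$

since $H(I) \ge \mathrm{OPT}(I)$. So any observed $\rho_2 > 1$ certifies a lower bound on the heuristic's suboptimality ratio at the larger scale, without ever computing an optimum.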
Innovative design techniques are rapidly finding use in real-world industrial designs. This session includes VHDL simulation of the processor in Siemens' EWSD system, digital receiver design using VHDL generated from data flow graphs, and logic verification for the PowerPC.
14.1 Concurrent Design Methodology and Configuration Management of the SIEMENS EWSD - CCS7E Processor System Simulation
Thomas W. Albrecht
This paper outlines our successful concurrent design methodology and configuration management, adopted for a processor system simulation at both the VHDL and gate level. The system simulated comprised 4 Pentiums and 28 ASICs with a total gate count of 2.4 million gates. We showed that simulation of such a complex system can be done at the VHDL level, and demonstrated the efficiency of finding and correcting errors early in the design cycle.
14.2 Digital Receiver Design Using VHDL Generation from Data Flow Graphs*
Peter Zepter, Thorsten Grötker, Heinrich Meyr
This paper describes a design methodology, a library of reusable VHDL descriptions and a VHDL generation tool used in the application area of digital signal processing, particularly digital receivers for communication links. The tool and the library interact with commercial system simulation and logic synthesis tools. The support of joint optimization of algorithm and architecture as well as the concept for design reuse are explained. The algorithms for generating VHDL code according to different user specifications are described. An application example is used to show the benefits and current limitations of the proposed methodology.
14.3 Logic Verification Methodology for PowerPC™ Microprocessor
Charles H. Malley, Max Dieudonné
The PowerPC logic verification methodology is a general purpose approach suitable for a large class of chip designs that can exceed five million transistors in size. Several validation techniques are integrated into an automated logic verification strategy. The success of this methodology has been demonstrated by realizing three PowerPC microprocessor chips that were functional the first time.
In the past few years, funding for university research has proved increasingly elusive. Costs continue to escalate, particularly to cover graduate student tuition. Meanwhile, government funding agencies are under increasing pressure to direct their resources away from basic research and toward programs with greater short-term impact. Manufacturers have scaled back their in-house CAD groups, breaking historic ties with universities, and they have been forced to justify any funding of university research to managers trying to survive as profit centers. EDA companies have made their products available to universities for educational purposes, and computer manufacturers have provided discounts on hardware purchases, but a funding gap remains in covering personnel costs. Meanwhile, many companies have scaled back their research groups, relying on universities to serve as their R&D arms.
University researchers have responded to funding pressures by undertaking projects with shorter term objectives and with greater proprietary restrictions in order to obtain industry funding. They have also sought to gain from their intellectual property by such mechanisms as patents, software licensing, and affiliates programs that limit access to participating companies. These shifts have reduced the ability to pursue long term research topics and to freely disseminate research results.
This panel will address the relationship between industry and academia, and the implications the changing relationship will have on the future of EDA. Participants in the panel represent a cross section of university, industry, and funding agencies.
This session opens with a tutorial on synthesis techniques targeting low power dissipation followed by a paper on power-efficient technology-independent logic synthesis. The session concludes by presenting a new method of synthesizing low power combinational logic circuits from Shannon graphs.
16.1 Tutorial: A Survey of Optimization Techniques Targeting Low Power VLSI Circuits
Srinivas Devadas, Sharad Malik
We survey state-of-the-art optimization methods that target low power dissipation in VLSI circuits. Optimizations at the circuit, logic, architectural, and system levels are considered.
Keywords: low power, optimization, synthesis
16.2 Logic Extraction and Factorization for Low Power
Sasan Iman, Massoud Pedram
This paper describes algebraic procedures for node extraction and factorization that target low power consumption. New power cost functions are introduced for the sum-of-products and factored-form representations of functions. These cost functions are then used to guide the power optimization procedures. It is also shown that, using the proposed SOP power cost function, no extraction that reduces power increases the number of literals in the network. The procedures described in this paper were implemented, and results show a 16% average improvement in power at the cost of a 7% average increase in area.
16.3 Timed Shannon Circuits: A Power-Efficient Design Style and Synthesis Tool
Luciano Lavagno, Patrick C. McGeer, Alexander Saldanha, Alberto L. Sangiovanni-Vincentelli
A method of synthesizing low-power combinational logic circuits from Shannon graphs is proposed such that an n-input, m-output circuit realization using 2-input gates with unbounded fanout has O(nm) transitions per input vector. Under a bounded fanout model, the transition activity increases by at most a factor of n. Moreover, the power consumption is independent of circuit delays.
The first two papers describe methods for automatic module layout generation of RAM blocks and leaf cells. Practical considerations include simulation and transistor sizing. The third paper deals with the speed-memory trade-off for extracting parasitic resistance from layout. This is an important parameter for high-performance design in the deep submicron era.
17.1 The Aurora RAM Compiler
Ajay Chandna, C. David Kibler, Richard B. Brown, Mark Roberts, Karem A. Sakallah
This paper describes a RAM compiler for generating and characterizing highly manufacturable optimized SRAMs using GaAs E/D MESFET technology. The compiler uses a constraint-driven design flow to achieve process tolerant RAMs. This compiler was built using a flexible design framework that can be easily adapted to optimize and characterize memories in different MESFET processes.
17.2 Automatic Layout Synthesis of Leaf Cells
Sanjay Rekhi, J. Donald Trotter, Daniel H. Linder
This paper describes algorithms for automatic layout synthesis of leaf cells in a 1-D and in a new 1.5-D layout style, useful for non-dual circuit styles. The graph-theory-based algorithms use concepts set forth by Euler and Hamilton to achieve two tasks: the transistor placement algorithm uses Euler's theorem, while the placement of groups of transistors is achieved using Hamiltonian graphs. Results show that the algorithms produce highly competitive layouts when compared to other algorithms in the literature and to manual layouts.
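The Euler-path idea is the classic one from CMOS cell layout: an Euler path through the diffusion graph (vertices are diffusion nodes, edges are transistors) yields a transistor ordering with maximal diffusion sharing. A minimal sketch of Hierholzer's algorithm for finding such a path, for illustration only (the paper's placement algorithms are more elaborate):

    from collections import defaultdict

    def euler_path(edges):
        """Hierholzer's algorithm; edges are (u, v) diffusion-node pairs,
        one per transistor. Assumes an Euler path exists."""
        adj = defaultdict(list)
        degree = defaultdict(int)
        for i, (u, v) in enumerate(edges):
            adj[u].append((v, i))
            adj[v].append((u, i))
            degree[u] += 1
            degree[v] += 1
        # Start at an odd-degree node if there is one, else anywhere.
        start = next((n for n in degree if degree[n] % 2 == 1), edges[0][0])
        used = [False] * len(edges)
        stack, path = [start], []
        while stack:
            node = stack[-1]
            while adj[node] and used[adj[node][-1][1]]:
                adj[node].pop()              # discard already-traversed edges
            if adj[node]:
                nxt, i = adj[node].pop()
                used[i] = True
                stack.append(nxt)
            else:
                path.append(stack.pop())
        return path[::-1]                    # node sequence = transistor chain

For example, euler_path([(0, 1), (1, 2), (2, 0)]) returns a closed walk such as [0, 2, 1, 0], i.e., one unbroken diffusion strip.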
17.3 Delayed Frontal Solution for Finite-Element Based Resistance Extraction
N.P. van der Meijs, A.J. van Genderen
To save memory, layout-to-circuit extractors that use the Finite-Element Method for resistance extraction usually solve the corresponding set of equations with a frontal solution method. We show that this method is inefficient when used with a scanline ordering of the elements. As an improvement, we introduce the Delayed Frontal Solution algorithm, which allows us to replace the scanline ordering by the minimum-degree ordering. Thus, extraction times are reduced by more than one order of magnitude at a small cost in extra memory.
With rapidly evolving design techniques, there is often a considerable gap between theory and practice. The emphasis in this session is on emerging CAD techniques to close this gap. The first paper describes a functional test strategy for an advanced microprocessor family. The second paper describes a framework for creating efficiently synthesizable behavioral descriptions. The third paper describes how formal synthesis can be applied to ASIC design. The final paper shows the application of formal methods to hardware model verification.
18.1 Test Program Generation for Functional Verification of PowerPC Processors in IBM*
Aharon Aharon, Dave Goodman, Moshe Levinger, Yossi Lichtenstein, Yossi Malka,
Charlotte Metzger, Moshe Molcho, Gil Shurek
A new methodology and test program generator have been used for the functional verification of six IBM PowerPC processors. The generator contains a formal model of the PowerPC architecture and a heuristic database of testing expertise. It has been used on a daily basis for two years by about a hundred designers and testing engineers at four IBM sites. The new methodology significantly reduced the functional verification period and time to market of the PowerPC processors. Despite the complexity of the PowerPC architecture, the three processors verified so far had fully functional first silicon.
18.2 Behavioral Synthesis Methodology for HDL-Based Specification and Validation*
D. Knapp, T. Ly, D. MacMillen, R. Miller
This paper describes an HDL-synthesis-based design methodology that supports user adoption of behavioral-level synthesis into normal design practices. The use of these techniques increases understanding of the HDL descriptions before synthesis and makes the comparison of pre- and post-synthesis design behavior through simulation much more direct. This increases user confidence that the specification does what the user wants, i.e., that the synthesized design matches the specification in the ways that are important to the user. At the same time, the methodology gives the user a powerful set of tools to specify complex interface timing, while preserving the user's ability to delegate decision-making authority to software in those cases where the user does not wish to restrict the options available to the synthesis algorithms.
18.3 Design-Flow and Synthesis for ASICs: A Case Study
Massimo Bombana, Patrizia Cavalloro, Salvatore Conigliaro, Roger B. Hughes, Gerry Musgrave, Giuseppe Zaza
The growing complexity of devices to be designed and manufactured, and the need to reduce time to market, stress the importance of sound design methodologies. In this framework, formal synthesis has the advantage of increasing the quality both of the design process and of the realized devices. The problem of relating the different abstraction levels involved in the extended design process is solved through the use of logic synthesis tools. The evaluation of design constraints that characterize optimal implementations, such as area and timing, provides the most pragmatic approach to identifying efficient guidelines applicable in the abstract phases of the design flow. The resulting design methodology, combining both formal and more traditional design tools, has been tested on a complex device in the area of telecommunications.
18.4 Model Checking in Industrial Hardware Design
Jörg Bormann, Jörg Lohse, Michael Payer, Gerd Venzl
This paper describes how model checking has been integrated into an industrial hardware design process. We present an application-oriented specification language for assumption/commitment style properties and an abstraction algorithm that generates an intuitive and efficient representation of synchronous circuits. These approaches are embedded in our Circuit Verification Environment, CVE. They are demonstrated on two industrial applications.
This session presents advances in retiming algorithms for sequential circuits and in state assignment for finite state machines. The papers on retiming address issues related to the generality of the circuit model and the applicability of retiming to circuits without a reset state. More realistic delay models are considered than those used in previous algorithms. The paper on state assignment presents an interesting and effective transformation of the problem into a placement problem.
19.1 DELAY: An Efficient Tool for Retiming with Realistic Delay Modeling*
Kumar N. Lalgudi, Marios C. Papaefthymiou
The retiming transformation can be used to optimize synchronous circuits for maximum speed of operation by relocating their storage elements. In this paper, we describe DELAY, a tool for retiming edge-triggered circuits under a realistic delay model that handles load-dependent gate delays, variable register setup times, interconnect delays, and clock skew. The operation of DELAY relies on a novel linear programming formulation of the retiming problem in this model. For the special case where clock skew is monotonic and all registers have equal propagation delays, the retiming algorithm in our tool runs in polynomial time and can transform any given edge-triggered circuit to achieve a specified clock period in O(V^3 F) steps, where V is the number of logic gates in the circuit and F is bounded by the number of registers in the circuit.
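For orientation, the linear-programming view builds on the classic Leiserson-Saxe formulation, in which (with zero skew and idealized registers) a retiming $r: V \to \mathbb{Z}$ achieves clock period $c$ if and only if

$$r(u) - r(v) \le w(e) \ \ \text{for every edge } e: u \to v, \qquad r(u) - r(v) \le W(u,v) - 1 \ \ \text{whenever } D(u,v) > c,$$

where $w(e)$ is the register count on edge $e$, $W(u,v)$ the minimum register count over all $u \leadsto v$ paths, and $D(u,v)$ the maximum delay over those minimum-register paths. The tool described here extends such a constraint system with load-dependent delays, setup times, interconnect delay, and skew.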
19.2 A Fresh Look at Retiming Via Clock Skew Optimization
Rahul B. Deokar, Sachin S. Sapatnekar
The introduction of clock skew at an edge-triggered flip-flop has an effect similar to moving the flip-flop across combinational logic module boundaries; the two are, respectively, continuous and discrete optimizations with the same effect. While this fact has been recognized before, this work, for the first time, uses it to find a minimum/specified-period retiming efficiently. The clock period is guaranteed to be at most one gate delay larger than a tight lower bound on the optimal clock period; this bound is achievable using a combination of intentional skew and retiming. All ISCAS89 circuits can be retimed in a few minutes by this algorithm.
19.3 The Validity of Retiming Sequential Circuits
Vigyan Singhal, Carl Pixley, Richard L. Rudell, Robert K. Brayton
Retiming has been proposed as an optimization step for sequential circuits represented at the net-list level. Retiming moves the latches across the logic gates and in doing so changes the number of latches and the longest path delay between the latches. In this paper we show by example that retiming a design may lead to differing simulation results when the retimed design replaces the original design. We also show, by example, that retiming may not preserve the testability of a sequential test sequence for a given stuck-at fault as measured by a simulator. We identify the cause of the problem as forward retiming moves across multiple-fanout points in the circuit. The primary contribution of this paper is to show that, while an accurate logic simulation may distinguish the retimed circuit from the original circuit, a conservative three-valued simulator cannot do so. Hence, retiming is a safe operation when used in a design methodology based on conservative three-valued simulation starting each latch with the unknown value.
19.4 Retiming Synchronous Circuitry with Imprecise Delays
I. Karkowski, R.H.J.M. Otten
Often, and certainly in the early stages of a design, knowledge about delays is imprecise. Stochastic programming is not an adequate means to account for this imprecision: not only is a probability distribution seldom a correct translation of the designer's delay knowledge, it also leads to inefficient algorithms. In this paper, possibilistic programming is proposed for handling the retiming problem where delays are modelled as (triangular) possibilistic numbers. Besides the capability of optimizing the most possible clock cycle time and generating its possibility distribution, it allows for trade-offs between reducing the clock cycle time and the risk of obtaining worse solutions. It is shown that the computational complexity is the same as for retiming with exact circuit delays.
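For reference, a triangular possibilistic number $(a, m, b)$ encodes imprecise delay knowledge by the possibility distribution

$$\mu(x) = \begin{cases} (x-a)/(m-a), & a \le x \le m,\\ (b-x)/(b-m), & m \le x \le b,\\ 0, & \text{otherwise,} \end{cases}$$

so $m$ is the most possible delay value and $[a, b]$ delimits the values considered possible at all.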
19.5 A Fast State Assignment Procedure for Large FSMs
Shihming Liu, Massoud Pedram, Alvin M. Despain
This paper addresses the problem of state assignment for large Finite State Machines (FSMs). This is an important problem in high-performance digital system design, where added functionality often comes at the expense of a larger (and slower) FSM to control the system. We present a new method to solve the graph embedding problem, which is the main step in the state assignment process. The basic idea is to place the state adjacency graph in a two-dimensional grid while minimizing the total wire length. The grid is then mapped into an n-dimensional hypercube while nearly preserving the adjacency relations, that is, with dilation at most 2. Experimental results are presented and compared with those of NOVA.
Papers in this session deal with accurate switch level fault modeling and simulation and other techniques to improve performance and accuracy of fault simulation of synchronous sequential circuits.
20.1 Software Accelerated Functional Fault Simulation for Data-Path Architectures
M. Kassab, N. Mukherjee, J. Rajski, J. Tyszer
This paper demonstrates how fault simulation of building blocks found in data-path architectures can be performed extremely efficiently and accurately by taking advantage of their simple functional models and structural regularity. This technique can be used to accelerate the simulation of those blocks in virtually any fault simulation environment, resulting in fault simulation algorithms that can perform fault grading in a very demanding BIST environment.
20.2 Symbolic Fault Simulation for Sequential Circuits and the Multiple Observation Time Test Strategy
R. Krieger, B. Becker, M. Keim
Fault simulation for synchronous sequential circuits is a very time-consuming task. The complexity of the task increases if there is no information about the initial state of the circuit. In this case an unknown initial state is assumed, which is usually handled by introducing a three-valued logic. As is well known, fault simulation based on this logic determines only a lower bound on the fault coverage. Recently it has been shown that fault simulation based on the multiple observation time test strategy can improve the accuracy of the fault coverage. In this paper we describe how this strategy can be successfully implemented based on Ordered Binary Decision Diagrams. Our experiments demonstrate the efficiency of the fault simulation procedure developed.
20.3 Accurate and Efficient Fault Simulation of Realistic CMOS Network Breaks
Haluk Konuk, F. Joel Ferguson, Tracy Larrabee
We present a new fault simulation algorithm for realistic break faults in the p-networks and n-networks of static CMOS cells. We show that Miller effects can invalidate a test just as charge sharing can, and we present a new charge-based approach that efficiently and accurately predicts the worst case effects of Miller capacitances and charge sharing together. Results on running our fault simulator on ISCAS85 benchmark circuits are provided.
20.4 Analysis of Switch-Level Faults by Symbolic Simulation
Lluís Ribas-Xirgo, Jordi Carrabina-Bordoll
This paper presents a symbolic method to detect short and open circuit faults in switch-level networks. Detection and fault sensitization vector determination are possible since the behavior of each node is described by a set of two functions: the on-set and the off-set functions. Their analyses provide designers with an efficient tool for circuit verification and test pattern generation.
The first three papers in this session address delays in interconnect/transmission lines. The first paper proposes an approach to constraint generation for interconnect synthesis, the next proposes an upper bound for the delay, and the third derives approximations to delays in RC lines terminated with a capacitive load. The next paper proposes a reduced-order modeling approach for coupling inductance computations. The final paper presents an approach for performance-driven routing in high-speed MCMs and PCBs.
21.1 Transmission Line Synthesis
Byron Krauter, Rohini Gupta, John Willis, Lawrence T. Pileggi
RLC transmission line synthesis is a difficult problem because minimal net delay cannot be achieved or accurately predicted unless: 1) a termination scheme is chosen and properly implemented, and 2) output driver transition rates are constrained at or below the net's capability. This paper describes a method to concurrently find, for any RLC net, an optimal termination value, a maximum source transition rate, and an approximate net delay using the first few response moments. When optimal termination is accomplished and transition rates do not exceed the capabilities of the net, the resulting delay metrics are an interesting extension of the popular Elmore delay metric for RC interconnects. The task of physical RLC interconnect design is facilitated by the ease with which these first few moments are calculated for generalized RLC lines.
21.2 The Elmore Delay as a Bound for RC Trees with Generalized Input Signals
Rohini Gupta, Byron Krauter, Bogdan Tutuianu, John Willis, Lawrence T. Pileggi
The Elmore delay is an extremely popular delay metric, particularly for RC tree analysis. The widespread usage of this metric is mainly attributable to it being the most accurate delay measure that is a simple analytical function of the circuit parameters. The only drawbacks to this delay metric are the uncertainty as to whether it is an optimistic or a pessimistic estimate, and the restriction to step response delay estimation. In this paper, we prove that the Elmore delay is an absolute upper bound on the 50% delay of an RC tree response. Moreover, we prove that this bound holds for input signals other than steps, and that the actual delay asymptotically approaches the Elmore delay as the input signal rise time increases. A lower bound on the delay is also developed using the Elmore delay and the second moment of the impulse response. The utility of this bound is for understanding the accuracy and the limitations of the Elmore delay metric as we use it for design automation.
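For readers who want the metric itself, here is an illustrative computation of the Elmore delay on a small RC tree, using the standard formula rather than anything specific to this paper; the tree, element values, and names are made up:

```python
# Elmore delay in an RC tree (standard formula; the paper's contribution
# is proving it upper-bounds the 50% delay):
# T_D(i) = sum over tree edges e on the root-to-i path of
#          R_e * (total capacitance in the subtree hanging below e).

# Hypothetical 4-node tree: root 0 -> 1 -> 2, plus a side branch 1 -> 3.
parent = {1: 0, 2: 1, 3: 1}            # node -> parent
R = {1: 100.0, 2: 50.0, 3: 80.0}       # node -> resistance to parent (ohms)
C = {0: 0.0, 1: 1e-13, 2: 2e-13, 3: 1e-13}   # node -> grounded cap (F)

def subtree_cap(n):
    return C[n] + sum(subtree_cap(k) for k, p in parent.items() if p == n)

def elmore(n):
    if n == 0:
        return 0.0
    return elmore(parent[n]) + R[n] * subtree_cap(n)

print(elmore(2))   # delay from the root to node 2: 5e-11 seconds here
```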
21.3 Delay Analysis of the Distributed RC Line
Vasant B. Rao
This paper reviews the step response of the semi-infinite distributed RC line and focuses mainly on the step response of a finite-length RC line with a capacitive load termination, which is the most common model for a wire inside present-day integrated CMOS chips. In particular, we obtain the values of some of the common threshold-crossing times at the output of such a line and show that even the simplest first-order lumped Π-approximation to the finite-length RC line terminated with a capacitive load is good enough for obtaining the 50% and 63.2% threshold-crossing times of the step response. Higher-order lumped approximations are necessary for more accurate predictions of the 10% and 90% threshold-crossing times.
21.4 Efficient Reduced-Order Modeling of Frequency-Dependent Coupling Inductances Associated with 3-D Interconnect Structures
L. Miguel Silveira, Matton Kamon, Jacob White
Since the first papers on asymptotic waveform evaluation (AWE), reduced-order models have become standard for improving interconnect simulation efficiency, and very recent work has demonstrated that biorthogonalization algorithms can be used to robustly generate AWE-style macromodels. In this paper we describe using block Arnoldi-based orthogonalization methods to generate reduced-order models from FastHenry, a multipole-accelerated three-dimensional inductance extraction program. Examples are analyzed to demonstrate the efficiency and accuracy of the block Arnoldi algorithm.
21.5 Performance Driven Global Routing and Wiring Rule Generation for High Speed PCBs and MCMs
Sharad Mehrotra, Paul Franzon, Michael Steer
A new approach for performance-driven routing in highly congested high speed MCMs and PCBs is presented. Global routing is employed to manage delay, signal integrity and congestion simultaneously. Interconnect performance prediction models are generated through simulations. The global routing results and performance prediction models are used to generate bounds on the net lengths which can be used by a detailed router to satisfy constraints on interconnect performance.
The tutorial will present the motivations and challenges of ASIC prototyping. Professor Gabrielle Saucier from Innovative Synthesis Technologies will make a short presentation on the reasons for the strong interest in ASIC prototyping. She will introduce the general scenario of ASIC prototyping and will emphasize the software environment required to successfully implement complex ASIC prototyping.
Mr Dasha Estes from Quickturn will briefly introduce the concept of prototyping engines, discussing the respective merits of commercial prototyping engines versus proprietary dedicated prototyping boards.
Mr Vincent Olive is from CNET, the research center of France-Telecom. The aim of this center is to study equipment and software tools to support new services and networks. CNET will discuss different styles of rapid prototyping in the development of innovative systems for telecommunications.
Haz Nabulsi is from Sierra Semiconductor, a manufacturer of high-performance, mixed-signal integrated circuit systems for multimedia computing and personal communications systems. Sierra will discuss the application of rapid prototyping to several generations of products using Quickturn's emulation systems.
Jack Donovan from Tandem Computers will speak on emulating ASICs within Tandem fault tolerant computers.
This session presents three new directions for data path modeling and synthesis: symbolic modeling for design space exploration, integrating built-in self-test, and estimating the impact of layout tools.
23.1 Symbolic Modeling and Evaluation of Data Paths*
Chuck Monahan, Forrest Brewer
We present an automata model which concisely captures the constraints imposed by a data-path, such as bus hazards, register constraints, and control encoding limitations. A set of uniform base components for depicting general data-paths and techniques for systematic translation of such depictions into Boolean functions are described. Finally, this model is expanded to represent the limitations of generating as well as moving operands by incorporating data-flow graphs. The benefits of this representation are demonstrated by modeling a commercial DSP microprocessor.
23.2 Data Path Allocation for Synthesizing RTL Designs with Low BIST Area Overhead
Ishwar Parulkar, Sandeep Gupta, Melvin A. Breuer
Built-in self-test (BIST) techniques have evolved as cost-effective techniques for testing digital circuits. These techniques add test circuitry to the chip such that the chip has the capability to test itself. A prime concern in using BIST is the area overhead due to the modification of normal registers to be test registers. This paper presents data path allocation algorithms that 1) maximize the sharing of test registers resulting in a fewer number of registers being modified for BIST, and 2) minimize the number of CBILBO registers.
23.3 Deriving Efficient Area and Delay Estimates by Modeling Layout Tools
Donald S. Gelosh, Dorothy E. Setliff
This paper presents a novel approach to deriving area and delay estimates for high level synthesis using machine learning techniques to model layout tools. This approach captures the relationships between general design features (e.g., topology, connectivity, common input, and common output) and layout concepts (e.g., relative placement). Experimentation illustrates the effectiveness of this approach for a variety of real-world designs.
Practical application of formal verification requires the ability to provide feedback to the user and the ability to overcome capacity limitations of straightforward BDD-based methods. This session addresses these topics, presenting applications of learning techniques to ATPG-based verification, and discussing the generation of good counterexamples when verification uncovers a bug in a design.
24.1 Efficient OBDD-Based Boolean Manipulation in CAD beyond Current Limits
Jochen Bern, Christoph Meinel, Anna Slobodová
We present the concept of TBDDs, which considerably enlarges the class of Boolean functions that can be efficiently manipulated in terms of OBDDs. It extends the idea of using domain transformations, which is well known in many areas of mathematics, physics, and the technical sciences, to the context of OBDD-based Boolean function manipulation in CAD: instead of working with the OBDD representation of a function f, TBDDs allow working with an OBDD representation of a suitably cube-transformed version of f. Besides giving some theoretical insights into the new concept, we investigate in some detail cube transformations which are based on complete types.
24.2 Novel Verification Framework Combining Structural and OBDD Methods in a Synthesis Environment
Subodh M. Reddy, Wolfgang Kunz, Dhiraj K. Pradhan
This paper presents a new methodology for formal logic verification for combinational circuits. Specifically, a structural approach is used, based on indirect implications derived by using Recursive Learning. This is extended to formulate a hybrid approach where this structural method is used to reduce the complexity of a subsequent functional method based on OBDDs. It is demonstrated how OBDD-based verification can take great advantage of structural preprocessing in a synthesis environment. The experimental results show the effective compromise achieved between memory efficient structural methods and functional methods. One more advantage of these methods lies in the fact that resources that go into logic synthesis can effectively be reused for verification purposes.
24.3 Advanced Verification Techniques Based on Learning
Jawahar Jain, Rajarshi Mukherjee, Masahiro Fujita
Design verification poses a very practical problem during circuit synthesis. Learning based verification techniques prove to be an attractive option for verifying two circuits with internal gates having simple functional relationships. We present a verification method which employs a learning technique based on symbolic manipulation and which can more efficiently learn indirect implications. The method can also learn some useful functional implications. We also present a framework in which an indirect implication technique is integrated with an OBDD based verification tool. We present highly efficient verification results on some ISCAS circuits as well as on some very hard industrial circuits.
24.4 Efficient Generation of Counterexamples and Witnesses in Symbolic Model Checking
E. M. Clarke, O. Grumberg, K. L. McMillan, X. Zhao
Model checking is an automatic technique for verifying sequential circuit designs and protocols. An efficient search procedure is used to determine whether or not the specification is satisfied. If it is not satisfied, our technique will produce a counterexample execution trace that shows the cause of the problem. We describe an efficient algorithm to produce counterexamples and witnesses for symbolic model checking algorithms. This algorithm is used in the SMV model checker and works quite well in practice. We also discuss how to extend our technique to more complicated specifications.
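As a rough illustration of witness extraction (the paper operates symbolically on BDD-represented state sets, whereas this sketch is explicit-state), a shortest counterexample trace for an invariant AG p can be recovered with breadth-first search and parent pointers; the transition system here is hypothetical:

```python
from collections import deque

# Hypothetical transition system: state -> successor states.
succ = {0: [1, 2], 1: [3], 2: [0], 3: [3]}
init = [0]
bad = {3}          # states violating the invariant p

def counterexample():
    """Shortest trace from an initial state to a bad state (refutes AG p)."""
    parent = {s: None for s in init}
    q = deque(init)
    while q:
        s = q.popleft()
        if s in bad:
            trace = []
            while s is not None:
                trace.append(s)
                s = parent[s]
            return trace[::-1]
        for t in succ.get(s, []):
            if t not in parent:
                parent[t] = s
                q.append(t)
    return None    # AG p holds

print(counterexample())   # -> [0, 1, 3]
```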
This session addresses a variety of CAD problems in analog circuit design. The first paper describes Darwin, an opamp synthesis system based on genetic optimization algorithms. The second paper describes efficient techniques for simulating the circuit impact of substrate noise. The third paper describes algorithms for automatic layout of mismatch sensitive circuits. Finally, the last paper describes a design-for-test approach for analog systems.
25.1 DARWIN: CMOS Opamp Synthesis by Means of a Genetic Algorithm*
Wim Kruiskamp, Domine Leenaerts
DARWIN is a tool that synthesizes CMOS opamps on the basis of a genetic algorithm. A randomly generated initial set of opamps evolves into a set in which the topologies as well as the transistor sizes of the opamps are adapted to the required performance specifications. Several design examples illustrate the behavior of DARWIN.
25.2 Mixed-Signal Switching Noise Analysis Using Voronoi-Tessellated Substrate Macromodels
Ivan L. Wemple, Andrew T. Yang
We present a new modeling technique for analyzing the impact of substrate-coupled switching noise in CMOS mixed-signal circuits. Lumped element RC substrate macromodels are efficiently generated from layout using Voronoi tessellation. The models retain the accuracy of previously proposed models, but contain orders of magnitude fewer circuit nodes, and are suitable for analyzing large-scale circuits. The modeling strategy has been verified using detailed device simulation, and applied to some mixed-A/D circuit examples.
25.3 Direct Performance-Driven Placement of Mismatch-Sensitive Analog Circuits
K. Lampaert, G. Gielen, W. Sansen
This paper presents a direct performance-driven placement algorithm for analog integrated circuits. The performance specifications directly drive the layout tools without intermediate parasitic constraints. A simulated-annealing algorithm is used to drive an initial solution to a placement that respects the circuit's performance specifications. During each iteration, the layout-induced performance degradation is calculated from the geometrical properties of the intermediate solution. The placement tool handles symmetry constraints, circuit loading effects and device mismatches. The feasibility of the approach is demonstrated with practical circuit examples.
25.4 System-Level Design for Test of Fully Differential Analog Circuits
Bapiraju Vinnakota, Ramesh Harjani, Nicholas J. Stessman
Analog IC test occupies a significant fraction of the design cycle. Testing costs are increased by the twin requirements of high precision and accuracy in signal measurement. We discuss a system level ACOB technique for fully differential analog ICs. Our test techniques incorporate analog specific constraints such as device matching, and circuit and switching noise. They have a minimal impact on performance, area and power. The techniques can be used for both discrete and continuous time circuits, over a wide frequency range. The system level DFT scheme is also used to design a self-testing switched capacitor filter. Our checking scheme provides significant fault coverage and is demonstrably superior to other DFT techniques for differential circuits.
The panelists, representing software, computer system, and EDA tool companies plus electronics designers, discuss the state of operating systems and its impact on electronics designers and the EDA industry.
A panel member from Microsoft will examine the different versions of Windows and what each provides. Also to be discussed is the state of the DOS operating system; the impact that Windows NT and Cairo will have on UNIX; what a true Windows application offers the designer, and finally, how the Windows, DOS and UNIX environments can mesh.
One panel member examines an approach that is already being implemented by several high-end vendors: supplying customers with design tools and more sophisticated services. This tools/services mix helps to compensate for software that cannot always measure up to the bleeding edge demands of silicon design. These vendors face the difficult task of structuring new business models to serve these customers in design environments with a multiplicity of platforms and operating systems, plus a lack of interoperability within EDA tools.
Another panelist's perspective is that most of today's and tomorrow's design groups exist in a mixed O.S. environment. As such, there won't be one O.S. that wins the war; rather, several will prevail. Suppliers who best satisfy user needs will share the design workplace.
Another view states that over time, users will overwhelmingly select Windows desktop design because of its strong value proposition. This includes designers now working in a mixed environment. It may take them longer to switch to Windows because of their high investment in UNIX-based tools. But they will eventually retire legacy systems for the price/performance of desktop design, and because Microsoft's Win32 operating system provides all the functionality needed for the vast majority of users.
Hardware-software co-design requires analysis, scheduling, and synthesis of embedded software. These three papers address execution time estimation, scheduling under I/O timing constraints, and synthesis of interface processes.
27.1 Performance Analysis of Embedded Software Using Implicit Path Enumeration
Yau-Tsun Steven Li, Sharad Malik
Embedded computer systems are characterized by the presence of a processor running application-specific software. A large number of these systems must satisfy real-time constraints. This paper examines the problem of determining the bound on the running time of a given program on a given processor. An important aspect of this problem is determining the extreme case program paths. The state-of-the-art solution here relies on an explicit enumeration of program paths. This runs out of steam rather quickly since the number of feasible program paths is typically exponential in the size of the program. We present a solution for this problem, which considers all paths implicitly by using integer linear programming. This solution is implemented in the program cinderella, which currently targets a popular embedded processor, the Intel i960. The preliminary results of using this tool are presented here.
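To give a feel for the implicit path enumeration idea, the sketch below states the ILP in miniature for a hypothetical two-way loop body and solves it by brute-force enumeration instead of calling an ILP solver; all block costs and the loop bound are invented:

```python
# The ILP view in miniature: maximize total cost sum(c_i * x_i) over
# basic-block execution counts x_i, subject to linear flow constraints,
# without ever listing program paths explicitly.

from itertools import product

c = {'entry': 5, 'then': 3, 'else': 8, 'exit': 2}   # cycles per block

best = None
# x_then + x_else is the loop trip count, bounded by a user annotation.
for x_then, x_else in product(range(11), repeat=2):
    if x_then + x_else <= 10:                       # loop bound constraint
        x = {'entry': 1, 'then': x_then, 'else': x_else, 'exit': 1}
        cost = sum(c[b] * n for b, n in x.items())
        if best is None or cost > best[0]:
            best = (cost, x)

print(best)   # -> (87, ...): the WCET bound, taking the 'else' body 10 times
```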
27.2 Interval Scheduling: Fine-Grained Code Scheduling for Embedded Systems
Pai Chou, Gaetano Borriello
A central problem in embedded system co-synthesis is the generation of software for low-level I/O. Scheduling still remains a manual task because existing coarse-grained real-time scheduling algorithms are not applicable: they assume fixed delays even though the run times are often variable, and they incur too much overhead. To solve this problem, we present a new static ordering technique, called interval scheduling, for meeting general timing constraints on fine-grained, variable-delay operations without using a run-time executive.
27.3 Interfacing Incompatible Protocols Using Interface Process Generation
Sanjiv Narayan, Daniel D. Gajski
During system design, one or more portions of the system may be implemented with standard components that have a fixed pin structure and communication protocol. This paper describes a new technique, interface process generation, for interfacing standard components that have incompatible protocols. Given an HDL description of the two protocols, we present a method to generate an interface process that allows the two protocols to communicate with each other.
This session presents three approaches to increasing the efficiency of circuit analysis. The first paper presents a method for efficient computation of reduced-order models for large linear circuits. The next paper presents a new matrix-free approach which is faster than iterative methods for large circuit analysis. The final paper proposes a mixed surface-volume approach for accurate and efficient simulation of 3-D interconnect.
28.1 Reduced-Order Modeling of Large Linear Subcircuits via a Block Lanczos Algorithm*
Peter Feldmann, R. W. Freund
A method for the efficient computation of accurate reduced-order models of large linear circuits is described. The method, called MPVL, employs a novel block Lanczos algorithm to compute matrix Padé approximations of matrix-valued network transfer functions. The reduced-order models, computed to the required level of accuracy, are used to speed up the analysis of circuits containing large linear blocks. The linear blocks are replaced by their reduced-order models, and the resulting smaller circuit can be analyzed with general-purpose simulators, with significant savings in simulation time and practically no loss of accuracy.
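As a rough sketch of the Krylov machinery involved (MPVL itself uses a block Lanczos process on nonsymmetric circuit matrices, which this simplified scalar, symmetric version does not capture; the matrix here is a stand-in):

```python
# Scalar symmetric Lanczos: build an orthonormal Krylov basis V and a
# small tridiagonal T = V^T A V whose spectrum/moments approximate A's.

import numpy as np

def lanczos(A, b, m):
    n = len(b)
    V = np.zeros((n, m))
    T = np.zeros((m, m))
    beta, v_prev = 0.0, np.zeros(n)
    v = b / np.linalg.norm(b)
    for j in range(m):
        V[:, j] = v
        w = A @ v - beta * v_prev
        alpha = v @ w
        w -= alpha * v
        beta = np.linalg.norm(w)
        T[j, j] = alpha
        if j + 1 < m:
            T[j, j + 1] = T[j + 1, j] = beta
            v_prev, v = v, w / beta
    return V, T

A = np.diag([1.0, 2.0, 3.0, 4.0])      # stand-in for a circuit matrix
V, T = lanczos(A, np.ones(4), 3)
print(np.linalg.eigvalsh(T))           # Ritz values approximate eig(A)
```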
28.2 Efficient Steady-State Analysis Based on Matrix-Free Krylov-Subspace Methods
Ricardo Telichevesky, Kenneth S. Kundert, Jacob K. White
Gaussian-elimination based shooting-Newton methods, a commonly used approach for computing steady-state solutions, grow in computational complexity like N^3, where N is the number of circuit equations. Just using iterative methods to solve the shooting-Newton equations results in an algorithm which is still order N^2 because of the cost of calculating the dense sensitivity matrix. Below, a matrix-free Krylov-subspace approach is presented, and the method is shown to reduce shooting-Newton computational complexity to that of ordinary transient analysis. Results from several examples are given to demonstrate that the matrix-free approach is more than ten times faster than using iterative methods alone for circuits with as few as 400 equations.
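A minimal sketch of the matrix-free flavor, assuming SciPy is available: the Krylov solver sees only a matrix-vector product routine, so a dense matrix is never formed. The operator below is a simple stand-in, not the shooting-Newton Jacobian:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 400

def matvec(v):
    # Stand-in for "apply the system matrix": a tridiagonal operator
    # computed on the fly, never stored as an explicit matrix.
    w = 2.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

A = LinearOperator((n, n), matvec=matvec)
b = np.ones(n)
x, info = cg(A, b)                           # Krylov solve, matrix-free
print(info, np.linalg.norm(matvec(x) - b))   # info == 0 means converged
```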
28.3 Transient Simulations of Three-Dimensional Integrated Circuit Interconnect Using a Mixed Surface-Volume Approach
Mike Chou, Tom Korsmeyer, Jacob White
It has recently been shown that the boundary element method can be used to perform accurate cross-talk simulations of three-dimensional integrated circuit interconnect. However, the computational complexity grows as N^2, where N is the number of surface unknowns. Straightforward application of the fast-multipole algorithm reduces the computational complexity to order N, but produces magnified errors due to the ill-conditioning of the steady-state problem. We present a mixed surface-volume approach and prove that the formulation results in the exact steady-state solution, independent of the multipole approximations. Numerical experiments are presented to demonstrate the accuracy and efficiency of this technique. On a realistic example, the new method runs fifteen times faster than using dense-matrix iterative methods.
This session brings together several key issues and recent techniques for optimization of clock and power distribution. The first paper proposes buffer optimization and location in the context of power minimization. The second paper also addresses low-power requirements by inserting buffers along interconnects. The third paper studies power distribution in light of functional information: this yields a formulation equivalent to bounded-skew clock distribution, which is heuristically solved in the concluding paper.
29.1 Buffer Insertion and Sizing under Process Variations for Low Power Clock Distribution*
Joe G. Xi, Wayne W.-M. Dai
Power dissipated in clock distribution is a major source of total system power dissipation. Instead of increasing wire widths or lengths to reduce skew, which results in increased power dissipation, we use a balanced buffer insertion scheme to partition a large clock tree into a number of small subtrees. Because asymmetric loads and wire width variations in small subtrees induce very small skew, minimal wire widths are used. This results in minimal wiring capacitance and dynamic power dissipation. Then the buffer sizing problem is formulated as a constrained optimization problem: minimize power subject to tolerable skew constraints. To minimize skew caused by device parameter variations from die to die, PMOS and NMOS devices in buffers are separately sized. Substantial power reduction is achieved while skews are kept at acceptable values under all process conditions.
29.2 Power Optimal Buffered Clock Tree Design
Ashok Vittal, Malgorzata Marek-Sadowska
We propose a new problem formulation for low power clock network design that takes rise time constraints imposed by the design into account. We evaluate the utility of inserting buffers into the clock route for satisfying rise time constraints and for minimizing the area of the clock net. In particular, we show that the classical H-tree is sub-optimal in terms of both area and power dissipation when buffers may be inserted into the tree. We show that the power minimization problem is NP-hard and propose a greedy heuristic for power-optimal clock network design that utilizes the opportunities provided by buffer insertion. Our algorithm inserts buffers and designs the topology simultaneously. The results we obtain on benchmarks are significantly better than previous approaches in terms of power dissipation, wire length, rise times and buffer area. Power dissipation is typically reduced by a factor of two, rise times are four times better and buffer area requirements are an order of magnitude smaller.
29.3 Power Distribution Topology Design
Ashok Vittal, Malgorzata Marek-Sadowska
We propose topology design of power distribution nets using a novel method for capturing the temporal characteristics of sink currents - the current compatibility graph. This graph carries information necessary for net area optimization. We propose a new algorithm for simultaneous topology design and wire sizing that can handle large designs. Our techniques result in significant area improvements on benchmark instances.
29.4 On the Bounded-Skew Clock and Steiner Routing Problems
Dennis J.-H. Huang, Andrew B. Kahng, Chung-Wen Albert Tsao
We study the minimum-cost bounded-skew routing tree (BST) problem under the linear delay model. This problem captures several engineering tradeoffs in the design of routing topologies with controlled skew. We propose three tradeoff heuristics. (1) For a fixed topology, Extended-DME (Ex-DME) extends the DME algorithm for exact zero-skew trees via the concept of a merging region. (2) For arbitrary topology and arbitrary embedding, Extended Greedy-DME (ExG-DME) very closely matches the best known heuristics for the zero-skew case, and for the infinite-skew case (i.e., the Steiner minimal tree problem). (3) For arbitrary topology and single-layer (planar) embedding, the Extended Planar-DME (ExP-DME) algorithm exactly matches the best known heuristic for zero-skew planar routing, and closely approaches the best known performance for the infinite-skew case. Our work provides unifications of the clock routing and Steiner tree heuristic literatures and gives smooth cost-skew tradeoffs that enable good engineering solutions.
With ever-shortening design cycles, concurrency in different aspects of product development is of critical importance in maintaining competitiveness. The first paper describes the evolution and analysis of concurrent design methods applied to wearable computers. The next two papers describe the use of concurrency in the design of Asynchronous Transfer Mode (ATM) switches.
30.1 Benchmarking an Interdisciplinary Concurrent Design Methodology for Electronic/Mechanical Systems*
Asim Smailagic, Daniel P. Siewiorek, Drew Anderson, Chris Kasabach, Tom Martin,
John Stivoric
The paper describes the evolution of an Interdisciplinary Concurrent Design Methodology (ICDM) and the metrics used to compare four generations of wearable computer artifacts produced by the methodology at each stage of ICDM's growth. The product cycle, its phases, and the design information representation for each phase are defined. Six generic axes of design activity are defined, and the concept of benchmarking a complete design methodology using these axes is introduced. In addition, an approach for measuring design complexity is proposed. When applied to the four generations of the CMU wearable computers, the ICDM has demonstrated a two-orders-of-magnitude increase in design efficiency.
30.2 A Methodology for HW-SW Codesign in ATM
Giovanni Mancini, Dave Yurach, Spiros Boucouris
This paper presents a methodology and strategy used for hardware-software codesign and coverification of a large ATM switch whose functionality is largely embodied in new ASICs. A strategy based on software emulation is presented which supports the concurrent development and verification of the system software with the hardware being designed, prior to lab samples being available. The goal is reduced system integration time and fewer design iterations due to system errors.
30.3 Accelerating Concurrent Hardware Design with Behavioural Modelling and System Simulation
Allan Silburt, Ian Perryman, Janick Bergeron, Stacy Nichols, Mario Dufresne,
Greg Ward
This paper describes a functional hardware verification methodology for ASIC intensive products. It spans the ASIC, board, and system level, enabling simulation of the design concurrent with ASIC and board development. The simulation strategy relies on rapid development of behavioural models of ASICs to enable work to proceed in parallel and to achieve the necessary simulation efficiency. The results from a project on which the methodology was used are presented. The process provided early visibility of over 200 issues in the system of which 32 were critical to the successful conformance and timely completion of the project.
Electronic Systems Design Automation (ESDA) is a current cause célèbre of the EDA industry. This panel first attempts to identify: what is the principal problem in ESDA? In fact, there are a number of emerging problems which are naturally considered to be within the domain of ESDA. Principally these include hardware-software co-design, representation of systems at higher levels of abstraction, and links through synthesis to implementation paths in hardware and in software. Hardware-software co-design itself has a number of important sub-problems, such as the co-synthesis and co-verification of systems of hardware and software. Representing systems at higher levels of abstraction requires the ability to naturally express dataflow representations of sub-systems, control-flow representations of sub-systems, and modeling of software executing on instruction-set processors.
Probably of even greater concern to the audience is the question: which commercial vendor is best poised to address these problems? Among current commercial EDA vendors, iLogix has established a strong presence in modeling the control-flow portion of systems, but not the dataflow. Conversely, the SPW tool of the Alta Group, the COSSAP tool of Synopsys, and the DSP-oriented toolset of Mentor's European Design Center have each shown successes in modeling the dataflow portions of systems but not the control flow. Finally, the Race and Reveal tools of Redwood, now a part of the Alta Group, seem best suited to modeling instruction-set processors. In short, each of the major commercial vendors has provided solutions to some portions of the ESDA-related problems, but none has yet been able to provide a comprehensive solution.
The two problems of defining ESDA and determining commercial dominance will be discussed by the top engineering representatives of each of the major commercial players. In addition, one of the best known researchers in the area, Prof. Jan Rabaey, will help to clearly identify the principal problems of ESDA. Gary Smith, one of the major analysts of the ESDA industry, will provide his own candid perspective on the strengths and weaknesses of the vendors entering this arena.
The verification of arithmetic circuits has proved very troublesome to BDD-based formal verification approaches. In this session, three new approaches to this fundamental problem demonstrate their promising results.
32.1 Verification of Arithmetic Circuits with Binary Moment Diagrams*
Randal E. Bryant, Yirng-An Chen
Binary Moment Diagrams (BMDs) provide a canonical representation for linear functions, similar to the way Binary Decision Diagrams (BDDs) represent Boolean functions. Within the class of linear functions, we can embed arbitrary functions from Boolean variables to integer values. BMDs can thus model the functionality of data path circuits operating over word-level data. Many important functions, including integer multiplication, that cannot be represented efficiently at the bit level with BDDs have simple representations at the word level with BMDs. Furthermore, BMDs can represent Boolean functions with around the same complexity as BDDs. We propose a hierarchical approach to verifying arithmetic circuits, where component modules are first shown to implement their word-level specifications. The overall circuit functionality is then verified by composing the component functions and comparing the result to the word-level circuit specification. Multipliers with word sizes of up to 256 bits have been verified by this technique.
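For context, the word-level reading of a bit vector that this style of verification relies on can be written out directly (this is the standard encoding, not quoted from the paper):

```latex
% Word-level reading of bit vectors and the multiplier specification:
% a bit-level circuit OUT implements multiplication iff its word-level
% function equals X * Y.
X = \sum_{i=0}^{n-1} 2^i x_i, \qquad
Y = \sum_{j=0}^{n-1} 2^j y_j, \qquad
\mathit{OUT}(x, y) = X \cdot Y = \sum_{i,j} 2^{i+j}\, x_i\, y_j .
```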
32.2 Residue BDD and Its Application to the Verification of Arithmetic Circuits
Shinji Kimura
The paper describes a verification method for arithmetic circuits based on residue arithmetic. In the verification, a residue module is attached to the specification and the implementation, and their outputs are compared by constructing BDDs. For BDD construction without node explosion, we introduce a residue BDD whose width is less than or equal to the modulus. The method is useful for multipliers, including C6288.
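The arithmetic identity underlying the residue check is simply that taking residues commutes with multiplication. The paper establishes this symbolically with residue BDDs; the sketch below only spot-checks the congruence numerically on a stand-in multiplier, with an arbitrarily chosen modulus:

```python
# Residues commute with multiplication, so a correct multiplier must
# satisfy (f(a, b) mod m) == ((a mod m) * (b mod m)) mod m for every m.

import random

def multiplier(a, b):      # stand-in for the implementation under test
    return a * b

m = 251                    # modulus; the residue BDD width is bounded by m
for _ in range(1000):
    a, b = random.randrange(1 << 16), random.randrange(1 << 16)
    assert multiplier(a, b) % m == (a % m) * (b % m) % m
print("congruence holds on all samples")
```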
32.3 Equivalence Checking of Datapaths Based on Canonical Arithmetic Expressions
Zheng Zhou, Wayne Burleson
Numerous formal verification systems have been proposed and developed for Finite State Machine based control units (notably SMV[19] as well as others). However, most research on the equivalence checking of datapaths is still confined to the bit level. Formal verification of arithmetic expressions and synthesized datapaths, especially considering finite word-length computation, has not been addressed. Thus formal verification techniques have been prohibited from more extensive applications in numerical and Digital Signal Processing. In this paper a formal system, called Conditional Term Rewriting on Attribute Syntax Trees (ConTRAST), is developed and demonstrated for verifying the equivalence between two differently synthesized datapaths. This result arises from a sophisticated integration of attribute grammars, which provide expressive data structures for syntactic and semantic information about designed datapaths, and term rewriting systems, which transform functionally equivalent datapaths into the same canonical form. The equivalence relation is defined as a congruence closure in the rewriting system, which can be generated from arbitrary axioms, such as associativity, commutativity, etc., in a certain algebraic system. Furthermore, the effect of finite word-lengths and their associated arithmetic precision are also considered in the definition of equivalence classes. As a particular application of ConTRAST, a formal verification system is designed to check equivalence under precision constraints. The results of initial DSP synthesis experiments are displayed, where two differently implemented IIR filters in direct form II and cascaded architectures are automatically compared under given precision constraints.
Keywords - Design Verification, High-Level Synthesis and System-Level Design Aids.
Organizers: A. Kahng, G. Zimmermann
Routing methods for both intra- and inter-FPGA connections are proposed. The first paper addresses the interconnection problem for a crossbar-connected multiple-FPGA system. The next two papers present performance-driven routing based on a rip-up-and-reroute approach and a Steiner-tree construction, respectively. The fourth paper tries an interesting combination of different heuristics during different stages of routing. The session concludes with a presentation on important issues in FPGA routing from an industrial point of view.
33.1 On Optimal Board-Level Routing for FPGA-Based Logic Emulation
Wai-Kei Mak, D.F. Wong
In this paper, we consider a board-level routing problem which is applicable to FPGA-based logic emulation systems such as the Realizer system [3] and the Enterprise Emulation System [5] manufactured by Quickturn Systems. For the case where all nets are two-terminal nets, we present an O(n^2)-time optimal algorithm, where n is the number of nets. Our algorithm guarantees 100% routing completion if the number of interchip signal pins on each FPGA chip in the logic emulation system is less than or equal to the number of I/O pins on the chip. Our algorithm is based on iteratively finding Euler circuits in graphs. We also prove that the routing problem with multi-terminal nets is NP-complete.
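For illustration, here is a compact Hierholzer-style construction of an Euler circuit, the kind of subroutine the algorithm iterates on; the FPGA labels and net list are made up, and the real algorithm involves more than this single step:

```python
# Hierholzer's algorithm: walk edges until stuck, backing finished
# vertices off the stack into the circuit.

from collections import defaultdict

def euler_circuit(edges, start):
    """Circuit using every edge exactly once (graph assumed Eulerian)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if adj[u]:
            v = adj[u].pop()
            adj[v].remove(u)       # consume the edge in both directions
            stack.append(v)
        else:
            circuit.append(stack.pop())
    return circuit[::-1]

# Two-terminal nets among hypothetical FPGAs A, B, C (all degrees even):
nets = [('A','B'), ('B','C'), ('C','A'), ('A','B'), ('B','C'), ('C','A')]
print(euler_circuit(nets, 'A'))
```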
33.2 A Performance and Routability Driven Router for FPGAs Considering Path Delays
Yuh-Sheng Lee, Allen C.-H. Wu
This paper presents a new performance and routability driven router for symmetrical array based Field Programmable Gate Arrays (FPGAs). The objectives of our proposed routing algorithm are twofold: (1) improve the routability of the design (i.e., minimize the maximum required routing channel density) and (2) improve the overall performance of the design (i.e., minimize the overall path delay). Initially, nets are routed sequentially according to their criticalities and routabilities. The nets/paths violating the routing-resource and timing constraints are then resolved iteratively by a rip-up-and-rerouter, which is guided by a simulated evolution based optimization technique. The proposed algorithm considers the path delays and routability throughout the entire routing process. Experimental results show that our router can significantly improve routability and reduce delay over many existing routing algorithms.
33.3 New Performance-Driven FPGA Routing Algorithms
Michael J. Alexander, Gabriel Robins
Motivated by the goal of increasing the performance of FPGA-based designs, we propose effective Steiner and arborescence FPGA routing algorithms. Our graph-based Steiner tree constructions have provably-good performance bounds and outperform the best known ones in practice, while our arborescence heuristics produce routing solutions with optimal source-sink pathlengths at a reasonably low wirelength penalty. We have incorporated our algorithms into an actual FPGA router which routed a number of industrial circuits using channel widths considerably smaller than was previously possible.
33.4 Orthogonal Greedy Coupling - A New Optimization Approach to 2-D FPGA Routing
Yu-Liang Wu, Malgorzata Marek-Sadowska
We propose a novel optimization scheme that can improve the routing by reducing a newly observed router decaying effect. A pair of greedy-grow algorithms, each emphasizing a different optimization target, is designed. By applying one algorithm first and then switching to the other when the first one approaches its decaying stage, the undesired effect can be significantly reduced and thus better results are produced. On the tested MCNC and industry benchmarks, in addition to very low segment consumption, the total number of tracks used by our scheme is 37% less than a published conventional maze router and 22% less than the best known 2-step global/detailed router [4,5]. Our results show that complicated multi-objective problems can be effectively attacked by coupling low-complexity algorithms that traverse the solution space in orthogonal directions. This idea is applicable to both algorithmic and architectural optimization approaches [7].
33.5 Effects of FPGA Architecture on FPGA Routing
Stephen Trimberger
Although many traditional Mask Programmed Gate Array (MPGA) algorithms can be applied to FPGA routing, FPGA architectures impose critical constraints and provide alternative views of the routing problem that allow innovative new algorithms to be applied. This paper describes routing models provided by some commercial FPGA architectures, and points out the effects of these architectures on routing algorithms. Implicit in the discussion is a commentary on current and future research in FPGA routing.
This session examines the World Wide Web and how it may affect EDA. The first paper presents some initial work, examining data exchange, business transactions, and other topics. The panel will continue exploring uses of the Web focused on EDA, with panelists from EDA companies, EDA customers, and universities.
34.1 The Case for Design Using the World Wide Web
Mário J. Silva, Randy H. Katz
Most information and services required today by designers will soon become available as documents distributed in a wide area hypermedia network. New integration services are required from the design environment, supporting business transactions with design information providers, automatic exchange of design data between independent groups, and integrated support for new forms of collaboration. We discuss design using electronic commerce and other services based on the Internet, and propose a hypermedia system organization for a new generation of CAD systems, conceived to make efficient use of that infrastructure. We also describe our experience as designers of an integrated design and documentation system that interfaces existing design and documentation tools with electronic commerce services based on the World Wide Web.
Panel: The Impact of the World Wide Web on Electronic Design and EDA
Three critical problems in the hardware/software co-design of embedded systems are the generation of code for embedded processors, the subsequent optimization of the embedded code, and the test and validation of the processor on which the code will execute. Recent work on each of these topics is presented in this session.
35.1 Synthesis of Software Programs for Embedded Control Applications
Massimiliano Chiodo, Paolo Giusto, Attila Jurecska, Luciano Lavagno, Harry
Hsieh, Kei Suzuki, Alberto L. Sangiovanni-Vincentelli, Ellen Sentovich
Software components for embedded reactive real-time applications must satisfy tight code size and run-time constraints. Finite State Machines provide a convenient format for embedded system co-synthesis, between high-level specification languages and software or hardware implementations. We propose a software generation methodology that takes advantage of the very restricted class of specifications and allows for tight control over the implementation cost. The methodology exploits several techniques from the domain of Boolean function optimization. We also describe how the simplified control/data-flow graph used as an intermediate representation can be used to accurately estimate the size and timing cost of the final executable code.
35.2 Conflict Modelling and Instruction Scheduling in Code Generation for In-House DSP Cores*
Adwin H. Timmer, Marino T.J. Strik, Jef L. van Meerbergen, Jochen A.G. Jess
Application domain specific DSP cores are becoming increasingly popular due to their advantageous trade-off between flexibility and cost. However, existing code generation methods are hampered by the combination of tight timing and resource constraints imposed by the throughput requirements of DSP algorithms together with a fixed core architecture. In this paper, we present a method to model resource and instruction set conflicts uniformly and statically before scheduling. With the model we exploit the combination of all possible constraints, instead of being hampered by them. The approach results in an exact and run-time efficient method to solve the instruction scheduling problem, which is illustrated by real-life examples.
35.3 Code Optimization Techniques for Embedded DSP Microprocessors
Stan Liao, Srinivas Devadas, Kurt Keutzer, Steve Tjiang, Albert Wang
We address the problem of code optimization for embedded DSP microprocessors. Such processors (e.g., those in the TMS320 series) have highly irregular datapaths, and conventional code generation methods typically result in inefficient code. In this paper we formulate and solve some optimization problems that arise in code generation for processors with irregular datapaths. In addition to instruction scheduling and register allocation, we also formulate the accumulator spilling and mode selection problems that arise in DSP microprocessors. We present optimal and heuristic algorithms that determine an instruction schedule simultaneously optimizing accumulator spilling and mode selection. Experimental results are presented.
Keywords - code generation, optimization, digital signal processors
35.4 Retargetable Self-Test Program Generation Using Constraint Logic Programming
Ulrich Bieker, Peter Marwedel
This paper presents new techniques in two different areas. Firstly, it proposes a solution to the problem of testing embedded processors. Towards this end, it discusses the automatic generation of executable test programs from a specification of test patterns for processor components. Secondly, the paper shows how constraint logic programming (CLP) improves the software production process for design automation tools. The advantages of CLP languages include: built-in symbolic variables and the built-in support for constraints over finite domains such as integers and Booleans.
The accurate and efficient estimation of switching activity in combinational and sequential circuits has become a cornerstone in low-power design methodologies. This session begins with a tutorial that highlights issues in switching activity estimation. This is followed by four papers that address various aspects of circuit activity computation in combinational and sequential circuits.
36.1 Tutorial: Feedback, Correlation, and Delay Concerns in the Power Estimation of VLSI Circuits
Farid N. Najm
With the advent of portable and high-density microelectronic devices, the power dissipation of integrated circuits has become a critical concern. Accurate and efficient power estimation during the design phase is required in order to meet the power specifications without a costly redesign process. As an introduction to the other papers in this session, this paper gives a tutorial presentation of the issues involved in power estimation.
36.2 Accurate Estimation of Combinational Circuit Activity
Huzefa Mehta, Manjit Borah, Robert Michael Owens, Mary Jane Irwin
Several techniques to estimate the power consumption of a combinational circuit using probabilistic methods have been proposed. However, none of these techniques takes into account circuit activity when two or more inputs change simultaneously or when glitching occurs. A formulation is presented in this paper which includes signal correlation and multiple gate input switching. Work is also presented on estimating the glitching contribution to the switching activity. Results obtained from benchmarks and test circuits show very good accuracy when compared to actual activities as measured by SPICE and IRSIM.
36.3 Extreme Delay Sensitivity and the Worst-Case Switching Activity in VLSI Circuits
Farid N. Najm, Michael Y. Zhang
We observe that the switching activity at a circuit node, also called the transition density, can be extremely sensitive to the circuit internal delays. As a result, slight delay variations can lead to several orders of magnitude changes in the node activity. This has important implications for CAD in that, if the transition density is estimated by simulation, then minor inaccuracies in the timing models can lead to very large errors in the estimated activity. As a solution, we propose an efficient technique for estimating an upper bound on the transition density at every node. While it is not always very tight, the upper bound is robust, in the sense that it is valid irrespective of delay variations and modeling errors. We will describe the technique and present experimental results based on a prototype implementation.
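For background, the quantity involved is the transition density, and the propagation rule from Najm's earlier transition-density work (quoted here for context, not this paper's bound) pushes densities from inputs x_i to an output y using Boolean differences:

```latex
% D(x): average number of transitions per unit time at node x.
% P(.) is the probability that the Boolean difference evaluates to 1.
D(y) = \sum_{i} P\!\left(\frac{\partial y}{\partial x_i}\right) D(x_i),
\qquad
\frac{\partial y}{\partial x_i} = y\big|_{x_i=1} \oplus y\big|_{x_i=0} .
```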
36.4 Efficient Power Estimation for Highly Correlated Input Streams
Radu Marculescu, Diana Marculescu, Massoud Pedram
Power estimation in combinational modules is addressed from a probabilistic point of view. The zero-delay hypothesis is considered, and the activities at the primary outputs and all internal nodes are estimated under highly correlated input streams. For the first time, the relationship between the logic and probabilistic domains is investigated, and two new concepts, conditional independence and isotropy of signals, are brought to attention. Based on them, a sufficient condition for analyzing complex dependencies is given. In the most general case, the conditional independence problem has been shown to be NP-complete, and thus appropriate heuristics are presented to estimate switching activity. Detailed experiments demonstrate the accuracy and efficiency of the method. The results reported here are useful in low power design.
36.5 Power Estimation in Sequential Circuits*
Farid N. Najm, Shashank Goel, Ibrahim N. Hajj
A new method for power estimation in sequential circuits is presented that is based on a statistical estimation technique. By applying randomly generated input sequences to the circuit, statistics on the latch outputs are collected, by simulation, that allow efficient power estimation for the whole design. An important advantage of this approach is that the desired accuracy can be specified up-front by the user; the algorithm iterates until the specified accuracy is achieved. This has been implemented and tested on a number of sequential circuits and found to be much faster than existing techniques. We can complete the analysis of a circuit with 1,452 flip-flops and 19,253 gates in about 4.6 hours (the largest test case reported previously has 223 flip-flops).
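A generic sketch of how a user-specified accuracy can drive such a statistical loop follows; this is the general Monte-Carlo pattern, not the authors' exact estimator, and the power measurement and all parameters are stand-ins:

```python
# Keep simulating random input sequences until the confidence-interval
# half-width drops below eps * mean, so accuracy is set up-front.

import random, math

def measure_power():
    # stand-in for "simulate one random input sequence and record power"
    return random.gauss(10.0, 2.0)

def estimate(eps=0.02, z=1.96, min_samples=30):
    samples = []
    while True:
        samples.append(measure_power())
        n = len(samples)
        if n >= min_samples:
            mean = sum(samples) / n
            var = sum((s - mean) ** 2 for s in samples) / (n - 1)
            if z * math.sqrt(var / n) < eps * mean:
                return mean, n

mean, n = estimate()
print(f"power ~= {mean:.2f} after {n} sample sequences")
```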
This session presents innovations in combinational logic synthesis. Innovative ideas in matrix covering provide a faster basic algorithm with general application. The paper on engineering change looks at modifying circuits with the fewest changes. Finally, a number of papers look at techniques for area and delay minimization.
37.1 New Ideas for Solving Covering Problems
Olivier Coudert, Jean Christophe Madre
Covering problems occur at several steps during logic synthesis, including two-level minimization and DAG covering. This paper presents a better lower bound computation algorithm and two new pruning techniques that significantly improve the efficiency of covering problem solvers. We show that these techniques reduce by up to three orders of magnitude the time required to solve covering problems exactly.
37.2 Logic Synthesis for Engineering Change
Chih-chang Lin, Kuang-Chien Chen, Shih-Chieh Chang, Malgorzata Marek-Sadowska, Kwang-Ting Cheng
In the process of VLSI design, specifications are often changed. It is desirable that such changes will not lead to a very different design so that a large part of engineering effort can be preserved. We consider synthesis algorithms for handling such engineering changes. Given a synthesized network, our algorithm modifies it minimally to realize a new specification.
37.3 A Partitioning-Based Logic Optimization Method for Large Scale Circuits with Boolean Matrix
Yuichi Nakamura, Takeshi Yoshimura
This paper presents a new logic partitioning method for optimizing large scale circuits. The proposed method partitions a given circuit into transitive fanin-disjoint sub-circuits by matrix operations, so that various optimization methods can be applied to each partitioned sub-circuit instead of the whole circuit. Experimental results show that the proposed method achieves design quality comparable to optimizing the whole circuit, in much shorter time (about 1/20). Thus, circuits with over 10,000 gates can be optimized using the proposed partitioning.
37.4 Multi-Level Logic Minimization Based on Multi-Signal Implications
Masayuki Yuguchi, Yuichi Nakamura, Kazutoshi Wakabayashi, Tomoyuki Fujita
This paper presents a novel method for logic minimization in large-scale multi-level networks. It achieves large reductions on the basis of multi-signal implications and the relationships among these implications. Both are handled on a transitive implication graph, proposed in this paper, which enables high-speed, high-quality minimization. The proposed method holds great promise for achieving an interactive logic design environment for large-scale networks.
37.5 An Efficient Algorithm for Local Don't Care Sets Calculation
Shih-Chieh Chang, Malgorzata Marek-Sadowska, Kwang-Ting Cheng
Local don't cares of an internal node, expressed in terms of its immediate inputs, are usually of interest. One can directly apply any two-level minimizer on the on-set and the local don't-care set to simplify an internal node. In this paper, we propose a memory-efficient technique to calculate local don't cares of internal nodes in a combinational circuit. Our technique for calculating local don't cares makes use of an automatic test pattern generation (ATPG) approach, which allows us to identify quickly whether a cube in the local space is a don't care or not. Unlike other approaches, which construct an intermediate form of don't cares in terms of the primary inputs, our technique directly computes the don't-care cubes in the local space. This gives us a significant advantage over the previous approaches in memory usage. Experimental results on MCNC benchmarks are very encouraging.
37.6 Logic Clause Analysis for Delay Optimization
Bernhard Rohfleisch, Bernd Wurth, Kurt Antreich
In this paper, we present a novel method for topological delay optimization of combinational circuits. Unlike most previous techniques, optimization is performed after technology mapping. Therefore, exact gate delay information is known during optimization. Our method performs incremental network transformations, specifically substitutions of gate input or output signals by new gates. We present new theory which relates incremental network transformations to combinations of global clauses, and show how to detect such valid clause combinations. Employing techniques which originated in the test area, our method is capable of globally optimizing large circuits. Comprehensive experimental results show that our method reduces the delay of large standard cell netlists by 23% on average. In contrast to most other delay optimization techniques, area reductions are achieved concurrently.
As we move to deep submicron CMOS processes we are scaling the metal linewidth but not the thickness. This is increasing the parasitic capacitance of the wires to the point where the typical interconnect delay is larger than a gate delay. At the moment synthesis tools work by minimizing the number of literals in Boolean logic expressions and mapping to gates in a cell library with the smallest databook delay values. Instead of concentrating on logic synthesis and pre-layout simulation and adding the wires, almost as an afterthought, we will need to think far more carefully about interconnect. One of the ways to do this is to couple the synthesis and physical layout steps far more tightly than they are now. It is not at all clear how best to do this.
Interconnect is one of the biggest, most visible, and perhaps most easily understood problems in deep submicron design, but there are others. In fact, there are many second- and third-order physical effects that are starting to become more important. In the past we have swept these problems under the carpet by using worst-case design. We can no longer afford the increasing number and increasing size of these pessimistic assumptions.
Some other examples of effects we will need to consider:
We will have to pay more attention to the shape of logic waveforms and not just their delay. Many ASIC vendors have already started to account for the effect of rise and fall times.
We will need to develop methods to tackle interconnect coupling. We cannot afford to extract all coupling capacitances.
With deep submicron processes we are moving into an era where we need low-power design, since batteries and packages don't seem to be keeping up. We need new tools and methods to help us in this area.
Lithography is becoming very difficult as we run out of room in the light spectrum. At the moment we put what we want on a mask and expect the lithography engineers to reproduce it on silicon. This may not continue to be possible.
This session examines the complexity of VHDL programs. The first paper, a tutorial, relates VHDL descriptions to logic circuits. The second and third papers introduce information-theoretic and software engineering metrics to VHDL programs.
39.1 Tutorial: Productivity Issues in High-Level Design: Are Tools Solving the Real Problems?
Reinaldo A. Bergamaschi
For the last decade, high-level design has been considered the breakthrough technology that will enable designs to be completed in a fraction of the time of current methodologies. This notion has been put forward by researchers in academia and industry. While this concept is perceived to be true, it is unclear how much time is actually saved in the total design cycle, including specification, synthesis, verification (simulation), layout and test. This paper analyzes the impact that a high-level design methodology can have in the total design cycle. The main question being raised here is how much high-level design tools really help. For example, one can use a high-level tool to perform scheduling or generate a pipelined implementation of a for loop; and that may save a week in design time. But that alone does not decrease the months of simulation time that still must be performed in large designs. This paper discusses these issues and presents possible approaches for making high-level design tools more effective in the total design cycle. The term high-level design tools, as used in this paper, denotes primarily synthesis and analysis tools capable of handling hardware description languages at the behavioral and register-transfer (RT) levels.
39.2 Information Models of VHDL
Cristian A. Giumale, Hilary J. Kahn
The paper discusses issues related to the application of information modelling to the field of Electronic CAD, using VHDL as the basis for discussion. It is shown that an information model of VHDL provides a coherent and uniform description of the VHDL objects at different levels of the language and of the transformations that interrelate these levels. In addition, it captures the time-dependent aspects of the language. Hence, a hierarchy of VHDL information models can exist which encompasses the range from abstraction to detail and can help support CAD applications in a direct manner.
39.3 Measures of Syntactic Complexity for Modeling Behavioral VHDL
Neal S. Stollon, John D. Provence
Complexity measures are potentially useful in developing modeling and reuse strategies and are recognized as useful indicators of development cost and lifecycle metrics for systems design. In this paper, a syntactic complexity measure for behavioral VHDL descriptions is investigated. The approach leverages similarities between VHDL models and software algorithms, where syntactic modeling has previously been applied. Aspects of the measure, including observed and estimated model length, volume, syntactic information, and abstraction level, are defined and discussed. As a principal result, syntactic information modeling is related to Kolmogorov intrinsic complexity as a measure of minimum design implementation size. Experimental data on VHDL modeling and complexity measurement are presented, with potential applications to model comprehensibility and resource estimation.
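The length and volume measures above are in the spirit of Halstead's software-science metrics, which count distinct and total operators and operands. As a rough illustration only, the sketch below computes such measures for a VHDL fragment; the simplified tokenizer and operator set are assumptions of this sketch, not the authors' definitions:

```python
import math
import re

# Halstead-style length/volume measures over a VHDL fragment.
# The operator set and regex tokenizer are deliberate simplifications;
# a real measure would use a full VHDL lexer.

OPERATORS = {"<=", ":=", "=", "+", "-", "*", "/", "and", "or", "not",
             "if", "then", "else", "end", "process", "begin", ";"}

def halstead(source):
    tokens = re.findall(r"[A-Za-z_]\w*|<=|:=|[;=+\-*/]", source.lower())
    ops = [t for t in tokens if t in OPERATORS]
    operands = [t for t in tokens if t not in OPERATORS]
    n1, n2 = len(set(ops)), len(set(operands))  # distinct operators/operands
    N1, N2 = len(ops), len(operands)            # total occurrences
    length = N1 + N2                            # observed model length
    volume = length * math.log2(n1 + n2)        # length weighted by vocabulary
    return length, volume

fragment = """
process (clk) begin
  if clk = '1' then q <= d; end if;
end process;
"""
length, volume = halstead(fragment)
print(f"length = {length} tokens, volume = {volume:.1f} bits")
```

Larger and more irregular descriptions yield larger volumes, which is what makes such measures candidates for development-cost and comprehensibility estimation.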
This session covers innovations in timing analysis and optimization. The papers cover diverse issues such as gate and interconnect sizing for performance optimization, consideration of statistical issues in gate delay modeling, incremental timing analysis of latch-based systems and abstraction of clocks to speed up logic simulation.
40.1 Simultaneous Gate and Interconnect Sizing for Circuit-Level Delay Optimization
Noel Menezes, Satyamurthy Pullela, Lawrence T. Pileggi
With delays due to the physical interconnect dominating overall logic path delays, circuit-level delay optimization must take interconnect effects into account. Instead of sizing only the gates along the critical paths for delay reduction, the trade-offs made possible by simultaneously sizing gates and interconnect must also be considered. We show that for optimal gate and interconnect sizing, it is imperative that the interaction between the driver and the RC interconnect load be taken into account. We present an iterative sensitivity-based approach to simultaneous gate and interconnect sizing in terms of a gate delay model that captures this interaction. During each iteration, the path delay sensitivities are efficiently calculated and used to size the components along a path.
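The flavor of a sensitivity-based sizing loop can be shown in a few lines: perturb each sizable parameter, estimate the path-delay sensitivity by finite differences, and grow the parameter with the most negative sensitivity. The toy delay model and fixed step below are assumptions for illustration, not the gate delay model of the paper:

```python
# Sensitivity-based sizing sketch. The delay model is a toy: driver delay
# falls with gate size, wire delay falls with wire width, and both sizes
# carry load/area penalties. Illustrative only.

def path_delay(sizes):
    g, w = sizes["gate"], sizes["wire"]
    return 100.0 / g + 50.0 / w + 5.0 * g + 2.0 * w

def size_for_delay(sizes, step=0.05, iters=200):
    for _ in range(iters):
        base = path_delay(sizes)
        # Finite-difference sensitivity of path delay to each parameter.
        sens = {k: (path_delay(dict(sizes, **{k: v + step})) - base) / step
                for k, v in sizes.items()}
        best = min(sens, key=sens.get)  # most negative sensitivity
        if sens[best] >= 0.0:           # no upsizing move helps any more
            break
        sizes[best] += step
    return sizes

sizes = size_for_delay({"gate": 1.0, "wire": 1.0})
print(sizes, "delay =", round(path_delay(sizes), 2))
```

The essential point carried over from the paper is that gate and wire sizes are traded off against each other inside one loop, rather than sizing gates first and patching the interconnect afterwards.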
40.2 An Algorithm for Incremental Timing Analysis
Jin-fuw Lee, Donald T. Tang
In recent years, many new algorithms have been proposed for performing a complete timing analysis of sequential logic circuits. In this paper, we present an incremental timing analysis algorithm. When an incremental design change is made to the logic network, this algorithm identifies the portion of the design whose timing is affected and quickly derives the new arrival times and slacks. Fast incremental timing analysis is desirable for users doing interactive logic design. It is particularly important for a logic synthesis program, which needs to evaluate circuit delays under many logic modifications.
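A minimal sketch of the idea, assuming a simple adjacency-list timing graph (the encoding and delays are illustrative, not the paper's data structures): after a local change, arrival times are re-propagated only through the fanout cone of the modified node, and propagation stops wherever a recomputed arrival time is unchanged.

```python
from collections import defaultdict, deque

# Incremental arrival-time propagation over an acyclic timing graph.
# Only the fanout cone of the changed node is revisited, and a node's
# fanout is re-queued only if its arrival time actually changed.

class TimingGraph:
    def __init__(self):
        self.fanin = defaultdict(list)     # node -> [(pred, edge_delay)]
        self.fanout = defaultdict(list)    # node -> [succ, ...]
        self.arrival = defaultdict(float)  # node -> arrival time

    def add_edge(self, u, v, delay):
        self.fanin[v].append((u, delay))
        self.fanout[u].append(v)

    def update_from(self, changed):
        """Re-derive arrival times in the fanout cone of `changed`."""
        queue = deque([changed])
        while queue:
            node = queue.popleft()
            new_at = max((self.arrival[p] + d for p, d in self.fanin[node]),
                         default=self.arrival[node])
            if new_at != self.arrival[node]:
                self.arrival[node] = new_at
                queue.extend(self.fanout[node])

g = TimingGraph()
g.add_edge("a", "x", 2.0)
g.add_edge("x", "y", 3.0)
g.update_from("x")            # initial propagation: arrival[y] == 5.0
g.fanin["x"][0] = ("a", 4.0)  # local design change slows edge a -> x
g.update_from("x")            # only x and y are revisited
print(g.arrival["y"])         # 7.0
```

Slacks follow the same pattern with a backward pass over required times; the payoff is that a small edit touches only a small cone, which is what makes timing queries cheap inside a synthesis inner loop.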
40.3 An Assigned Probability Technique to Derive Realistic Worst-Case Timing Models of Digital Standard Cells
Alessandro Dal Fabbro, Bruno Franzini, Luigi Croce, Carlo Guardiani
The ability to determine accurate worst-case timing performance for a library of standard cells is of great importance in a modern VLSI structured semicustom IC design flow. The margin for profitability is extremely tight because of ever-increasing performance demands that progress in process technology alone can hardly satisfy. It is therefore of utmost importance to avoid excessively pessimistic estimates of actual cell performance, so that the full potential of the fabrication process can be exploited. This paper describes a technique for determining worst-case points with an assigned probability value, making it possible to select the desired level of confidence for the worst-case evaluation of digital IC designs with good accuracy. The results of the Assigned Probability Technique (APT) are presented and compared with those obtained by standard methods at both the cell and circuit levels, showing the considerable benefits of the new method.
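One way to picture the assigned-probability idea is as picking a delay bound at a stated percentile of the cell delay distribution rather than at an extreme corner. A minimal Monte Carlo sketch, assuming a normal delay model purely for illustration (the paper's characterization flow is not reproduced here):

```python
import random

# Compare a classic 3-sigma worst-case corner against bounds chosen at
# assigned probabilities. The normal cell-delay model and its parameters
# are illustrative assumptions only.

random.seed(1)
NOMINAL, SIGMA = 100.0, 8.0  # cell delay mean and spread, ps

samples = sorted(random.gauss(NOMINAL, SIGMA) for _ in range(100_000))

def delay_at_probability(p):
    """Smallest delay bound met with probability p across the samples."""
    return samples[int(p * (len(samples) - 1))]

print(f"3-sigma corner   : {NOMINAL + 3 * SIGMA:6.1f} ps")
for p in (0.99, 0.999, 0.9999):
    print(f"p = {p:6.4f} bound : {delay_at_probability(p):6.1f} ps")
```

Choosing, say, a 99% bound instead of the 3-sigma corner trims the pessimism margin while making the residual risk an explicit design choice.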
40.4 Automatic Clock Abstraction from Sequential Circuits
Samir Jain, Randal E. Bryant, Alok Jain
Our goal is to transform a low-level circuit design into a more abstract representation. A pre-existing tool, Tranalyze [4], takes a switch-level circuit and generates a functionally equivalent gate-level representation. This work focuses on taking that gate-level sequential circuit and performing a temporal analysis which abstracts the clocks from the circuit. The analysis generates a cycle-level gate model with the detailed timing abstracted from the original circuit. Unlike other possible approaches, our analysis does not require the user to identify state elements or to give the timings of internal state signals. The temporal analysis process has applications in simulation, formal verification, and reverse engineering of existing circuits. Experimental results show a 40%-70% reduction in the size of the circuit and a 3X-150X speedup in simulation time.
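The kind of abstraction involved can be pictured with a toy master-slave latch pair: composing the per-phase transfer functions yields a single cycle-level function in which the clock phases no longer appear. This two-phase example is purely illustrative and is not the analysis algorithm of the paper:

```python
# Toy two-phase latch pair. Each phase function describes what the
# circuit does while that clock phase is active; composing them gives
# a cycle-level model with the clocks abstracted away.

def phase1(state, d):
    """phi1 active: master latch is transparent and samples d."""
    master, slave = state
    return (d, slave)

def phase2(state, d):
    """phi2 active: slave latch copies the master."""
    master, slave = state
    return (master, master)

def cycle(state, d):
    """Cycle-level abstraction: one call replaces both clock phases."""
    return phase2(phase1(state, d), d)

state = (0, 0)
for d in (1, 0, 1):
    state = cycle(state, d)
    print("q =", state[1])  # behaves like an edge-triggered flip-flop
```

Dropping the explicit phases is what buys the reported simulation speedup: the simulator evaluates one cycle function instead of exercising every clock edge.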
This session reviews recent progress in techniques for making asynchronous synthesis feasible. The first paper presents techniques for making hierarchical asynchronous synthesis more flexible. The second paper describes a technique for making circuits more hazard resistant. The third paper overviews a practical system for validating the results of asynchronous synthesis.
41.1 Hierarchical Optimization of Asynchronous Circuits
Bill Lin, Gjalt De Jong, Tilman Kolks
Many asynchronous designs are naturally specified and implemented hierarchically as an interconnection of separate asynchronous modules that operate concurrently and communicate with each other. This paper is concerned with the problem of synthesizing such hierarchically defined systems. When the individual components are synthesized and implemented separately, it is desirable to take into account the degrees of freedom that arise from the interactions with the other components and from the specification. Specifically, we consider how one can find the set of implementations that can be correctly substituted for a component in the system while preserving the behavior of the total system. The notion of correct substitution is formally defined for a hierarchical network of possibly non-deterministic modules, and a new solution framework based on trace theory is presented to compute and represent this complete set of correct substitutions. We show that the complete set can be captured by a single trace structure using the notion of a “maximal trace structure”. We indicate how asynchronous synthesis methods may be applied to explore the solution space, e.g., to generate a delay-insensitive implementation.
41.2 Externally Hazard-Free Implementations of Asynchronous Circuits
Milton Sawasaki, Chantal Ykman-Couvreur, Bill Lin
We present a new sum-of-products-based asynchronous architecture, called the N-SHOT architecture, that operates correctly under internal hazardous responses and guarantees hazard-freeness at the observable non-input signals. We formally prove that, within this architecture, a very wide class of semi-modular state graphs with input choices (either distributive or non-distributive) that satisfy the complete state coding property always admits a correct implementation. As with synchronous circuits, we permit internal hazards in the combinational logic core, which means we can use conventional combinational logic minimization methods to produce the sum-of-products implementation. This represents a significant departure from most existing methods, which require the combinational logic to be hazard-free and are mainly valid for distributive behaviors.
41.3 A Design and Validation System for Asynchronous Circuits
Peter Vanbekbergen, Albert Wang, Kurt Keutzer
In this paper we present a complete methodology for the design and validation of asynchronous circuits starting from a formal specification model that roughly corresponds to a timing diagram. The methodology is presented in such a way that it is easy to embed in the current methodology for synchronous circuits. The different steps of the synthesis process will just be briefly touched upon. The main part of the paper concentrates on the simulation and validation of asynchronous circuits. It discusses where the designer needs validation and how it can be done. It also explains how this process can be automated and embedded in the complete methodology.