|
ABSTRACT ICCAD'94
Moderators : Masahiro Fujita, Fujitsu Laboratories of America, San Jose, CA
Albert Wang, Synopsys, Inc., Mountain View, CA
1A.1 Perturb and Simplify: Multi-Level Boolean Network Optimizer
Shih-Chieh Chang, M. Marek-Sadowska, Univ. of California, Santa Barbara, CA
In this paper, we discuss the problem of optimizing a multi-level logic
combinational Boolean network. Our techniques apply a sequence of local
perturbations and modifications of the network, guided by automatic test
pattern generation (ATPG) based reasoning. In particular, we
propose several new ways in which one or more redundant gates or wires can be
added to a network. We show how to identify gates which are good candidates
for local functionality change. Furthermore, we discuss the problem of adding
and removing two wires, neither of which alone is redundant, but whose joint
addition/removal does not affect the functionality of the network. We also
address the problem of efficient redundancy computation, which allows us to
eliminate many unnecessary redundancy tests. We have performed experiments on
MCNC benchmarks
and compared the results to those of misII[4] and RAMBO[6]. Experimental
results are very encouraging.
1A.2 Multi-Level Logic Optimization by Implication Analysis
Wolfgang Kunz, Univ. of Potsdam, Potsdam, Germany, Prem Menon, Univ. of Massachusetts, Amherst, MA
This paper proposes a new approach to multi-level logic optimization based on
ATPG (Automatic Test Pattern Generation). Previous ATPG-based methods for
logic minimization were restricted to a limited set of circuit
transformations. We show that the ATPG-based
method presented here allows (in principle) the transformation of a given
combinational network C into an arbitrary, structurally different
but functionally equivalent combinational network C’. Furthermore, powerful
heuristics are presented in order to decide what network manipulations are
promising for minimizing the circuit. By identifying indirect implications
between signals in the circuit, transformations can be derived which are good
candidates for the minimization of the circuit. In particular, it is shown that
Recursive Learning can derive good Boolean divisors justifying the effort to
attempt a Boolean division. For 9 out of 10 ISCAS-85 benchmark circuits our
tool HANNIBAL obtains smaller circuits than the well-known synthesis system SIS.
1A.3 Incremental Synthesis
Daniel Brand, IBM Research Division, Yorktown Heights, NY, Anthony D. Drumm, IBM AS/400 Division, Rochester, MN
Sandip Kundu, IBM Research Division, Yorktown Heights, NY, Prakash Narain, IBM Microelectronics, Endicott, NY
A small change in the input to logic synthesis may cause a large change in the
output implementation. This is undesirable if a designer has some investment in
the old implementation and does not want it perturbed more than necessary. We
describe a method that solves this problem by reusing gates from the old
implementation, and restricting synthesis to the modified portions only.
Moderators: Don MacMillen, Synopsys, Inc., Mountain View, CA
Giovanni De Micheli, Stanford Univ., Stanford, CA
1B.1 Definition and Solution of the Memory Packing Problem for Field-Programmable Systems
David Karchmer, Jonathan Rose, Univ. of Toronto, Toronto, Ontario, Canada
This paper defines a new optimization problem that arises in the use of a
Field-Programmable System (FPS). An FPS consists of a set of Field-Programmable Gate
Arrays and memories, and is used both for emulation of ASICs and computation.
In both cases the application circuits will include a set of memories which may
not match the number and aspect ratio of the physical memories available on the
FPS. This can often require that the physical memories be time-multiplexed to
implement the required memories, in a circuit we call a memory organizer.
We give a precise definition of the packing optimization problem and present an
algorithm for its solution. The algorithm has been implemented in a CAD tool
that automatically produces a memory organizer circuit ready for synthesis by
a commercial FPGA tool set.
1B.2 Integrating Program Transformations in the Memory-Based Synthesis of Image and Video Algorithms
David J. Kolson, Alexandru Nicolau, Nikil Dutt, Univ. of California, Irvine, CA
In this paper we discuss the interaction and integration of two important
program transformations in high-level synthesis - Tree Height Reduction and
Redundant Memory-access Elimination. Intuitively, these program transformations
do not interfere with one another as they optimize different operations in the
program graph and different resources in the synthesized system. However, we
demonstrate that integration of the two tasks is necessary to better utilize
available resources. Our approach involves the use of a "meta-transformation"
to guide transformation application as possibilities arise. Results observed on
several image and video benchmarks demonstrate that transformation integration
increases performance through better resource utilization.
1B.3 DataFlow-Driven Memory Allocation for Multi-Dimensional Signal Processing Systems
Florin Balasa, Francky Catthoor, Hugo De Man, IMEC, Leuven, Belgium
Memory cost is responsible for a large amount of the chip and/or board
area of customized video and image processing systems. In this paper, a novel
background memory allocation and assignment technique is presented. It is
intended for a behavioural algorithm specification, where the procedural
ordering of the memory related operations is not yet fully fixed. Instead of
the more restricted classical scheduling-based explorations, starting from
procedurally interpreted specifications in terms of loops, a novel optimization
approach - driven by data-flow analysis - is proposed. Employing the estimated
silicon area as a steering cost, this allocation/assignment technique yields
one or (optionally) several distributed (multi-port) memory architecture(s) with
fully-determined characteristics, complying with a given clock cycle budget for
read/write operations. Moreover, our approach can accurately deal with complex
multi-dimensional signals by means of a polyhedral data-flow analysis operating
with groups of scalars.
Moderators: Fadi Maamari, AT&T Bell Labs., Princeton, NJ
Chen-Shang Lin, National Taiwan Univ., Taipei, Taiwan, ROC
1C.1 Test Generation for Bridging Faults in CMOS ICs Based on Current Monitoring Versus Signal Propagation
Uwe Gläser, Theodor Vierhaus, Marco Kley, Anja Wiederhold, GMD, Sankt Augustin, Germany
Bridge-type defects play a dominant role in state-of-the-art CMOS technologies.
This paper describes a combined functional and overcurrent-based test generation
approach for CMOS circuits, which is optionally based on layout information.
Comparative results for benchmark circuits are given to demonstrate the
feasibility of voltage-based versus IDDQ-based testing.
1C.2 Iterative [Simulation Based Genetics + Deterministic Techniques] = Complete ATPG
Daniel G. Saab, Univ. of Illinois, Urbana, IL, Youssef G. Saab, Univ. of Missouri, Columbia, MO,
Jacob A. Abraham, Univ. of Texas, Austin, TX
Simulation-based test vector generators require much less computer time than
deterministic ATPG but they generate longer test sequences and sometimes achieve
lower fault coverage. This is due to the divergence in the search process. In
this paper, we propose a correction technique for simulation-based ATPG. This
technique is based on identifying the diverging state and on computing a fault
cluster (faults close to each other). A set of candidate faults from the
cluster is targeted with a deterministic ATPG and the resulting test sequence is
used to restart the search process of the simulation-based technique. This
process is repeated until all faults are detected or proven to be
redundant/untestable. The program implementing this approach has been used to
generate tests with very high fault coverage, and runs about 10 times faster
than traditional deterministic techniques with very good test quality in terms
of test length and fault coverage.
1C.3 Analytical Fault Modeling and Static Test Generation for Analog ICs
Giri Devarayanadurg, Mani Soma, Univ. of Washington, Seattle, WA
Static tests are key in reducing the current high cost of testing analog and
mixed-signal ICs. A new DC test generation technique for detecting catastrophic
failures in this class of circuits is presented. To include the effect of
tolerance of parameters during testing, the test generation problem is
formulated as a minimax optimization problem, and solved iteratively as
successive linear programming problems. An analytical fault modeling technique,
based on manufacturing defect statistics is used to derive the fault list for
the test generation. Using the technique presented here an efficient static
test set for analog and mixed-signal ICs can be constructed, reducing both the
test time and the packaging cost.
Moderators: Ralph Otten, Delft Univ. of Tech., Delft, The Netherlands
Choon Kyung Kim, Goldstar Electron Co., Ltd., Seoul, Korea
1D.1 Efficient Network Flow Based Min-Cut Balanced Partitioning
Honghua Yang, D.F. Wong, Univ. of Texas, Austin, TX
We consider the problem of bipartitioning a circuit into two balanced components
that minimizes the number of crossing nets. Previously, the Kernighan and Lin
type (K&L) heuristics, the simulated annealing approach, and the spectral method
were given to solve the problem. However, network flow techniques were
overlooked as a viable approach to min-cut balanced bipartition due to their
high complexity. In this paper we propose a balanced bipartition heuristic based on
repeated max-flow min-cut techniques, and give an efficient implementation that
has the same asymptotic time complexity as that of one max-flow computation. We
implemented our heuristic algorithm in a package called FBB. The experimental
results demonstrate that FBB outperforms the K&L heuristics and the spectral
method in terms of the number of crossing nets, and the efficient implementation
makes it possible to partition large circuit instances with reasonable runtime.
For example, the average elapsed time for bipartitioning a circuit S35932 of
almost 20K gates is less than 20 minutes.
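The max-flow min-cut computation at the core of this abstract can be illustrated with a plain Edmonds-Karp sketch on a tiny invented graph; this is a generic one-shot max-flow/min-cut step, not FBB's repeated-flow balancing heuristic, and the capacities below are purely illustrative:

```python
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max-flow; returns (flow value, source-side node set)."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]

    def bfs():
        # Find residual-graph parents via breadth-first search from s.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        return parent

    total = 0
    while True:
        parent = bfs()
        if parent[t] == -1:          # no augmenting path left
            break
        # Bottleneck along the augmenting path t -> s.
        bottleneck, v = float('inf'), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:                # push flow along the path
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck
    # Nodes still reachable in the residual graph form the source side
    # of a minimum cut (max-flow min-cut theorem).
    parent = bfs()
    source_side = {v for v in range(n) if parent[v] != -1}
    return total, source_side

# 4-node example: 0 = source, 3 = sink.
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
value, side = max_flow_min_cut(cap, 0, 3)
print(value, sorted(side))  # cut value 5, source side {0}
```

FBB's contribution is repeating such cuts while enforcing the area balance constraint; the sketch shows only the primitive it builds on.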
1D.2 Multi-Way VLSI Circuit Partitioning Based on Dual Net Representation
Jason Cong, Wilburt Labio, Narayanan Shivakumar, Univ. of California, Los Angeles, CA
In this paper, we study the area-balanced multi-way partitioning problem of VLSI
circuits based on a new dual netlist representation named the hybrid dual
netlist (HDN). Given a netlist, we first compute a K-way partition of the nets
based on the HDN representation, and then transform a K-way net partition into a
K-way module partitioning solution. The main contribution of our work is the
formulation and solution of the K-way module contention (K-MC) problem, which
determines the best assignment of the modules in contention to partitions, while
maintaining user-specified area requirements, when we transform the net
partition into a module partition. Under a natural definition of binding factor
between nets and modules, and preference function between partitions and
modules, we show that the K-MC problem can be reduced to a min-cost max-flow
problem.
We present efficient solutions to the K-MC problem based on network flow
computation. Extensive experimental results show that our algorithm consistently
outperforms the conventional K-FM partitioning algorithm by a significant
margin.
1D.3 A General Framework for Vertex Orderings, with Applications to Netlist Clustering
Charles J. Alpert, Andrew B. Kahng, Univ. of California, Los Angeles, CA
We present a general framework for the construction of vertex orderings for
netlist clustering. Our WINDOW algorithm constructs an ordering by iteratively
adding the vertex with highest attraction to the existing ordering. Variant
choices for the attraction function allow our framework to subsume many graph
traversals and clustering objectives from the literature. The DP-RP method of
[3] is then applied to optimally split the ordering into a k-way clustering.
Our approach is adaptable to user-specified cluster size constraints.
Experimental results for clustering and multi-way partitioning are encouraging.
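The ordering idea can be sketched in a few lines: repeatedly append the unordered vertex most attracted to the ordering built so far. The attraction function here (edge count into the ordering) and the toy graph are simplifying assumptions; WINDOW parameterizes the attraction to subsume many traversals:

```python
def window_ordering(adj, start):
    """Greedy vertex ordering: repeatedly append the unordered vertex with
    the highest attraction to the current ordering (here: edge count)."""
    order, placed = [start], {start}
    while len(order) < len(adj):
        best = max((v for v in adj if v not in placed),
                   key=lambda v: sum(1 for u in adj[v] if u in placed))
        order.append(best)
        placed.add(best)
    return order

# Two natural clusters {a, b, c} and {d, e, f} joined by one edge c-d.
adj = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'],
       'd': ['c', 'e', 'f'], 'e': ['d', 'f'], 'f': ['d', 'e']}
print(window_ordering(adj, 'a'))  # each cluster is finished before the next
```

A dynamic-programming splitter (DP-RP in the abstract) would then cut this ordering into k contiguous clusters; the ordering keeps each cluster contiguous, which is what makes that split optimal over orderings.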
Moderators: Ellen M. Sentovich, Univ. of California, Berkeley, CA
Luciano Lavagno, Politecnico di Torino, Torino, Italy
2A.1 Re-Encoding Sequential Circuits to Reduce Power Dissipation
Gary D. Hachtel, Mariano Hermida, Abelardo Pardo, Massimo Poncino, Fabio Somenzi, Univ. of Colorado, Boulder, CO
We present a fully implicit encoding algorithm for minimization of average
power dissipation in sequential circuits, based on the reduction of the average
number of bit changes per state transition.
We have studied two novel schemes for this purpose, one based on recursive
weighted non-bipartite matching, and one on recursive mincut bi-partitioning.
We employ ADDs (Algebraic Decision Diagrams) to compute the transition
probabilities, to measure potential area saving, and in the encoding algorithms
themselves.
Our experiments show the effectiveness of our method in reducing power
dissipation for large sequential designs.
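The cost being minimized can be made concrete with a scalar sketch: the expected number of bit changes per transition under a candidate encoding. The toy machine and probabilities are invented, and the paper computes this implicitly with ADDs rather than by explicit enumeration:

```python
def avg_bit_changes(encoding, trans_prob):
    """Expected Hamming distance per state transition.

    encoding: state -> bit string; trans_prob: (s, s') -> probability."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return sum(p * hamming(encoding[s], encoding[t])
               for (s, t), p in trans_prob.items())

# Toy 3-state machine with hypothetical transition probabilities.
trans = {('A', 'B'): 0.5, ('B', 'A'): 0.3, ('B', 'C'): 0.1, ('C', 'A'): 0.1}
enc1 = {'A': '00', 'B': '01', 'C': '11'}  # adjacent codes for the hot pair
enc2 = {'A': '00', 'B': '11', 'C': '01'}
print(avg_bit_changes(enc1, trans), avg_bit_changes(enc2, trans))
```

Giving Hamming-adjacent codes to the most frequent transition (enc1) lowers the expected switching; the matching and mincut schemes in the abstract search for such assignments systematically.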
2A.2 Precomputation-Based Sequential Logic Optimization for Low Power
Mazhar Alidina, José Monteiro, Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA,
Abhijit Ghosh, Mitsubishi Electronics, Sunnyvale, CA, Marios Papaefthymiou, Yale Univ., New Haven, CT
We address the problem of optimizing logic-level sequential circuits for low
power. We present a powerful sequential logic optimization method that is based
on selectively precomputing the output logic values of the circuit one clock
cycle before they are required, and using the precomputed values to reduce
internal switching activity in the succeeding clock cycle. We present two
different precomputation architectures which exploit this observation.
We present an automatic method of synthesizing pre-computation logic so as to
achieve maximal reductions in power dissipation. We present experimental
results on various sequential circuits. Up to 75% reductions in average
switching activity and power dissipation are possible with marginal increases
in circuit area and delay.
2A.3 Low Power State Assignment Targeting Two-And Multi-Level Logic Implementations
Chi-ying Tsui, Massoud Pedram, Chih-ang Chen, Alvin Despain, Univ. of Southern California, Los Angeles, CA
The problem of minimizing power consumption during the state encoding of a
finite state machine is considered. A new power cost model for state encoding
is proposed and encoding techniques that minimize this power cost for two- and
multi-level logic implementations are described. These techniques are compared
with those which minimize area or the switching activity at the present state
bits. Experimental results show significant improvements.
Moderators: Wayne Wolf, Princeton Univ., Princeton, NJ
Gaetano Borriello, Univ. of Washington, Seattle, WA
2B.1 Algorithm Selection: A Quantitative Computation-Intensive Optimization Approach
Miodrag Potkonjak, NEC USA, C&C Research Labs., Princeton, NJ, Jan Rabaey, Univ. of California, Berkeley, CA
Given a set of specifications for a targeted application, algorithm selection
refers to choosing the most suitable algorithm for a given goal, among several
functionally equivalent algorithms. We demonstrate an extraordinary potential
of algorithm selection for achieving high throughput, low cost, and low power
implementations.
We introduce an efficient technique for lower-bound evaluation of the throughput
and cost during algorithm selection and propose a relaxation-based heuristic for
throughput optimization. We also present an algorithm for cost optimization
using algorithm selection. The effectiveness of the methodology and algorithms
is illustrated with examples.
2B.2 Adaptation of Partitioning and High-Level Synthesis in Hardware/Software Co-Synthesis
Jörg Henkel, Rolf Ernst, Ulrich Holtmann, Thomas Benner, Technische Univ. Braunschweig, Braunschweig, Germany
Previously, we had presented the system COSYMA for hardware/software
co-synthesis of small embedded controllers [ErHeBe93]. Target system of COSYMA
is a core processor with application-specific co-processors. The system
speedup for standard programs compared to a single 33MHz RISC processor solution
with fast, single cycle access RAM was typically less than 2 due to restrictions
in high-level co-processor synthesis, and incorrectly estimated back-end tool
performance, such as hardware synthesis, compiler optimization and communication
optimization. Meanwhile, a high-level synthesis tool for high-performance
coprocessors in co-synthesis has been developed. This paper explains the
requirements and the main features of the high-level synthesis system and its
integration into COSYMA. The results show a speedup of 10 in most cases.
Compared to the speedup, the co-processor size is very small.
2B.3 Synthesis of Concurrent System Interface Modules with Automatic Protocol Conversion Generation
Bill Lin, Steven Vercauteren, IMEC, Leuven, Belgium
We describe a new high-level compiler called Integral for designing system
interface modules. The input is a high-level concurrent algorithmic
specification that can model complex concurrent control flow, logical and
arithmetic computations, abstract communication, and low-level behavior. For
abstract communication between two communicating modules that obey different
I/O protocols, the necessary protocol conversion behaviors are automatically
synthesized using a Petri net theoretic approach. We present a synthesis
trajectory that can synthesize the necessary hardware resources, control
circuitry, and protocol conversion behaviors for implementing system interface
modules.
Moderators: Robert C. Aitken, Hewlett-Packard Co., Palo Alto, CA
Sy-Yen Kuo, National Taiwan Univ., Taipei, Taiwan, ROC
2C.1 An Efficient Procedure for the Synthesis of Fast Self-Testable Controller Structures
Sybille Hellebrand, Hans-J. Wunderlich, Univ. of Siegen, Siegen, Germany
The BIST implementation of a conventionally synthesized controller in most
cases requires the integration of an additional register only for test purposes.
This leads to some serious drawbacks concerning the fault coverage, the system
speed and the area overhead. A synthesis technique is presented which uses the
additional test register also to implement the system function by supporting
self-testable pipeline-like controller structures. It will be shown that if
the need for two different registers in the final structure is already taken into
account during synthesis, then the overall number of flipflops can be reduced,
and the fault coverage and system speed can be enhanced. The presented
algorithm constructs realizations of a given finite state machine specification
which can be trivially implemented by a self-testable structure. The efficiency
of the procedure is ensured by a very precise characterization of the space of
suitable realizations, which avoids the computational overhead of previously
published algorithms.
2C.2 Test Pattern Generation Based on Arithmetic Operations
Sanjay Gupta, Janusz Rajski, Jerzy Tyszer, McGill Univ., Montréal, Quebec, Canada
Existing built-in self test (BIST) strategies require the use of specialized
test pattern generation hardware which introduces significant area overhead and
performance degradation. In this paper, we propose a novel method for
implementing test pattern generators based on adders widely available in
data-path architectures and digital signal processing circuits. Test patterns
are generated by continuously accumulating a constant value and their quality
is evaluated in terms of the pseudo-exhaustive state coverage on subspaces of
contiguous bits. This new test generation scheme, along with the recently
introduced accumulator-based compaction scheme facilitates a BIST strategy for
high performance datapath architectures that uses the functionality of existing
hardware, is entirely integrated with the circuit under test, and results in
at-speed testing with no performance degradation or area overhead.
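The generation scheme itself is simple enough to sketch: accumulate a constant modulo the datapath width and score the patterns by coverage of contiguous bit subspaces. The increment, width, and coverage metric below are illustrative choices, not the paper's exact parameters:

```python
def accumulator_patterns(increment, width, count):
    """Generate test patterns by repeatedly adding a constant mod 2**width."""
    acc, patterns = 0, []
    for _ in range(count):
        patterns.append(acc)
        acc = (acc + increment) % (1 << width)
    return patterns

def subspace_coverage(patterns, width, k):
    """Fraction of k-bit values observed on each window of contiguous bits,
    averaged over all window positions (pseudo-exhaustive coverage)."""
    total = 0.0
    for pos in range(width - k + 1):
        seen = {(p >> pos) & ((1 << k) - 1) for p in patterns}
        total += len(seen) / (1 << k)
    return total / (width - k + 1)

# An odd increment steps through all 2**width accumulator states.
pats = accumulator_patterns(increment=0x5, width=8, count=256)
print(subspace_coverage(pats, width=8, k=4))  # 1.0: fully pseudo-exhaustive
```

Because an odd increment generates the full additive cycle, every 4-bit window eventually sees all 16 values, which is the pseudo-exhaustive property the abstract evaluates.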
2C.3 Random Pattern Testable Logic Synthesis
Chen-Huan Chiang, Sandeep Gupta, Univ. of Southern California, Los Angeles, CA
Previous procedures for synthesis of testable logic guarantee that all faults in
the synthesized circuits are detectable. However, the detectability of many
faults in these circuits can be very low, leading to poor random pattern
testability. A new procedure to perform logic synthesis that synthesizes random
pattern testable multilevel circuits is proposed. Experimental results show
that the circuits synthesized by the proposed procedure tstfx are significantly
more random pattern testable and smaller than those synthesized using its
counterpart fast extract (fx) in SIS. The proposed synthesis procedure designs
circuits that require only simple random pattern generators in built-in
self-test, thereby obviating the need for complex BIST circuitry.
Moderators: Choon Kyung Kim, Goldstar Electron Co. Ltd., Seoul, Korea
Jason Cong, Univ. of California, Los Angeles, CA
2D.1 Compression-Relaxation: A New Approach to Performance Driven Placement for Regular Architectures
Anmol Mathur, C. L. Liu, Univ. of Illinois, Urbana, IL
We present a new iterative algorithm for performance driven placement
applicable to regular architectures such as FPGAs. Our algorithm has two phases
in each iteration: a compression phase and a relaxation phase. We employ a novel
compression strategy based on the longest path tree of a cone for improving the
timing performance of a given placement. Compression might cause a feasible
placement to become infeasible. The concept of a slack neighborhood graph is
introduced and is used in the relaxation phase to transform an infeasible
placement to a feasible one using a mincost flow formulation. Our analytical
results regarding the bounds on delay increase during relaxation are validated
by the rapid convergence of our algorithm on benchmark circuits. We obtain
placements that have 13% less critical path delay (on the average) than those
generated by the Xilinx automatic place and route tool (apr) on technology
mapped MCNC benchmark circuits with significantly less CPU time than apr.
2D.2 A Loosely Coupled Parallel Algorithm for Standard Cell Placement
Wern-Jieh Sun, Carl Sechen, Univ. of Washington, Seattle, WA
We present a loosely coupled parallel algorithm for the placement of standard
cell integrated circuits. Our algorithm is a derivative of simulated annealing.
The implementation of our algorithm is targeted toward networks of UNIX
workstations. This is the very first reported parallel algorithm for standard
cell placement which yields as good or better placement results than its serial
version. In addition, it is the first parallel placement algorithm reported
which offers nearly linear speedup, in terms of the number of processors
(workstations) used, over the serial version. Despite using the rather slow
local area network as the only means of interprocessor communication, the
processor utilization is quite high, up to 98% for 2 processors and 90% for 6
processors. The new parallel algorithm has yielded the best overall results
ever reported for the set of MCNC standard cell benchmark circuits.
2D.3 Delay and Area Optimization for Compact Placement by Gate Resizing and Relocation
Weitong Chuang, AT&T Bell Labs., Murray Hill, NJ, Ibrahim N. Hajj, Univ. of Illinois, Urbana, IL
In this paper, we first present an efficient algorithm for the gate sizing
problem. Then we propose an algorithm which performs delay and area
optimization for a given compact placement by resizing and relocating cells in
the circuit layout. Since the gate sizing procedure is embedded within the
placement adjustment process, interconnect capacitance information is included
in the gate size selection process. As a result, the algorithm is able to
obtain superior solutions.
Moderators: Jonathan Rose, Univ. of Toronto, Toronto, Ontario, Canada
Herve Touati, Digital Equipment Corp., Rueil-Malmaison, France
3A.1 Edge-Map: Optimal Performance Driven Technology Mapping for Iterative LUT Based FPGA Designs
Honghua Yang, D.F. Wong, Univ. of Texas, Austin, TX
We consider the problem of performance driven lookup-table (LUT) based
technology mapping for FPGAs using a general delay model. In the general
delay model, each interconnection edge has a weight representing the delay of
the interconnection. This model is particularly useful when combined with an
iterative re-technology mapping process where the actual delays of the placed
and routed circuit are fed back to the technology mapping phase to improve the
mapping based on the more realistic delay estimation. Well known technology
mappers such as FlowMap and Chortle-d only minimize the number of levels in the
technology mapped circuit and hence are not suitable for such an iterative
re-technology mapping process. Recently, Mathur and Liu in [ML94] studied the
performance driven technology mapping problem using the general delay model and
presented an effective heuristic algorithm for the problem. In this paper, we
present an efficient technology mapping algorithm that achieves provably optimal
delay in the technology mapped circuit using the general delay model. Our
algorithm is a non-trivial generalization of FlowMap. A key problem in our
algorithm is to compute a K-feasible network cut such that the circuit delay on
every cut edge is upper-bounded by a specific value. We implemented our
algorithm in a LUT based FPGA technology mapping package called Edge-Map, and
tested Edge-Map on a set of benchmark circuits.
3A.2 Maple: A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for Field-Programmable Gate Arrays
Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki, Waseda Univ., Tokyo, Japan
Technology mapping algorithms for LUT (Look Up Table) based FPGAs have been
proposed to transfer a Boolean network into logic-blocks. However, since those
algorithms take no layout information into account, they do not always lead to
excellent results. In this paper, a simultaneous technology mapping, placement
and global routing algorithm for FPGAs, Maple, is presented. Maple is an
extended version of a simultaneous placement and global routing algorithm for
FPGAs, which is based on recursive partition of layout regions and block sets.
Maple inherits this recursive process and performs the technology mapping
within each recursion step. Therefore, the mapping can be done
with the placement and global routing information. Experimental results for
some benchmark circuits demonstrate its efficiency and effectiveness.
3A.3 Universal Logic Gate for FPGA Design
Chih-chang Lin, M. Marek-Sadowska, Duane Gatlin, Univ. of California, Santa Barbara, CA
In this paper the problem of selecting an appropriate programmable cell
structure for FPGA architecture design is addressed. The cells studied here can
be configured to the desired functionality by applying input permutation,
negation, bridging or constant assignment, or output negation. A general
methodology to determine logic description of such cells, which are capable of
being configured to a given set of functions is described.
Experimental results suggest that the new cell behaves as well as the Actel 2
cell in terms of logic power but requires substantially less area and wiring
overhead.
Moderators: Robert Walker, Rensselaer Polytechnic Inst., Troy, NY
Kazutoshi Wakabayashi, Stanford Univ., Stanford, CA
3B.1 Condition Graphs for High-Quality Behavioral Synthesis
Hsiao-Ping Juan, Viraphol Chaiyakul, Daniel D. Gajski, Univ. of California, Irvine, CA
Identifying mutual exclusiveness between operators during behavioral synthesis
is important in order to reduce the required number of control steps or hardware
resources. To improve the quality of the synthesis result, we propose a
representation, the Condition Graph, and an algorithm for identification of
mutually exclusive operators. Previous research efforts have concentrated on
identifying mutual exclusiveness by examining language constructs such as
IF-THEN-ELSE statements. Thus, their results heavily depend on the description
styles. The proposed approach can produce results independent of description
styles and identify more mutually exclusive operators than previous
approaches. The Condition Graph and the proposed algorithm can be used in any
scheduling or binding algorithms. Experimental results on several benchmarks
have shown the efficiency of the proposed representation and algorithm.
3B.2 Dynamic Scheduling and Synchronization Synthesis of Concurrent Digital Systems under System-Level Constraints
Claudionor Nunes Coelho Jr., Giovanni De Micheli, Stanford Univ., Stanford, CA
We present in this paper a novel control synthesis technique for system-level
specifications that are better described as a set of concurrent synchronous
descriptions, their synchronizations and constraints. The proposed synthesis
technique considers the degrees of freedom introduced by the concurrent models
and by the environment in order to satisfy the design constraints.
Synthesis is divided into two phases. In the first phase, the original
specification is translated into an algebraic system, for which complex
control-flow constraints and quantifiers of the design are determined. This
algebraic system is then analyzed and the design space of the specification is
represented by a finite-state machine, from which a set of Boolean formulas is
generated and manipulated in order to obtain a solution. This method contrasts
with usual high-level synthesis methods in that it can handle arbitrarily
complex control-flow structures, concurrency and synchronization by allowing
the scheduling of the operations to change dynamically over time.
3B.3 Comprehensive Lower Bound Estimation from Behavioral Descriptions
Seong Y. Ohm, Fadi J. Kurdahi, Nikil Dutt, Univ. of California, Irvine, CA
In this paper, we present a comprehensive technique for lower bound estimation
(LBE) of resources from behavioral descriptions. Previous work has focused on
LBE techniques that use very simple cost models which primarily focus on the
functional unit resources. Our cost model accounts for storage resources
in addition to functional resources. Our timing model uses a finer granularity
that permits the modeling of functional unit, register and interconnect delays.
We tested our LBE technique for both functional unit and storage requirements
on several high-level synthesis benchmarks and observed near-optimal results.
Moderators: Andrew T. Yang, Univ. of Washington, Seattle, WA
Lawrence T. Pillage, Univ. of Texas, Austin, TX
3C.1 Fast and Accurate Timing Simulation with Regionwise Quadratic Models of MOS I-V Characteristics
A. Dharchoudhury, Sung Mo Kang, Univ. of Illinois, Urbana, IL
K.H. Kim, S.H. Lee, Samsung Electronics Co., Kyung-Ki-Do, Korea
This paper presents a technique called regionwise quadratic (RWQ) modeling that
allows highly accurate MOS models, as well as measured I-V data, to be used in
fast timing simulation. This technique significantly increases the accuracy of
fast timing simulation while maintaining efficiency by permitting analytical
solutions of node equations. A fast timing simulator using these RWQ models has
been implemented. Several examples of RWQ modeling are provided, and
comparisons of simulation results with SPICE3 are shown to demonstrate accuracy
and efficiency. Speedups of two to three orders of magnitude for circuits
containing up to 2000 transistors are observed.
3C.2 VLSI Timing Simulation with Selective Dynamic Regionization
Meng-Lin Yu, Bryan D. Ackland, AT&T Bell Labs., Holmdel, NJ
Accurate timing simulations are crucial to the design of MOS VLSI circuits, but
can take prohibitively large amounts of time. This paper describes dynamic
regionization techniques applied to an event based simulator for MOS timing
simulation that have proven to be more efficient than, and as accurate as, the static
regionization method. The MOS network is first statically partitioned into
groups of strongly coupled nodes called regions. Large regions are then
incrementally and dynamically partitioned into, and replaced by, subregions.
Subregions are treated just like normal regions in the event based simulation
process. This simulator has been used to verify the timing and functionality of
several large VLSI chips. Performance is 3 to 7 times faster than a static
regionization method.
3C.3 A New Efficient Approach to Statistical Delay Modeling of CMOS Digital Combinational Circuits
S. A. Aftab, Motorola Strategic Systems Tech., Tempe, AZ, M.A. Styblinski, Texas A&M Univ., College Station, TX
This paper presents one of the first attempts to statistically characterize
signal delays of basic CMOS digital combinational circuits using the transistor
level approach. Hybrid analytical/iterative delay expressions in terms of the
transistor geometries and technological process variations are created for basic
building blocks. Local delays of blocks along specific signal paths are
combined together for the analysis of complex combinational VLSI circuits. The
speed of analysis is increased by 2 to 4 orders of magnitude relative to SPICE,
with about 5-10% accuracy. The proposed approach shows good accuracy in
modeling the influence of the "noise" parameters on circuit delay relative to
direct SPICE-based Monte Carlo analysis. Examples of statistical delay
characterization are shown. The important impact of the proposed approach is
that statistical evaluation and optimization of delays in much larger VLSI
circuits will become possible.
Moderators: Michael Jackson, Motorola, Austin, TX
Ren-Song Tsay, ArcSys, Inc., Sunnyvale, CA
3D.1 Simultaneous Driver and Wire Sizing for Performance and Power Optimization
Jason Cong, Cheng-Kok Koh, Univ. of California, Los Angeles, CA
In this paper, we study the simultaneous driver and wire sizing (SDWS) problem
under two objective functions: (i) delay minimization only, or (ii) combined
delay and power dissipation minimization. We present general formulations of the
SDWS problem under these two objectives based on the distributed Elmore delay
model with consideration of both capacitive power dissipation and short-circuit
power dissipation. We show several interesting properties of the optimal SDWS
solutions under the two objectives, including an important result (Theorem 3)
which reveals the relationship between driver sizing and optimal wire sizing.
These results lead to polynomial time algorithms for computing the lower and
upper bounds of optimal SDWS solutions under the two objectives, and efficient
algorithms for computing optimal SDWS solutions under the two objectives. We
have implemented these algorithms and compared them with existing design methods
for driver sizing only or independent driver and wire sizing. Accurate SPICE
simulation shows that our methods reduce the delay by 11%-47% and power
dissipation by 26%-63% compared with existing design methods.
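The distributed Elmore delay model that SDWS-style formulations build on admits a compact computation. Below is a minimal sketch of that background model only, not the paper's sizing algorithm; the driver, segment, and load values are illustrative.

```python
# Elmore delay of a driver feeding a chain of wire segments, each modeled as
# a lumped pi (half of a segment's capacitance on each side of its resistance).
# Every resistance contributes R times the capacitance downstream of it.

def elmore_delay(r_driver, segments, c_load):
    """segments: list of (R_ohms, C_farads) pairs, driver side first."""
    delay = 0.0
    # capacitance still downstream of the current resistance
    downstream = sum(c for _, c in segments) + c_load
    delay += r_driver * downstream     # driver resistance charges everything
    for r, c in segments:
        downstream -= c / 2.0          # half of this segment's C sits behind R
        delay += r * downstream
        downstream -= c / 2.0          # the other half is now accounted for
    return delay

# Example: 1 kOhm driver, two 100 Ohm / 50 fF segments, 10 fF load.
d = elmore_delay(1e3, [(100.0, 50e-15), (100.0, 50e-15)], 10e-15)  # ~122 ps
```

Widening a wire segment scales its R down and its C up, which is exactly the trade-off the SDWS objective functions navigate.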
3D.2 Low-Cost Single Layer Clock Trees with Exact Zero Elmore Delay Skew
Andrew B. Kahng, C.-W Albert Tsao, Univ. of California, Los Angeles, CA
We give the first single-layer clock tree construction with exact zero skew
according to the Elmore delay model. The previous Linear-Planar-DME method [11]
guarantees a planar solution under the linear delay model. In this paper, we use
a Linear-Planar-DME variant connection topology to construct a low-cost zero
skew tree (ZST) according to the Elmore delay model. While a linear-delay ZST
is trivially converted to an Elmore-delay ZST by "detouring" wires, the key idea
is to defer this detouring as much as possible to reduce tree cost. Costs of
our planar ZST solutions are comparable to those of the best previous non-planar
ZST solutions, and substantially improve over previous planar clock routing
methods.
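The per-merge balance behind Elmore-delay zero-skew trees has a standard closed form (Tsay's exact zero-skew tapping-point formula). The sketch below shows that background condition with made-up wire parameters; it is not the paper's deferred-detour construction.

```python
# Tapping point on a wire of length l joining two subtrees so that both
# subtree roots see equal Elmore delay from the tap.

def zero_skew_tap(t1, c1, t2, c2, l, r, c):
    """Fraction z in [0, 1] along the wire, measured from subtree 1.
    t1, t2: subtree root delays; c1, c2: subtree capacitances;
    r, c: wire resistance and capacitance per unit length."""
    num = (t2 - t1) + r * l * (c2 + c * l / 2.0)
    den = r * l * (c * l + c1 + c2)
    return num / den

# Symmetric subtrees balance exactly at the midpoint:
z_sym = zero_skew_tap(0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 2.0)   # 0.5
```

When z falls outside [0, 1] no tapping point exists and the wire must be detoured (lengthened); deferring that detour is the cost-saving idea of the paper.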
3D.3 Clock-Period Constrained Minimal Buffer Insertion in Clock Trees
Gustavo E. Téllez, Majid Sarrafzadeh, Northwestern Univ., Evanston, IL
In this paper we investigate the problem of computing a lower bound on the
number of buffers required when given a maximum clock frequency and a predefined
clock tree. Using generalized properties of published CMOS timing models, we
formulate a novel non-linear and a simplified linear buffer insertion problem.
We solve the latter optimally with an O(n) algorithm. The basic formulation and
algorithm are extended to include a skew upper bound constraint. Using these
algorithms we propose further algorithmic extensions that allow area and phase
delay tradeoffs. Our results are verified using SPICE3e2 simulations with MCNC
MOSIS 2.0µ models and parameters. Experiments show our buffer insertion
algorithms can be used effectively for high-speed clock designs.
Moderators: Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA
Ellen M. Sentovich, Univ. of California, Berkeley, CA
4A.1 Efficient Implementation of Retiming
Narendra Shenoy, Richard Rudell, Synopsys, Inc., Mountain View, CA
Retiming is a technique for optimizing sequential circuits. It repositions the
registers in a circuit, leaving the combinational cells untouched. The objective
of retiming is to find a circuit with the minimum number of registers for a
specified clock period. More than ten years have elapsed since Leiserson and
Saxe first presented a theoretical formulation to solve this problem for
single-clock edge-triggered sequential circuits. Their proposed algorithms have
polynomial complexity; however, naive implementations of these algorithms
exhibit O(n^3) time complexity and O(n^2) space complexity when applied to
digital circuits with n combinational cells. This renders retiming ineffective for
circuits with more than 500 combinational cells. This paper addresses the
implementation issues required to exploit the sparsity of circuit graphs to
allow min-period retiming and constrained min-area retiming to be applied to
circuits with as many as 10,000 combinational cells. We believe this is the
first paper to address these issues and the first to report retiming results for
large circuits.
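The core move of the Leiserson-Saxe formulation that the paper implements at scale can be illustrated in a few lines: a retiming assigns an integer lag r(v) to each cell, and the register count on edge u→v becomes w(u,v) + r(v) − r(u). The 3-cell cycle below is invented for illustration; the paper's contribution, exploiting sparsity for large graphs, is not shown.

```python
# A retiming is legal iff every retimed edge keeps a non-negative register
# count; register counts around any cycle are preserved by construction.

def retime(edges, r):
    """edges: dict (u, v) -> register count w(u, v); r: node -> integer lag."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def is_legal(edges, r):
    return all(w >= 0 for w in retime(edges, r).values())

edges = {("a", "b"): 2, ("b", "c"): 0, ("c", "a"): 1}
lags = {"a": 0, "b": -1, "c": 0}   # move one register forward across cell b
rt = retime(edges, lags)           # every edge now carries exactly 1 register
```

Min-period retiming searches for lags that make all retimed path delays fit the clock period while keeping every edge weight non-negative.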
4A.2 Retiming with Non-Zero Clock Skew, Variable Register, and Interconnect Delay
Tolga Soyata, Eby G. Friedman, Univ. of Rochester, Rochester, NY
A retiming algorithm is presented which includes the effects of variable
register, clock distribution, and interconnect delay. These delay components
are incorporated into retiming by assigning Register Electrical Characteristics
(RECs) to each edge in the graph representation of the synchronous circuit. A
matrix (called the Sequential Adjacency Matrix or SAM) is presented that
contains all path delays. Timing constraints for each data path are derived
from this matrix. Vertex lags are assigned ranges rather than single values as
in standard retiming algorithms. The approach used in the proposed algorithm is
to initialize these ranges with unbounded values and continuously tighten these
ranges using localized timing constraints until an optimal solution is obtained.
The algorithm is demonstrated on modified MCNC benchmark circuits and both
increased clock frequencies and elimination of all race conditions are observed.
4A.3 Optimal Latch Mapping and Retiming Within a Tree
Joel Grodstein, Eric Lehman, Heather Harkness, Bill Grundmann, Herve Touati, Digital Equipment Corp., Hudson, MA
We propose a technology mapping algorithm that takes existing structural
technology-mapping algorithms based on dynamic programming [1,3,4] and extends
them to retime pipelined circuits. If the circuit to be mapped has a tree
structure, our algorithm generates an optimal solution compatible with that
structure. The algorithm takes into account gate delays and capacitive loads
as latches are moved across the logic. It also supports latches with embedded
logic: i.e., cells that combine a D latch with a combinational gate at little
extra cost in latch delay.
Moderators: Steve Tjiang, Synopsys, Inc., Mountain View, CA
Mary Bailey, Univ. of Arizona, Tucson, AZ
4B.1 Simulation of Digital Circuits in the Presence of Uncertainty
Mark Linderman, Miriam Leeser, Cornell Univ., Ithaca, NY
Current extended value set dynamic timing analyzers are not sophisticated enough
to detect the subtle timing relationships upon which timing-critical systems
depend, and exhaustive simulation achieves very accurate results but at
tremendous computational cost. MTV is a simulator that strikes a balance between
accuracy and efficiency.
MTV is more accurate than other extended value set simulators because it
respects the ordering of events. It is more efficient than exhaustive simulators
because it efficiently simulates overlapping events and requires only a single
waveform to represent a signal. Features of MTV include: elimination of common
ambiguity, symbolic delays, correlated delays, and sophisticated algorithms to
detect ordered events. This paper concludes with simulation results from the
ISCAS85 benchmark suite.
4B.2 Fast Transient Power and Noise Estimation for CMOS VLSI Circuits
Wolfgang T. Eisenmann, Motorola GmbH, Munich, Germany
Helmut E. Graeb, Technical Univ. of Munich, Munich, Germany
Today's digital design systems are running out of steam when it comes to
meeting the challenges presented by simultaneous switching, power consumption
and reliability constraints emerging in VLSI circuits. In this paper a new
technique to accurately estimate the transient behavior of large CMOS cell-based
circuits in a reasonable amount of time is presented. Gate-level simulation and
a consistent modeling methodology are employed to compute the time-domain
waveforms for signal voltages, supply currents, power consumption and (delta)I
noise on power lines. This can be done for circuit blocks and complete designs
by our new tool POWTIM, which adds SPICE-like capabilities to digital design
systems.
4B.3 The Inversion Algorithm for Digital Simulation
Peter Maurer, Univ. of South Florida, Tampa, FL
The Inversion Algorithm is an event-driven algorithm, whose performance rivals
or exceeds that of Levelized Compiled code simulation, even at activity rates of
50% or more. The Inversion Algorithm has several unique features, the most
remarkable of which is the size of the run-time code. The basic Algorithm can be
implemented using no more than a page of run-time code, although in practice it
is more efficient to provide several different variations of the basic
algorithm. The run-time code is independent of the circuit under test, so the
algorithm can be implemented either as a compiled code or an interpreted
simulator with little variation in performance. Because of the small size of the
run-time code, the run-time portions of the Inversion Algorithm can be
implemented in assembly language for peak efficiency, and still be retargeted
for new platforms with little effort.
Moderators: Sung Mo Kang, Univ. of Illinois, Urbana, IL
Don Scharfetter, Intel Corp., Santa Clara, CA
4C.1 Unified Complete Mosfet Model for Analysis of Digital and Analog Circuits
M. Miura-Mattausch, U. Feldmann, A. Rahm, M. Bollu, D. Savignac, Siemens AG, Munich, Germany
In this paper, we describe the complete MOSFET model developed for circuit
simulation. The model describes all transistor characteristics as functions of
surface potentials, which are calculated iteratively at each applied voltage
under the charge-sheet approximation. The key idea of this development is to put
as much physics as possible into the equations describing the surface potentials. Since the model includes both the drift and the diffusion contributions, a
single equation is valid from the subthreshold to the saturation regions. The
unified treatment of our model allows all transistor characteristics to be
calculated without any nonphysical fitting parameters. Additionally the
calculation time is drastically reduced in comparison with a conventional
piece-wise model.
4C.2 A Precorrected-FFT Method for Capacitance Extraction of Complicated 3-D Structures
J.R. Phillips, J. White, Massachusetts Inst. of Tech., Cambridge, MA
In this paper we present a new approach to three-dimensional capacitance
extraction based on a pre-corrected FFT scheme. The approach is compared to the
now commonly used multipole-accelerated algorithms for a variety of structures,
and the new method is shown to have substantial performance and memory
advantages.
4C.3 Measurement and Modeling of MOS Transistor Current Mismatch in Analog IC's
Eric Felt, Amit Narayan, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA
This paper presents a new methodology for measuring MOS transistor current
mismatch and a new transistor current mismatch model. The new methodology is
based on extracting the mismatch information from a fully functional circuit
rather than on probing individual devices; this extraction leads to more
efficient and more accurate mismatch measurement. The new model characterizes
the total mismatch as a sum of two components, one systematic and the other
random. For our process, we attribute nearly half of the mismatch to the
systematic component, which we model as a linear gradient across the die.
Furthermore, we present a new model for the random component of the mismatch
which is 60% more accurate, on average, than existing models.
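The separation of mismatch into a systematic linear gradient plus a random residual can be illustrated with a one-dimensional least-squares fit. The positions and mismatch values below are invented, and the paper's model is a gradient across the two-dimensional die; this is only a simplified sketch of the idea.

```python
# Fit a line y ~ a + b*x to current-mismatch measurements; the fitted line
# plays the role of the systematic gradient, the residuals the random part.

def fit_gradient(xs, ys):
    """Closed-form least-squares fit, returning (intercept a, slope b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

xs = [0.0, 1.0, 2.0, 3.0]              # hypothetical positions across the die
ys = [0.10, 0.21, 0.29, 0.40]          # hypothetical % current mismatch
a, b = fit_gradient(xs, ys)
random_part = [y - (a + b * x) for x, y in zip(xs, ys)]  # residual component
```

By construction the residuals sum to zero, so whatever the gradient explains is cleanly removed before the random component is characterized.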
Moderators: Andrew B. Kahng, Univ. of California, Los Angeles, CA
Gary Yeap, Motorola, Inc., Temple, AZ
4D.1 Skew Sensitivity Minimization of Buffered Clock Tree
Jae Chung, Chung-Kuan Cheng, Univ. of California, La Jolla, CA
Given a clock tree topology and a library of buffers, we propose an efficient
skew sensitivity minimization algorithm using a dynamic programming approach.
Our algorithm finds the optimum buffer sizes, their insertion levels in the
clock tree, and optimum wire widths to minimize the skew sensitivity under
manufacturing variations. Careful fine tuning by shifting buffer locations at
the last stage preserves the minimum skew sensitivity property and reduces the
interconnect
length. For a given clock tree of n points and a library of s different buffer
sizes, the run time of the presented algorithm is O(log^3 n × s^2 ).
Experimental results show a significant reduction of clock skews ranging from
87 times to 144 times compared to the clock skews before applying the proposed
algorithm. We also observe a further reduction of the propagation delay of
clock signals as a result of applying the proposed skew sensitivity algorithm.
4D.2 Process-Variation-Tolerant Clock Skew Minimization
Shen Lin, C.K. Wong, IBM Corp., Yorktown Heights, NY
In this paper, we propose a novel hierarchical multiple-merge zero skew clock
routing algorithm. The routing results produced by our approach will have zero
skew in the nominal case and minimal skew increase in the presence of worst
process variations. In order to construct such a clock routing, we formulate
the linear placement with maximum spread problem and provide an
O(n·min{n,P}·log n·log P) algorithm for optimally solving this problem, where
n is the number
of cells to be placed and P is the maximum spread. Experimental results show
that our algorithm can indeed reduce the skew in various manufacturing
variations effectively.
4D.3 A Specified Delay Accomplishing Clock Router Using Multiple Layers
Mitsuho Seki, Kenji Inoue, Hitachi, Ltd., Ibaraki-ken, Japan
Kazuo Kato, Kouki Tsurusaki, Shinichi Fukasawa, Hitoshi Sasaki, Mutsuhito Aizawa, Hitachi, Ltd., Tokyo, Japan
Clock routing that minimizes clock skew is essential for high-performance
LSIs. Our clock routing method: (1) realizes the specified delay to each input
terminal and provides zero skew; (2) uses multiple routing layers for
pin-to-pin routing; and (3) considers the delay arising from the resistance of
through-holes. Experimental results show that the delay is within 1% error of
the specified delay and that the skew can be controlled to within picoseconds.
Moderators: Sujit Dey, C&C Research Labs., NEC USA, Inc., Princeton, NJ
Michel Berkelaar, IBM Corp., Yorktown Heights, NY
5A.1 Switching Activity Analysis Considering Spatiotemporal Correlations
Radu Marculescu, Diana Marculescu, Massoud Pedram, Univ. of Southern California, Los Angeles, CA
This work presents techniques for computing the switching activities of all
circuit nodes under pseudorandom or biased input sequences and assuming a zero
delay mode of operation. Complex spatiotemporal correlations among the circuit
inputs and internal nodes are considered by using a lag-one Markov Chain model.
Evaluations of the model and a comparative analysis presented for benchmark
circuits demonstrate the accuracy and practicality of the method. The
results presented in this paper are useful in power estimation and low power
design.
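The lag-one Markov model that such analyses rest on has a simple closed form at a single signal: switching activity is the probability that consecutive cycle values differ. The sketch below shows that background quantity only, not the paper's propagation through a circuit.

```python
# Lag-one model of a logic signal: p01 = P(next=1 | now=0),
# p10 = P(next=0 | now=1). Activity = P(0->1) + P(1->0) in steady state.

def activity(p01, p10):
    p1 = p01 / (p01 + p10)             # steady-state P(signal = 1)
    return p1 * p10 + (1 - p1) * p01

# A temporally independent signal with P(1)=p has p01=p, p10=1-p,
# recovering the familiar 2p(1-p):
ind = activity(0.5, 0.5)               # 0.5
# A "sticky" signal with the same P(1)=0.5 but positive temporal
# correlation switches far less often:
sticky = activity(0.1, 0.1)            # 0.1
```

The example shows why ignoring temporal correlation can badly overestimate activity even when static signal probabilities are exact.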
5A.2 Estimation of Circuit Activity Considering Signal Correlations and Simultaneous Switching
Tan-Li Chou, Kaushik Roy, Purdue Univ., West Lafayette, IN, Sharat Prasad, Texas Instruments Inc., Dallas, TX
This paper presents accurate estimation of signal activity at the internal nodes
of CMOS combinational logic circuits. The methodology is based on a stochastic
model of logic signals and takes correlations and simultaneous switching of
signals at logic gate inputs into consideration. In combinational logic
synthesis, in order to minimize spurious transitions due to finite propagation
delays, it is crucial to balance all signal paths and to reduce the logic depth
[4]. As a result of balancing delays through different paths, the inputs to
logic gates may switch at approximately the same time. We have developed and
implemented a technique to calculate signal probability and switching activity
of the CMOS combinational logic circuits. Experimental results show that if
simultaneous switching is not considered the switching activities of the internal
nodes can be off by more than 100% compared to simulation based techniques. In
contrast, our technique is on average within 2% of logic simulation results.
5A.3 A Cell-Based Power Estimation in CMOS Combinational Circuits
Jiing-Yuan Lin, T.C. Liu, Wen-Zen Shen, National Chiao Tung Univ., Hsin-chu, Taiwan, ROC
In this paper we present a power dissipation model considering the charging/discharging of capacitance at the gate output node as well as at internal nodes,
and capacitance feedthrough effect. Based on the model, a Cell-Based Power
Estimation (CBPE) method is developed to estimate the power dissipation in CMOS
combinational circuits. In our technique, we first construct a modified state
transition graph called STGPE to model the power consumption behavior of a logic
gate. Then, according to the input signal probabilities and transition densities
of the logic gate, we apply an efficient method to estimate the expected
activity number of each edge in the STGPE. Finally, the energy consumption of
a logic gate is calculated by summing the energy consumptions of each edge in
STGPE. For a set of benchmark circuits, experimental results show that the
power dissipation estimated by CBPE is on average within 10% error compared to
exact SPICE simulation, while the CPU time is more than two orders of
magnitude faster.
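The energy bookkeeping behind cell-based estimators of this kind follows the standard result that each charging event of a capacitance C from a Vdd supply dissipates (1/2)·C·Vdd². The sketch below uses that textbook relation with made-up node data; the paper's STGPE edge-counting machinery is not shown.

```python
# Average dynamic power from per-node capacitances and transition densities,
# covering output and internal nodes alike.

def avg_power(vdd, nodes):
    """nodes: list of (capacitance_F, transition_density_per_s) pairs."""
    return sum(0.5 * c * vdd ** 2 * d for c, d in nodes)

# Hypothetical gate at Vdd = 3.3 V: a 20 fF output node switching 1e8 times/s
# and a 5 fF internal node switching 4e7 times/s.
p = avg_power(3.3, [(20e-15, 1e8), (5e-15, 4e7)])   # ~12 microwatts
```

Accounting for internal nodes separately, as the abstract emphasizes, matters because their capacitances switch at different rates than the gate output.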
Moderators: Reinaldo Bergamaschi, IBM Corp., Yorktown Heights, NY
Yukihiro Nakamura, NTT Knowledge Systems Lab., Yokosuka-Shi, Japan
5B.1 Design Exploration for High-Performance Pipelines
Smita Bakshi, Daniel D. Gajski, Univ. of California, Irvine, CA
Exploration plays an important role in the design of high-performance pipelines.
We propose an exploration strategy for varying three design parameters by using
a performance-constrained component selection and pipelining algorithm on
different "architectures". The architecture is specified manually by using a
mix of behavioral and structural constructs, while the component selection and
pipelining is performed automatically using our algorithms. Results on two
industrial-strength DSP systems indicate the effectiveness of our strategy in
exploring a large design space within a matter of seconds.
5B.2 Simultaneous Functional-Unit Binding and Floorplanning
Yung-ming Fang, D.F. Wong, Univ. of Texas, Austin, TX
As device feature size decreases, interconnection delay becomes the dominating
factor of system performance. Thus it is important that accurate physical
information is used during high level synthesis. In this paper, we consider the
problem of simultaneously performing functional-unit binding and floorplanning.
Experimental results indicate that our approach to combine binding and
floorplanning is superior to the traditional approach of separating the two
tasks.
5B.3 Module Selection and Data Format Conversion for Cost-Optimal DSP Synthesis
Kazuhito Ito, Lori E. Lucke, Keshab K. Parhi, Univ. of Minnesota, Minneapolis, MN
In high level synthesis each node of a synchronous data-flow graph (DFG) is
scheduled to a specific time and allocated to a processor. In this paper we
present new integer linear programming (ILP) models which generate a blocked
schedule for a DFG with implicit retiming, pipelining, and unfolding while
performing module selection and data format conversion. A blocked schedule is a
schedule which overlaps multiple iterations of the DFG to guarantee a minimum
number of processors. Component modules are selected from a library of
processors to minimize cost. Furthermore, we include data format converters
between processors of different data formats. In addition, we minimize the
unfolding factor of the blocked schedule.
Moderators: Daniel G. Saab, Univ. of Illinois, Urbana, IL
Sandip Kundu, IBM Corp., Yorktown Heights, NY
5C.1 On Testing Delay Faults in Macro-Based Combinational Circuits
Irith Pomeranz, Sudhakar Reddy, Univ. of Iowa, Iowa City, IA
We consider the problem of testing for delay faults in macro-based circuits.
Macro-based circuits are obtained as a result of technology mapping. Gate-level
fault models cannot be used for such circuits, since the implementation of a
macro may not have an accurate gate-level counterpart, or the macro
implementation may not be known. Two delay fault models are proposed for
macro-based circuits. The first model is analogous to the gate-level gross
delay fault model. The second model is analogous to the gate-level path delay
fault model. We provide fault simulation procedures, and present experimental
results.
5C.2 RAFT: A Novel Program for Rapid-Fire Test and Diagnosis of Digital Logic for Marginal Delays and Delay Faults
Abhijit Chatterjee, Georgia Inst. of Tech., Atlanta, GA, Jacob A. Abraham, Univ. of Texas, Austin, TX
The problem of delay-fault testing and detection of chips with marginal
performance has become even more critical due to increasing clock speeds. In
this paper, a methodology for the detection of marginal digital circuits and
the diagnosis of gate delay failures is developed. A new test application
methodology is proposed in which test vectors may be applied to digital
combinational circuits at intervals smaller than the critical path delay of the
circuit and signal waveform analysis is used to interpret the test results. The
resulting tests are called RApid Fire Tests (RAFT) and allow classification
of circuits from "good" to "bad" along a continuous scale.
5C.3 A Comprehensive Fault Macromodel for OPAMPS
Chen-Yang Pan, Kwang-Ting Cheng, Univ. of California, Santa Barbara, CA
Sandeep Gupta, Univ. of Southern California, Los Angeles, CA
In this paper, a comprehensive macromodel for transistor level faults in an
operational amplifier is developed. Observing that faulty behavior at the
output may result from interfacing errors in addition to the faulty component,
parameters associated with input and output characteristics are incorporated.
Test generation and fault classification are addressed for stand-alone opamps.
A high fault coverage is achieved by a proposed testing strategy. Transistor
level short/bridging faults are analyzed and classified into catastrophic faults
and parametric faults. Based on the macro-models for parametric faults, fault
simulation is performed for an active filter. We found that many parametric
faults in the active filter cannot be detected by traditional functional
testing. A DFT scheme, along with a current testing strategy, is proposed to
improve fault coverage.
Moderators: Bryan Preas, Xerox Parc, Palo Alto, CA
Dwight D. Hill, Synopsys, Inc., Mountain View, CA
5D.1 A Channel-Driven Global Routing with Consistent Placement
Shigetoshi Nakatake, Yoji Kajitani, JAIST, Ishikawa, Japan
A global router with a consistent placer is proposed which aims to control
wire-densities of channels. The routing order of nets and their routes are
decided according to the channel which is predicted to have the maximum
wire-density. The placer distributes the nets evenly with respect to the
virtual length (half the perimeter of the bounding box). Interesting features
include an interactive dynamic test to decide the form of the predicting
functions and an admissible region to account for routing resources in the
placement stage. Experiments reveal interesting phenomena: a smaller maximum
wire-density is attained in spite of a comparable total wire-density, and a
smaller maximum wire-length in spite of a larger total wire-length.
5D.2 A New Global Routing Algorithm for FPGAs
Yao-Wen Chang, S. Thakur, K. Zhu, D.F. Wong, Univ. of Texas, Austin, TX
As in traditional ASIC technologies, FPGA routing usually consists of two steps:
global routing and detailed routing. Unlike existing FPGA detailed routers,
which can take full advantage of the special structures of the programmable
routing resources, FPGA global routing algorithms still greatly resemble their
counterparts in the traditional ASIC technologies. In particular, the routing
congestion information of a switch block is essentially still measured by the
numbers of available rows and columns in the switch block. Since the internal
architecture of a switch
block decides what can route through the block, the traditional measure of
routing capacity is no longer accurate. In this paper, we present an accurate
measure of switch block routing capacity. Our new measure considers the exact
positions of the switches inside a switch block. Experiments with a global
router based on these ideas show an average improvement of 38% in the channel
width required to route some benchmark circuits using a popular switch block,
compared with an algorithm based on the traditional methods for congestion
control.
5D.3 On NP-Completeness of Regular 2-D FPGA Routing Architectures and a Novel Solution
Yu-Liang Wu, Douglas Chang, Univ. of California, Santa Barbara, CA
Several industrial FPGA routing architectures have been shown to have no
efficient routing algorithms (unless P=NP) [3,4]. Here, we further investigate
if the intractability of the routing problem on a regular 2-D FPGA routing
architecture can be alleviated by adding routing switches. We show that on this
routing architecture, even with a substantial increase in switching flexibility,
a polynomial time, predictable routing algorithm is still not likely to exist,
and there is no constant ratio bound of the detailed over global routing channel
densities.
We also show that a perfect routing is unachievable on this architecture even
with near complete (maximum) switching flexibility. We also discuss a new
greedy routing architecture that possesses predictable and other desired
routing properties, yet requires fewer routing resources than regular
architectures. This theoretical result may suggest an alternative approach in
routing architecture designs.
Moderators: Michel Berkelaar, IBM Corp., Yorktown Heights, NY
Jonathan Rose, Univ. of Toronto, Toronto, Ontario, Canada
6A.1 A Symbolic Method to Reduce Power Consumption of Circuits Containing False Paths
R. Iris Bahar, Gary D. Hachtel, Enrico Macii, Fabio Somenzi, Univ. of Colorado, Boulder, CO
Power dissipation in technology mapped circuits can be reduced by performing
gate re-sizing. Recently we have proposed a symbolic procedure which exploits
the compactness of the ADD data structure to accurately calculate the arrival
times at each node of a circuit for any primary input vector. In this paper we
extend our timing analysis tool to the symbolic calculation of required times
and slacks, and we use this information to identify gates of the circuit that
can be re-sized. The nice feature of our approach is that it takes into account
the presence of false paths naturally. As shown by the experimental results,
circuits re-synthesized with the technique we present in this paper are
guaranteed to be at least as fast as the original implementations, but smaller
and substantially less power-consuming.
6A.2 Multi-Level Network Optimization Targeting Low Power
Sasan Iman, Massoud Pedram , Univ. of Southern California, Los Angeles, CA
This paper describes a procedure for minimizing the power consumption in a
boolean network under the zero delay model. Power is minimized by modifying the
function of each intermediate node in the network such that the power
consumption of the node is decreased without increasing the power consumption of
the other nodes in the network. A formal analysis of how changes in the
switching activity of an intermediate node affect the switching activity of
other nodes in the network is given first. Using this analysis, a procedure for
calculating the set of compatible power don't cares for each node in the
network is presented. Finally it is shown how these don't cares are used to
optimize the network for low power. These techniques have been implemented and
results show an average of 10% improvement in total power consumption of the
network compared to the results generated by the conventional network
optimization techniques.
6A.3 LP Based Cell Selection with Constraints of Timing, Area and Power Consumption
Yutaka Tamiya, Yusuke Matsunaga, Fujitsu Labs. Ltd., Kawasaki, Japan
Masahiro Fujita, Fujitsu Laboratories of America, San Jose, CA
This paper presents a new LP based optimal cell selection method. Optimal cell
selection is a useful tool for the final tuning of LSI designs. It replaces
cells with versions of different drivability, adjusting to timing, area, and
power constraints. Using the latest and earliest arrival times, it can handle
both setup and hold time constraints. We also construct an efficient initial
basis, which speeds up a simplex LP solver by 5 times without any relaxations
or approximations. Experimental results show that it reduces the clock cycle
of a manually designed 13k-transistor chip by 17% without any increase in
area.
Moderators: Gaetano Boriello, Univ. of Washington, Seattle, WA
Wayne Wolf, Princeton Univ., Princeton, NJ
6B.1 Power Analysis of Embedded Software: A First Step Towards Software Power Minimization
Vivek Tiwari, Sharad Malik, Andrew Wolfe, Princeton Univ., Princeton, NJ
Embedded computer systems are characterized by the presence of a dedicated
processor and the software that runs on it. Power constraints are increasingly
becoming the critical component of the design specification of these systems. At
present, however, power analysis tools can only be applied at the lower levels
of the design - the circuit or gate level. It is either impractical or
impossible to use the lower level tools to estimate the power cost of the
software component of the system. This paper describes the first systematic
attempt to model this power cost. A power analysis technique is developed that
has been applied to two commercial microprocessors - Intel 486DX2 and Fujitsu
SPARClite 934. This technique can be employed to evaluate the power cost of
embedded software and also be used to search the design space in software power
optimization.
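The instruction-level view described above can be illustrated with a toy energy model. The sketch below is purely illustrative: the two-term structure (per-instruction base cost plus an inter-instruction overhead) follows the general idea of instruction-level power analysis, but every cost number and instruction class here is hypothetical, not the authors' measured model.

```python
# Toy instruction-level energy estimate: per-instruction base current costs
# plus an overhead term when adjacent instructions differ (circuit state
# change). All numbers below are hypothetical illustrations.

# Hypothetical base current costs (mA) per instruction class.
BASE_COST = {"mov": 300, "add": 310, "mul": 500, "load": 430}
# Hypothetical extra current (mA) when two different classes are adjacent.
OVERHEAD = 15

def program_energy(instrs, vdd=3.3, cycle_time=25e-9):
    """Estimate energy (J) of an instruction sequence, one cycle each."""
    current = 0.0  # accumulated current-cycles (mA * cycles)
    for k, ins in enumerate(instrs):
        current += BASE_COST[ins]
        if k > 0 and instrs[k - 1] != ins:
            current += OVERHEAD  # state-change cost between instructions
    # E = Vdd * I * T, converting mA to A
    return vdd * (current * 1e-3) * cycle_time
```

With such a model, the energy of a candidate code sequence can be compared against alternatives when searching the software design space.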
6B.2 Generating Instruction Sets and Microarchitectures from Applications
Ing-Jer Huang, Alvin Despain, Univ. of Southern California, Los Angeles, CA
The design of an application-specific instruction set processor (ASIP) system
includes at least three interdependent tasks: microarchitecture design,
instruction set design, and instruction set mapping for the application. We
present a method that unifies these three design problems with a single
formulation: a modified scheduling/allocation problem with an integrated
instruction formation process. Micro-operations (MOPs) representing the
application are scheduled into time steps. Instructions are formed and hardware
resources are allocated during the scheduling process. The assembly code for the
given application is obtained automatically at the end of the scheduling
process. This approach considers MOP parallelism, instruction field encoding,
delayed load/store/branch, conditional execution of MOPs, and retargetability
to various architecture templates. Experiments are presented to show the power
and limitation of our approach. Performance improvement over our previous work
[4] is significant.
6B.3 Register Assignment Through Resource Classification for ASIP Microcode Generation
Clifford Liem, Trevor May, Pierre Paulin, Bell-Northern Research Inc., Ottawa, Ontario, Canada
Application Specific Instruction-Set Processors (ASIPs) offer designers the
ability for high-speed data and control processing with the added flexibility
needed for late design specifications, accommodation of design errors, and
product evolution. However, code generation for ASIPs is a complex problem and
new techniques are needed for its success. The register assignment task can be
a critical phase, since often in ASIPs, the number and functionality of
available registers is limited, as the designer has opted for simplicity, speed,
and low area. Intelligent use of register files is critical to the program
execution time, program memory usage and data memory usage. This paper describes
a methodology utilizing register classes as a basis for assignment for a
particular style of ASIP architectures. The approach gives preference to
special purpose registers which are the scarce resources. This naturally leads
to the objectives of high speed and low program memory usage. The approach has
been implemented in a system called CodeSyn [1] and used on custom ASIP
architectures.
Moderators: Albert Ruehli, IBM Corp., Yorktown Heights, NY
Sung Mo Kang, Univ. of Illinois, Urbana, IL
6C.1 Efficient Small-Signal Circuit Analysis and Sensitivity Computations with the PVL Algorithm
Roland W. Freund, Peter Feldmann, AT&T Bell Labs., Murray Hill, NJ
We describe the application of the PVL algorithm to the small-signal analysis of
circuits, including sensitivity computations. The PVL algorithm is based on the
efficient computation of the Pade approximation of the network transfer function
via the Lanczos process. The numerical stability of the algorithm permits the
accurate computation of the Pade approximation over any given frequency range.
We extend the algorithm to compute sensitivities of network transfer functions,
their poles, and zeros, with respect to arbitrary circuit parameters, with
minimal additional computational cost, and we present numerical examples.
6C.2 Capturing Time-Of-Flight Delay for Transient Analysis Based on Scattering Parameter Macromodel
Haifang Liao, Wayne W.M. Dai, Univ. of California, Santa Cruz, CA
The delay associated with transmission line networks consists of an
exponential charging time and a pure propagation delay. This propagation delay,
the so-called time-of-flight delay, is particularly evident in long lines. When
the time-of-flight is comparable to the input rise-time, it is difficult to
capture the time-of-flight with a finite sum of exponentials. Therefore the
time-of-flight must be captured explicitly from the transfer function of the
circuit. In this paper, we give a precise definition of the time-of-flight
together with some basic properties, and present an efficient method to capture
the time-of-flight for general interconnect networks. Based on our scattering
parameter macromodel, we can easily capture the time-of-flight during the
network reduction while using a lower-order model to evaluate the charging delay.
By capturing the time-of-flight delay, the accuracy of system responses can be
greatly improved without significantly increasing computing time.
6C.3 RC Interconnect Synthesis - A Moment Fitting Approach
Noel Menezes, Satyamurthy Pullela, Florentin Dartu, Lawrence T. Pillage, Univ. of Texas, Austin, TX
Presently, delays due to the physical interconnect between logic gates account
for large portions of the overall path delays. For this reason, synthesis of
the logic gate fanout structure is of paramount importance during performance
optimization. This paper presents a methodology for on-chip RC interconnect
synthesis. Moment sensitivities are used to vary the wire widths of the
branches in an RC interconnect tree to achieve performance targets. In this
paper, signal slopes and delays at critical fanout nodes are the targets, and
the impact on total metal area is considered. A procedure for computing the
exact moment sensitivities in an RC tree is described.
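Moment-based interconnect analysis of the kind described above builds on the first moment of an RC tree, the Elmore delay. The following is a minimal generic sketch of that standard computation only (the paper itself computes higher moments and their sensitivities); the tree encoding via a parent map is an assumption of this illustration, with parents listed before children.

```python
# First moment (Elmore delay) of an RC tree. Each node has a parent,
# the resistance of the branch from its parent, and a grounded
# capacitance. Illustrative only; not the paper's sensitivity method.

def elmore_delays(parent, R, C):
    """parent: insertion-ordered dict, node -> parent (root maps to None),
    parents listed before children. R[i]: branch resistance from parent[i]
    to i; C[i]: capacitance at i. Returns Elmore delay at each node."""
    nodes = list(parent)
    # Downstream capacitance: total capacitance in each node's subtree.
    down_cap = dict(C)
    for i in reversed(nodes):
        if parent[i] is not None:
            down_cap[parent[i]] += down_cap[i]
    # Delay accumulates R * downstream_cap along the root-to-node path.
    delay = {}
    for i in nodes:
        if parent[i] is None:
            delay[i] = 0.0
        else:
            delay[i] = delay[parent[i]] + R[i] * down_cap[i]
    return delay
```

Widening a branch lowers its resistance but raises its capacitance, which is why moment sensitivities with respect to wire widths drive the synthesis.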
Moderators: Takashi Kambe, SHARP Co., Nara, Japan
Ralph Otten, Delft Univ. of Tech., Delft, The Netherlands
6D.1 Adaptive Cut Line Selection in Min-Cut Placement for Large Scale Sea-of-Gates Arrays
Kazuhiro Takahashi, Mitsubishi Electric Corp., Itami Hyogo, Japan, Kazuo Nakajima, Univ. of Maryland, College Park, MD
Masayuki Terai, Koji Sato, Mitsubishi Electric Corp., Itami Hyogo, Japan
We present a new min-cut based placement algorithm for large scale sea-of-gates
arrays. In the past all such algorithms used a fixed cut line sequence that is
determined before min-cut partitioning is performed. In our approach, we
adaptively select the next partitioning pattern based on the current parameter
value; we then perform the corresponding min-cut partitionings and measure a new
parameter value. We repeat this process until all cut lines are processed. As a
parameter, we introduce a new global objective function based on wire
congestions on cut lines. We establish a close relation between this function
and cut line sequences. This relation is used to develop an innovative method of
adaptively determining a cut line sequence so as to minimize this global
function. With this adaptive selection of cut lines along with a new
cluster-based min-cut partitioning technique, our algorithm can produce, in a
short time and at low cost, final placements that achieve 100% wiring
completion on chips of fixed sizes. This has led to its successful
production use, having generated more than 400 CMOS sea-of-gates array chips.
6D.2 Folding a Stack of Equal Width Components
Venkat Thanvantri, Sartaj Sahni, Univ. of Florida, Gainesville, FL
We consider two versions of the problem of folding a stack of equal width
components. In both versions, when a stack is folded, a routing penalty is
incurred at the fold. In one version, the height of the folded layout is given
and we are to minimize width. In the other, the width of the folded layout is
given and its height is to be minimized.
6D.3 Area Minimization for Hierarchical Floorplans
Peichen Pan, Univ. of Illinois, Urbana, IL, Weiping Shi, Univ. of North Texas, Denton, TX, C.L. Liu, Univ. of Illinois, Urbana, IL
Two results are presented in this paper. First we settle the open problem on
the complexity of the area minimization problem for hierarchical floorplans by
showing it to be NP-complete. We then present a pseudo-polynomial area
minimization algorithm for hierarchical floorplans of order-5. The algorithm is
based on a new algorithm for determining the set of nonredundant realizations of
a wheel. The new algorithm for wheels has time cost O(k^2 log k) and space cost
O(k^2) if each of the (five) blocks in a wheel has at most k realizations, a
reduction by a factor of k in both costs in comparison with previous algorithms.
The area minimization algorithm was implemented. Our experimental results show
that the algorithm is indeed very fast.
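The notion of non-redundant realizations used above can be illustrated with the classical combine-and-prune step from floorplan area minimization. This sketch shows only the generic idea for a horizontal side-by-side combine of two blocks; it is not the paper's wheel algorithm, and the function name is an invention of this illustration.

```python
# Combine two blocks side by side and keep only non-redundant
# (width, height) realizations: those not dominated in both dimensions.
# Generic slicing-style illustration, not the paper's wheel algorithm.

def combine_horizontal(A, B):
    """A, B: lists of (width, height) realizations of two blocks.
    Returns the non-redundant realizations of placing A left of B."""
    cands = [(wa + wb, max(ha, hb)) for wa, ha in A for wb, hb in B]
    cands.sort()                       # by width, then height
    pruned, best_h = [], float("inf")
    for w, h in cands:
        if h < best_h:                 # strictly lower height => non-dominated
            pruned.append((w, h))
            best_h = h
    return pruned
```

Keeping only non-redundant realizations at every combine step is what keeps the realization lists, and hence the overall area minimization, tractable.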
Moderators: David Dill, Stanford Univ., Stanford, CA
Olivier Coudert, Digital Equipment Corp., Rueil-Malmaison, France
7A.1 Multi-Level Synthesis for Safe Replaceability
Carl Pixley, Motorola, Inc., Austin, TX, Vigyan Singhal, Adnan Aziz, Robert K. Brayton, Univ. of California, Berkeley, CA
We describe the condition that a sequential digital design is a safe replacement
for an existing design without making any assumptions about a known initial
state of the design or about its environment. We formulate a safe replacement
condition which guarantees that if an original design is replaced by a new
design, the interacting environment cannot detect the change by observing the
input-output behavior of the new design; conversely, if a replacement design
does not satisfy our condition an environment can potentially detect the
replacement (in this sense the replacement is potentially unsafe). Our condition
allows simplification of the state transition diagram of an original design. We
use the safe replacement condition to derive a sequential resynthesis method for
area reduction of gate-level designs. We have implemented our resynthesis
algorithm and we report experimental results.
7A.2 Iterative Algorithms for Formal Verification of Embedded Real-Time Systems
Felice Balarin, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA
Most embedded real-time systems consist of many concurrent components operating
at significantly different speeds. Thus, an algorithm for formal verification of
such systems must efficiently deal with a large number of states and large
ratios of timing constants. We present such an algorithm based on timed
automata, a model where a finite state system is augmented with time measuring
devices called timers. We also present a semi-decision procedure for an extended
model where timers can be decremented. This extension allows describing
behaviors that are not expressible by timed automata, for example interrupts in
a real-time operating system.
7A.3 Incremental Formal Design Verification
Gitanjali M. Swamy, Robert K. Brayton, Univ. of California, Berkeley, CA
Language containment is a method for design verification that involves checking
if the behavior of the system to be verified is a subset of the behavior of the
specifications (properties or requirements), which it has to meet. If this check
fails, language containment returns a subset of `fair' states involved in
behavior that the system exhibits but the specification does not. Current
techniques for language containment do not take advantage of the fact that the
process of design is incremental; namely that the designer repeatedly modifies
and re-verifies his/her design. This results in unnecessary and cumbersome
computation. We present a method, which successively modifies the latest result
of verification each time the design is modified. Our incremental algorithm
translates changes made by the designer to an addition or subtraction of edges,
states or constraints (on acceptable behavior) from the transition behavior or
specification of the problem. Next, these changes are used to update the set of
`fair' states previously computed. This incremental algorithm is superior to the
current techniques for language containment; a conclusion supported by the
experimental results presented in this paper.
Moderators: Patrick McGeer, Cadence Berkeley Labs., Berkeley, CA
Hidetoshi Onodera, Kyoto Univ., Kyoto, Japan
7B.1 Optimization of Critical Paths in Circuits with Level-Sensitive Latches
Timothy M. Burks, IBM Corp., Austin, TX, Karem Sakallah, Univ. of Michigan, Ann Arbor, MI
A simple extension of the critical path method is presented which allows more
accurate optimization of circuits with level-sensitive latches. The extended
formulation provides a sufficient set of constraints to ensure that, when all
slacks are non-negative, the corresponding circuit will be free of late signal
timing problems. Cycle stealing is directly permitted by the formulation.
However, moderate restrictions may be necessary to ensure that the timing
constraint graph is acyclic. Forcing the constraint graph to be acyclic allows
a broad range of existing optimization algorithms to be easily extended to
better optimize circuits with level-sensitive latches. We describe the extension
of two such algorithms, both of which attempt to solve the problem of selecting
parts from a library to minimize area subject to a cycle time constraint.
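The critical path method that the abstract extends rests on propagating arrival and required times through an acyclic timing-constraint graph and reading off slacks. The following is a generic textbook sketch of that baseline computation, not the paper's latch-aware formulation; the edge-list encoding is an assumption of this illustration.

```python
# Slack computation on an acyclic timing-constraint graph. Non-negative
# slack at every node means no late-signal violation at the given clock
# period. Generic CPM baseline, not the latch-aware extension.

def slacks(edges, n, clock):
    """edges: list of (u, v, delay) over nodes 0..n-1, listed in
    topological order. Returns the slack at each node."""
    arrive = [0.0] * n                 # latest arrival times (forward pass)
    for u, v, d in edges:
        arrive[v] = max(arrive[v], arrive[u] + d)
    require = [clock] * n              # required times (backward pass)
    for u, v, d in reversed(edges):
        require[u] = min(require[u], require[v] - d)
    return [require[i] - arrive[i] for i in range(n)]
```

Level-sensitive latches complicate this picture because cycle stealing lets a path borrow time across a latch boundary, which is what the extended formulation accounts for.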
7B.2 Computing the Entire Active Area/Power Consumption Versus Delay Trade-Off Curve for Gate Sizing with a Piecewise Linear Simulator
Michel Berkelaar, IBM Corp., Yorktown Heights, NY, Pim H.W. Buurman, Jochen A.G. Jess, Eindhoven Univ. of Tech., Eindhoven, The Netherlands
The gate sizing problem is the problem of finding load drive capabilities for
all gates in a given Boolean network such that a given delay limit is met and
the necessary cost in terms of active area usage and/or power consumption is
minimal. This paper describes a way to obtain the entire cost versus delay
trade-off curve of a combinational logic circuit in an efficient way. Every
point on the resulting curve is the global optimum of the corresponding gate
sizing problem. The problem is solved by mapping it onto piecewise linear
models in such a way that a piecewise linear (circuit) simulator can do the
job. It is shown that this setup is very efficient and can produce trade-off
curves for large circuits (thousands of gates) in a few minutes. Benchmark
results for the entire set of MCNC '91 two-level examples are given.
7B.3 Dynamical Identification of Critical Paths for Iterative Gate Sizing
How-Rern Lin, Ting-Ting Hwang, Tsing Hua Univ., Hsin-chu, Taiwan, ROC
Since only sensitizable paths contribute to the delay of a circuit, false paths
must be excluded when optimizing the delay of the circuit. Identifying false
paths only once is not sufficient, since during the iterative optimization
process false paths may become sensitizable and sensitizable paths false. In
this paper, we examine the cases in which a false path becomes sensitizable and
a sensitizable path becomes false. Based on these conditions, we adopt the
loose sensitization criterion [ChD91] and use it to develop an algorithm for
dynamic identification of sensitizable paths. By combining gate sizing with
dynamic identification of sensitizable paths, an efficient performance
optimization tool is developed. Results on a set of circuits from the ISCAS
benchmark set demonstrate that our tool is very effective in reducing
circuit delay while sizing fewer gates than other methods.
Moderators: Mani Soma, Univ. of Washington, Seattle, WA
Abhijit Chatterjee, Georgia Inst. of Tech., Atlanta, GA
7C.1 Built-In Self-Test and Fault Diagnosis of Fully Differential Analogue Circuits
Salvador Mir, Vladimir Kolarik, Marcelo Lubaszewski, Christian Nielsen, Bernard Courtois, TIMA/INPG, Grenoble, France
An approach to the test and diagnosis of fully differential analogue circuits
is described in this paper. The test approach is based on off-line monitoring,
via an analogue BIST observer, of the inputs of the operational amplifiers in
the circuit. The analogue BIST can detect both hard and soft faults. Diagnosis
resolution is improved by also monitoring the outputs of the operational
amplifiers. Faulty components can then be located and the actual defective value
of a faulty passive component determined.
7C.2 A New Built-In-Self-Test Approach for Digital-To-Analog and Analog-To-Digital Converters
Karim Arabi, Bozena Kaminska, Janusz Rseszut, École Polytechnique de Montréal, Montréal, Quebec, Canada
This paper proposes a test approach and circuitry suitable for built-in
self-test (BIST) of digital-to-analog (D/A) and analog-to-digital (A/D)
converters. Offset, gain, linearity, and differential linearity errors are
tested without using external test equipment. The proposed BIST structure
decreases test cost and test time. The BIST circuitry has been designed for D/A
and A/D converters in a 1.2 µm CMOS technology. With only a minor modification,
the test structure can localize the failing situation. The small area overhead
(AOH) and the simplicity and efficiency of the proposed BIST architecture make
it promising for manufacturing.
7C.3 Fault Detection and Input Stimulus Determination for the Testing of Analog Integrated Circuits Based on Power-Supply Current Monitoring
Georges Gielen, Zhihua Wang, W. Sansen, Katholieke Univ. Leuven, Heverlee, Belgium
A new method for the testing and fault detection of analog integrated circuits
is presented. Time-domain testing followed by spectral analysis of the
power-supply current is used to detect both DC and AC faults. Spectral analysis
is applied since the tolerances on the circuit parameters make a direct
comparison of waveforms impossible. For the fault detection a probabilistic
decision rule is proposed based on a multivariate statistical analysis. Since
no extra testing pin is needed and the on-line calculation effort is small, the
method can be used for wafer-probe testing as well as final production testing.
In addition, a methodology for the selection of the input stimulus is presented
that improves the testability. Examples demonstrate the efficiency and the
effectiveness of the algorithms.
Moderators: Jay Brockman, Univ. of Notre Dame, Notre Dame, IN
Michaela Guiney, Hewlett Packard, Palo Alto, CA
7D.1 An Enhanced Flow Model for Constraint Handling in Hierarchical Multi-View Design Environments
Pieter van der Wolf, Olav ten Bosch, Alfred van der Hoeven, Delft Univ. of Tech., Delft, The Netherlands
In this paper we present an enhanced design flow model that increases the
capabilities of a CAD framework to support design activities on hierarchical
multi-view design descriptions. This flow model offers new constructs for the
configuration of complex design constraints in terms of conditions on the
hierarchical multi-view structure of a design. The design flow management system
enforces these constraints and uses them to inform the designer more effectively
about the validity of verification results and the executability of tools. This
helps to make the design process less error prone and to improve productivity.
Our solution is original in that we introduce the notions of design hierarchy
and equivalence in a design flow model. We thereby bridge a gap between the
areas of data management and design flow management. Strong points of our solution
are its simplicity and the seamless integration with existing flow management
concepts.
7D.2 On Modeling Top-Down VLSI Design
Bernd Schürmann, Joachim Altmeyer, Martin Schütze, Univ. of Kaiserslautern, Kaiserslautern, Germany
We present an improved data model that reflects the whole VLSI design process
including bottom-up and top-down design phases. The kernel of the model is a
static version concept that describes the convergence of a design. The design
history, which provides the semantics of most other version concepts, is modeled
explicitly by additional object classes (entity types) rather than by the
version graph itself. Top-down steps are modeled by splitting a design object into
requirements and realizations. The composition hierarchy is expressed by a
simple but powerful configuration method. Design data of iterative refinement
processes are managed efficiently by storing incremental data only.
7D.3 A Formal Basis for Design Process Planning and Management
Margarida F. Jacome, Univ. of Texas, Austin, TX, Stephen W. Director, Carnegie Mellon Univ., Pittsburgh, PA
In this paper we present a design formalism that allows for a complete and
general characterization of design disciplines and for a unified representation
of arbitrarily complex design processes. This formalism has been used as the
basis for the development of several prototype CAD meta-tools that offer
effective design process planning and management services.
Moderator: Hugo De Man, IMEC, Leuven, Belgium
8A.1 Design of Heterogeneous IC's for Mobile and Personal Communication Systems
Gert Goossens, Ivo Bolsens, Bill Lin, Francky Catthoor, IMEC, Leuven, Belgium
Mobile and personal communication systems form key market areas for
the electronics industry of the nineties. Stringent requirements in terms of
flexibility, performance and power dissipation, are driving the development of
integrated circuits into the direction of heterogeneous single-chip solutions.
New IC architectures are emerging which contain the core of a powerful
programmable processor, complemented with dedicated hardware, memory and
interface structures. In this tutorial we will discuss the real-life design of a
heterogeneous IC for an industrial telecom application: a reconfigurable mobile
terminal for satellite communication. Based on this practical design experience,
we will subsequently discuss a methodology for the design of heterogeneous ICs.
Design steps that will be addressed include: system specification and
refinement, data path and communication synthesis, and code generation for
embedded processor cores.
Moderator: Massoud Pedram, Univ. of Southern California, Los Angeles, CA
8B.1 Embedded Systems Design for Low Energy Consumption
Michael Schuette, John R. Barr, Motorola, Inc., Rolling Meadows, IL
This tutorial covers the circuit fundamentals of CMOS circuits which contribute
to the consumption of energy in portable products, as well as guidelines for the
design of systems in order to reduce energy consumption and prolong battery
life. Circuit fundamentals will include a definition of terms, basic circuit
elements, laws of operation, and basic circuit theory as it applies to energy
consumption. We will then present three major principles of energy reduction:
reducing the number of transitions, reducing the amount of switched capacitance,
and reducing the operating voltage. We conclude with several guidelines that can
be applied during the system design process and that utilize these three principles.
Moderators: Luciano Lavagno, Politecnico di Torino, Torino, Italy
David Dill, Stanford Univ., Stanford, CA
9A.1 Synthesis of Hazard-Free Multi-Level Logic Implementations Under Multiple-Input Changes From Binary Decision Diagrams
Bill Lin, IMEC Lab., Leuven, Belgium, Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA
We describe a new method for directly synthesizing a hazard-free multilevel
logic implementation from a given logic specification. The method is based on
free/ordered Binary Decision Diagrams (BDD's), and is naturally applicable to
multiple-output logic functions. Given an incompletely-specified (multiple-output) Boolean function, the method produces a multilevel logic network that is
hazard-free for a specified set of multiple-input changes. We assume an
arbitrary (unbounded) gate and wire delay model under a pure delay (PD)
assumption, we permit multiple-input changes, and we consider both static and
dynamic hazards. This is generally regarded as a difficult problem, and it has
important applications in asynchronous design. The method
has been automated and applied to a number of examples. The results we have
obtained are very promising.
9A.2 Performance-Driven Synthesis of Asynchronous Controllers
Kenneth Y. Yun, Univ. of California at San Diego, La Jolla, CA, David Dill, Stanford Univ., Stanford, CA
Bill Lin, IMEC Lab., Leuven, Belgium, Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA
We examine the implications of a new hazard-free combinational logic synthesis
method [8], which generates multiplexor trees from binary decision diagrams
(BDDs) - representations of logic functions factored recursively with respect to
input variables - on extended burst-mode asynchronous synthesis. First, the use
of the BDD-based synthesis reduces the constraints on state minimization and
assignment, which reduces the number of additional state variables required in
many cases. Second, in cases where conditional signals are sampled, it eliminates
the need for state variable changes preceding output changes, which reduces
overall input to output latency. Third, selection variables can easily be
ordered to minimize the latency on a user-specified path, which is important for
optimizing the performance of systems that use asynchronous components. We
present extensive evaluations showing that, with only minimal optimization, the
BDD-based synthesis gives comparable results in area with our previous exact
two-level synthesis method. We also give a detailed example of the specified
path optimization.
9A.3 Decomposition Methods for Library Binding of Speed-Independent Asynchronous Designs
Polly Siegel, Giovanni De Micheli, Stanford Univ., Stanford, CA
We describe methods for decomposing gates within a speed-independent
asynchronous design. The decomposition step is an essential part of the library
binding process, and is used both to increase the granularity of the design for
higher quality mapping and to ensure that the design can be implemented. We
present algorithms for simple hazard-free gate decomposition, and show results
which indicate that we can decompose most of the gates in our benchmark set by
this simple method. We then extend these algorithms to work for those cases in
which no simple decomposition exists.
Moderators: Theodor Vierhaus, GMD, Sankt Augustin, Germany
Kwang-Ting Cheng, Univ. of California, Santa Barbara, CA
9B.1 On Error Correction in Macro-Based Circuits
Irith Pomeranz, Sudhakar Reddy, Univ. of Iowa, Iowa City, IA
We consider the problem of correcting errors in a macro-based circuit. Our
formulation of the problem allows the correction of errors that arise both in
the context of design error correction, before the circuit is realized, and in
the context where a physical circuit needs to be corrected. Two error classes
are defined, namely, component errors and line errors. Both single and multiple
errors are considered. Accurate correction procedures are given for single
errors. Heuristics are given for correcting multiple errors. Experimental
results are given to demonstrate the correction procedures presented.
9B.2 Fault Dictionary Compaction by the Elimination of Output Sequences
Vamsi Boppana, W. Kent Fuchs, Univ. of Illinois, Urbana, IL
Fault dictionary compaction has been accomplished in the past by removing
responses on individual output pins for specific test vectors. In contrast to
the previous work, we present techniques for eliminating entire sequences of
outputs and for efficiently storing the remaining output sequences. Experimental
results on the ISCAS 85 and ISCAS 89 benchmark circuits show that the sizes of
dictionaries proposed are substantially smaller than the full fault dictionary,
while the dictionaries retain most or all of the diagnostic capability of the
full fault dictionary.
9B.3 Automatic Test Program Generation for Pipelined Processors
Hiroaki Iwashita, Satoshi Kowatari, Tsuneo Nakata, Fumiyasu Hirose, Fujitsu Labs. Ltd., Kawasaki, Japan
Simulation-based verification has both advantages and disadvantages compared
with formal verification. Our goal is to find a practical way to verify
actual microprocessors. This paper presents an efficient test program
generation method for simulation-based verification using techniques developed
for formal verification. Our test program generator enumerates all reachable
states of a processor pipeline and generates instruction sequences for every
reachable test case. The program covers complicated test cases that are
difficult to cover with random instructions and impossible to cover with
conventional test program generation methods. Our test program generator also
handles larger microprocessor designs than formal verifiers have been able to.
Moderators: Peter Feldmann, AT&T Bell Labs., Murray Hill, NJ
Kurt J. Antreich, Technical Univ. of Munich, Munich, Germany
9C.1 Synthesis of Manufacturable Analog Circuits
Tamal Mukherjee, L. Richard Carley, Rob A. Rutenbar, Carnegie Mellon Univ., Pittsburgh, PA
We describe a synthesis system that takes operating range constraints and
inter- and intra-circuit parametric manufacturing variations into account while
designing a sized and biased analog circuit. Previous approaches to CAD for
analog circuit synthesis have concentrated on nominal analog circuit design, and
subsequent optimization of these circuits for statistical fluctuations and
operating point ranges. Our approach simultaneously synthesizes and optimizes
for operating and manufacturing variations by mapping the circuit design problem
into an Infinite Programming problem and solving it using an annealing within
annealing formulation. We present circuits designed by this integrated synthesis
system, and show that they indeed meet their operating range and parametric
manufacturing constraints.
9C.2 A Statistical Optimization-Based Approach for Automated Sizing of Analog Cells
F. Medeiro, F.V. Fernández, R. Domínguez-Castro, A. Rodríguez-Vázquez, Centro Nacional de Microelectrónica, Sevilla, Spain
This paper presents a CAD tool for automated sizing of analog cells using
statistical optimization in a simulation based approach. A nonlinear
penalty-like approach is proposed to define a cost function from the performance
specifications. Also, a group of heuristics is proposed to increase the
probability of reaching the global minimum as well as to reduce CPU time during
the optimization process. The proposed tool sizes complex analog cells starting
from scratch, within reasonable CPU times (approximately 1 hour for a fully
differential opamp with 51 transistors), requiring no designer interaction, and
using accurate transistor models to support the design choices. Tool operation
and feasibility are demonstrated via experimental measurements from a working
CMOS prototype of a folded-cascode amplifier.
9C.3 Time-Domain Non-Monte Carlo Noise Simulation for Nonlinear Dynamic Circuits with Arbitrary Excitations
Alper Demir, Edward Liu, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA
A new, time-domain, non-Monte Carlo method for computer simulation of electrical
noise in nonlinear dynamic circuits with arbitrary excitations is presented.
This time-domain noise simulation method is based on the results from the theory
of stochastic differential equations. The noise simulation method is general in
the sense that any nonlinear dynamic circuit with any kind of excitation, which
can be simulated by the transient analysis routine in a circuit simulator, can
be simulated by our noise simulator in time-domain to produce the noise
variances and covariances of circuit variables as a function of time, provided
that noise models for the devices in the circuit are available. Noise
correlations between circuit variables at different time points can also be
calculated. Previous work on computer simulation of noise in integrated circuits
is reviewed with comparisons to our method. Shot, thermal and flicker noise
models for integrated-circuit devices, in the context of our time-domain noise
simulation method, are described. The implementation of this noise simulation
method in a circuit simulator (SPICE) is described. Two examples of noise
simulation (a CMOS ring-oscillator and a BJT active mixer) are given.
Moderators: Majid Sarrafzadeh, Northwestern Univ., Evanston, IL
Yoji Kajitani, JAIST, Ishikawa, Japan
9D.1 Improving Over-The-Cell Channel Routing in Standard Cell Design
Xiaolin Liu, Ioannis G. Tollis, Univ. of Texas, Richardson, TX
The first stage of over-the-cell routing in the horizontally connected,
vertically connected (HCVC) model is formulated as follows: Given two rows of
terminals, find a planar routing to connect a subset of nets (with weights) on
each row of terminals using a fixed number of tracks to maximize the total
weight. This problem is called the two row fixed height planar routing (TFPR)
problem [CPL93]. The complexity of the TFPR problem was unknown up to now. An
approximation algorithm for the TFPR problem was presented in [CPL93]. In this
paper we present an O(n^2 * h^2)-time algorithm to solve the TFPR problem
optimally, where n is the number of terminals and h is the height of the
standard cells. Our algorithm can be used to improve the performance of several
over-the-cell channel routers including the ones in [CPL93] and [HSS93].
9D.2 Minimum Crosstalk Switchbox Routing
Tong Gao, C.L. Liu, Univ. of Illinois, Urbana, IL
As technology advances, interconnection wires are placed in closer proximity.
Consequently, reduction of crosstalk between interconnection wires becomes an
important consideration in VLSI design. In this paper, we study the gridded
switchbox routing problems with the objectives of satisfying crosstalk
constraints and minimizing the total crosstalk in the nets. We propose a new
approach to the problems which utilizes existing switchbox routing algorithms
and improves upon the routing results by re-assigning the horizontal and
vertical wire segments to rows and columns, respectively, in an iterative
fashion. This approach can also be applied to the channel routing problem with
crosstalk constraints. A novel mixed ILP formulation and effective procedures
for reducing the number of variables and constraints in the mixed ILP
formulation are then presented. The experimental results are encouraging.
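The quantity being minimized can be pictured with a toy cost model (an assumption for illustration, not the paper's ILP formulation): coupling between wire segments on adjacent tracks is taken as proportional to their parallel overlap length, so re-assigning segments to rows changes the total crosstalk:

```python
# Toy crosstalk cost: only segments on adjacent tracks couple, and the
# coupling is the length of their parallel overlap.

def overlap(seg_a, seg_b):
    """Overlap length of two horizontal intervals (start, end)."""
    (a1, a2), (b1, b2) = seg_a, seg_b
    return max(0, min(a2, b2) - max(a1, b1))

def total_crosstalk(rows):
    """rows: one list of (start, end) segments per track, top to bottom."""
    cost = 0
    for upper, lower in zip(rows, rows[1:]):    # adjacent track pairs
        for s in upper:
            for t in lower:
                cost += overlap(s, t)
    return cost

rows = [[(0, 4)], [(2, 6)], [(5, 8)]]
# overlap 2 between tracks 0-1, overlap 1 between tracks 1-2 -> cost 3
assert total_crosstalk(rows) == 3
```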
9D.3 Techniques for Crosstalk Avoidance in the Physical Design of High-Performance Digital Systems
Desmond Kirkpatrick, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA
Interconnect performance does not scale well into deep submicron dimensions, and
the rising number of analog effects erodes the digital abstraction necessary for
high levels of integration. In particular, crosstalk is an analog phenomenon of
increasing relevance. To cope with the increasingly analog nature of
high-performance digital system design, we propose using a constraint-driven
methodology. In this paper we describe new constraint generation ideas
incorporating digital sensitivity. In constraint-driven synthesis, we show that
a fundamental subproblem of crosstalk channel routing, coupling-constrained
graph levelization (CCL), is NP-complete, and develop a novel heuristic algorithm.
To demonstrate the viability of our methodology, we introduce a gridless
crosstalk-avoiding channel router as an example of a robust and truly
constraint-driven synthesis tool.
Moderators: Herve Touati, Digital Equipment Corp., Rueil-Malmaison, France
Albert Wang, Synopsys, Inc., Mountain View, CA
10A.1 Efficient Breadth-First Manipulation of Binary Decision Diagrams
Pranav Ashar, Matthew Cheong, NEC USA, Princeton, NJ
We propose new techniques for efficient breadth-first iterative manipulation of
ROBDDs. Breadth-first iterative ROBDD manipulation can potentially reduce the
total elapsed time by multiple orders of magnitude compared to the conventional
depth-first recursive algorithms when the memory requirement exceeds the
available physical memory. However, the breadth-first manipulation algorithms
proposed so far [5] have had a large enough overhead associated with them to
make them impractical. Our techniques are geared towards minimizing the overhead
without sacrificing the speed up potential. Experimental results indicate
considerable success in that regard.
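For contrast, the conventional depth-first recursive Apply that breadth-first methods aim to outperform can be sketched as follows. The tuple-based node encoding is an illustration only, not the data structure of [5]:

```python
# Depth-first recursive AND on ROBDDs: nodes are (var, low, high) tuples,
# terminals are 0 and 1, variables are ordered by index, and results are
# memoized on the operand pair. Poor memory locality of this recursion is
# what breadth-first (levelized) manipulation attacks.

def var(node):
    """Variable index of a node; terminals sort after all variables."""
    return node[0] if isinstance(node, tuple) else float("inf")

def cofactors(node, v):
    """Cofactors of node with respect to top variable v."""
    if isinstance(node, tuple) and node[0] == v:
        return node[1], node[2]
    return node, node

def bdd_and(f, g, memo=None):
    memo = {} if memo is None else memo
    if f == 0 or g == 0:
        return 0
    if f == 1:
        return g
    if g == 1:
        return f
    key = (f, g)
    if key in memo:
        return memo[key]
    v = min(var(f), var(g))                  # expand on the top variable
    f0, f1 = cofactors(f, v)
    g0, g1 = cofactors(g, v)
    lo, hi = bdd_and(f0, g0, memo), bdd_and(f1, g1, memo)
    node = lo if lo == hi else (v, lo, hi)   # reduction rule
    memo[key] = node
    return node

x0, x1 = (0, 0, 1), (1, 0, 1)                # single-variable BDDs
assert bdd_and(x0, x1) == (0, 0, (1, 0, 1))  # x0 AND x1
```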
10A.2 Symmetry Detection and Dynamic Variable Ordering of Decision Diagrams
Shipra Panda, Fabio Somenzi, Univ. of Colorado, Boulder, CO, Bernard F. Plessier, Motorola, Inc., Austin, TX
Knowing that some variables are symmetric in a function has numerous
applications; in particular, it can help produce better variable orders for
Binary Decision Diagrams (BDDs) and related data structures (e.g., Algebraic
Decision Diagrams). It has been conjectured that there always exists an optimum
order for a BDD wherein symmetric variables are contiguous. We propose a new
algorithm for the detection of symmetries, based on dynamic reordering, and we
study its interaction with the reordering algorithm itself. We show that
combining sifting with an efficient symmetry check for contiguous variables
results in the fastest symmetry detection algorithm reported to date and
produces better variable orders for many BDDs. The overhead on the sifting
algorithm is negligible.
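The symmetry in question is the classical one: f is unchanged when two variables swap values, which reduces to comparing a pair of cofactors. A brute-force truth-table check (illustrative only; the paper's check operates on BDDs during sifting):

```python
from itertools import product

def symmetric(f, n, i, j):
    """True iff variables i and j are symmetric in f on n inputs:
    swapping their values never changes f, i.e. the cofactor with
    (xi=1, xj=0) equals the cofactor with (xi=0, xj=1)."""
    for bits in product((0, 1), repeat=n):
        if bits[i] != bits[j]:
            swapped = list(bits)
            swapped[i], swapped[j] = bits[j], bits[i]
            if f(bits) != f(tuple(swapped)):
                return False
    return True

# x0 and x1 are symmetric in x0*x1 + x2; x0 and x2 are not.
f = lambda b: (b[0] & b[1]) | b[2]
assert symmetric(f, 3, 0, 1)
assert not symmetric(f, 3, 0, 2)
```

Keeping detected symmetric variables contiguous during sifting is what lets the reordering exploit this property.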
10A.3 A Redesign Technique for Combinational Circuits Based on Reconnections of Gates
Yuji Kukimoto, Univ. of California, Berkeley, CA, Masahiro Fujita, Fujitsu Laboratories of America, San Jose, CA
Robert K. Brayton, Univ. of California, Berkeley, CA
In this paper, we consider a redesign technique applicable to combinational
circuits implemented with gate-array or standard-cell technology, where we
rectify an existing circuit only by reconnecting gates on the circuit with all
the gate types unchanged. This constraint allows us to reuse the original
placement as is, thereby speeding up the total time needed for a redesign. We
formulate this problem as a Boolean-constraint problem and give a BDD-based
algorithm to check the feasibility of redesign.
Moderators: Sudhir Kadkade, Mentor Graphics Corp., Wilsonville, OR
Yervant Zorian, AT&T Bell Labs., Princeton, NJ
10B.1 Non-Scan Design-For-Testability of RT-Level Data Paths
Sujit Dey, Miodrag Potkonjak, C&C Research Labs., NEC USA, INC., Princeton, NJ
This paper presents a non-scan design-for-testability technique applicable
to register-transfer (RT) level data path circuits, which are usually very
hard to test due to the presence of complex loop structures. We develop a new
testability measure, and utilize the RT-level structure of the data path, for
cost-effective re-design of the circuit to make it easily testable, without
having to either scan any flip-flop or break loops directly. The non-scan DFT
technique was applied to several data path circuits. Experimental results
demonstrate the feasibility of producing non-scan testable data paths, which can
be tested at speed. The hardware overhead and the test application time required
for the non-scan designs are significantly lower than those of the corresponding
partial scan designs.
10B.2 Selecting Partial Scan Flip-Flops for Circuit Partitioning
Toshinobu Ono, NEC Corp., Kawasaki, Japan
This paper presents a new method of selecting scan flip-flops (FFs) in partial
scan designs of sequential circuits. Scan FFs are chosen so that the whole
circuit can be partitioned into many small subcircuits which can be dealt with
separately by a test pattern generator. This permits easy automatic test pattern
generation for arbitrarily large sequential circuits. Algorithms for selecting
scan FFs to allow such partitioning and for scheduling tests for subcircuits are
given. Experimental results show that the proposed method makes it possible to
generate test patterns for very large sequential circuits which previous
approaches cannot handle.
10B.3 Logic Synthesis Techniques for Reduced Area Implementation of Multilevel Circuits with Concurrent Error Detection
Nur A. Touba, Edward J. McCluskey, Stanford Univ., Stanford, CA
This paper presents new logic synthesis techniques for generating multilevel
circuits with concurrent error detection based on a parity-check code scheme
that can detect all errors caused by single stuck-at faults. These synthesis
techniques fully automate the design process and allow for a better quality
result than previous methods thereby reducing the cost of concurrent error
detection. An algorithm is described for selecting a good parity-check code for
encoding the outputs of a circuit. Once the code has been chosen, a new
procedure called structure-constrained logic optimization is used to minimize
the area of the circuit as much as possible while still using a circuit
structure that ensures that single stuck-at faults cannot produce undetected
errors. The implementation that is generated is path fault secure and when
augmented by a checker forms a self-checking circuit. Results indicate that
self-checking multilevel circuits can be generated which require significantly
less area than using duplication.
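The parity-check idea can be shown at its simplest: a single parity bit predicted alongside the functional outputs, with a checker that flags any single-bit output error. The toy "circuit" below is an assumption for illustration, not a synthesized example from the paper:

```python
# Concurrent error detection with a single parity bit: the circuit also
# produces the predicted parity of its outputs, and a checker compares
# it against the observed output parity. Any single-bit output error
# flips exactly one parity and is detected.

def circuit(a, b):
    s = (a + b) & 0b11                  # functional outputs: 2-bit sum
    bits = [(s >> i) & 1 for i in range(2)]
    parity = bits[0] ^ bits[1]          # independently predicted parity
    return bits, parity

def checker(bits, parity):
    """True iff observed output parity matches the prediction."""
    observed = 0
    for bit in bits:
        observed ^= bit
    return observed == parity

bits, p = circuit(2, 3)
assert checker(bits, p)                 # fault-free: parity agrees
bits[0] ^= 1                            # inject a single-bit output error
assert not checker(bits, p)             # checker detects it
```

The synthesis problem the paper addresses is ensuring that no single internal stuck-at fault can corrupt an even number of encoded outputs at once.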
Moderators: Rob A. Rutenbar, Carnegie Mellon Univ., Pittsburgh, PA
Seijiro Moriyama, Toshiba Corp., Yokohama, Japan
10C.1 Macromodeling of Analog Circuits for Hierarchical Circuit Design
Jianfeng Shao, Ramesh Harjani, Univ. of Minnesota, Minneapolis, MN
Hierarchy plays a significant role in the design of digital and analog circuits.
At each level of the hierarchy it becomes essential to evaluate whether a
sub-block design is feasible and, if so, which design style is the best
candidate for the particular problem. This paper proposes a general methodology for evaluating the
feasibility and the performance of sub-blocks at all levels of the hierarchy. A
modified simplicial approximation technique is used to generate the feasibility
macromodel and a layered volume-slicing methodology with radial basis functions
is used to generate the performance macromodel. However, due to lack of space,
only details of the performance macromodeling techniques are included.
Macromodels are developed and verified for analog blocks at three different
levels of hierarchy (current mirror, opamp, and A/D converter).
10C.2 Approximate Symbolic Analysis of Large Analog Integrated Circuits
Qicheng Yu, Carl Sechen, Univ. of Washington, Seattle, WA
This paper describes a unified approach to the approximate symbolic analysis of
large linearized analog circuits. It combines two new approximation-during-computation strategies with a variation of the classical two-graph tree
enumeration method. The first strategy is to generate common trees of the
two-graphs, and therefore the product terms in the symbolic network function, in
the decreasing order of magnitude. The second approximation strategy is the
sensitivity-based simplification of two-graphs, which excludes from the
two-graphs many circuit elements that have little effect on the network function
being derived. Our approach is therefore able to symbolically analyze much
larger analog integrated circuits than previously reported, using complete small
signal models for the semiconductor devices. We show accurate yet reasonably
sized symbolic network functions for integrated circuits with up to 39
transistors whereas previous approaches were limited to less than 15.
10C.3 Testing of Analog Systems Using Behavioral Models and Optimal Experimental Design Techniques
Eric Felt, Alberto Sangiovanni-Vincentelli, Univ. of California, Berkeley, CA
This paper describes a new CAD algorithm which performs automatic test pattern
generation (ATPG) for a general class of analog systems, namely those circuits
which can be efficiently modeled as an additive combination of user-defined
basis functions. The algorithm is based on the statistical technique of
I-optimal experimental design, in which test vectors are chosen to be maximally
independent so that circuit performance will be characterized as accurately as
possible in the presence of measurement noise and model inaccuracies. This
technique allows analog systems to be characterized more accurately and more
efficiently, thereby significantly reducing system test time and hence total
manufacturing cost.
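Characterizing a response modeled as an additive combination of basis functions from noisy measurements is, at bottom, a least-squares fit. A minimal sketch with an assumed two-function basis (the I-optimal selection of the test points themselves is not shown):

```python
# Least-squares fit of y ~ c0*phi0(x) + c1*phi1(x) from measured pairs,
# solving the 2x2 normal equations directly. Basis functions and data
# are illustrative assumptions.

def fit_two_basis(xs, ys, phi0, phi1):
    a00 = sum(phi0(x) ** 2 for x in xs)
    a01 = sum(phi0(x) * phi1(x) for x in xs)
    a11 = sum(phi1(x) ** 2 for x in xs)
    b0 = sum(phi0(x) * y for x, y in zip(xs, ys))
    b1 = sum(phi1(x) * y for x, y in zip(xs, ys))
    det = a00 * a11 - a01 * a01         # assumes a well-posed design
    return ((a11 * b0 - a01 * b1) / det, (a00 * b1 - a01 * b0) / det)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]               # exactly 1 + 2*x, noise-free here
c0, c1 = fit_two_basis(xs, ys, lambda x: 1.0, lambda x: x)
assert abs(c0 - 1.0) < 1e-9 and abs(c1 - 2.0) < 1e-9
```

I-optimal design chooses the xs so that, under measurement noise, the variance of the fitted model's predictions is minimized.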
Moderators: Jason Cong, Univ. of California, Los Angeles, CA
Takashi Kambe, SHARP Co., Nara, Japan
10D.1 Layer Assignment for High-Performance Multi-Chip Modules
Kai-Yuan Chao, D.F. Wong, Univ. of Texas, Austin, TX
In this paper, we present a layer assignment method for high-performance
multi-chip module environments. In contrast with treating global routing and
layer assignment separately, our method assigns nets to layers while considering
preferable global routing topologies simultaneously. We take transmission line
effects into account to avoid noise in high-speed circuit packages. The problem
is formulated as a quadratic Boolean programming problem and an algorithm is
presented to solve the problem after linearization. Our method is applied to a
set of benchmark circuits to demonstrate the effectiveness.
10D.2 The Reproducing Placement Problem with Applications
Wei-Liang Lin, Majid Sarrafzadeh, Northwestern Univ., Evanston, IL, C.K. Wong
We study a new placement problem: the reproducing placement problem (RPP). In
each phase a module (or gate) is decomposed into two (or more) simpler modules.
The goal is to find a "good" placement in each phase. The problem, being iterative
in nature, requires an iterative algorithm. The problem finds applications in
several gate-level placement problems, e.g., in layout-driven logic synthesis.
We introduce the notion of minimum floating Steiner trees (MFST). We employ an
MFST algorithm as a central step in solving the RPP. A Hanan-like theorem is
established for the MFST problem and two approximation algorithms are proposed.
Experiments on commonly employed benchmarks verify the effectiveness of the
proposed technique.
10D.3 RISA: Accurate and Efficient Placement Routability Modeling
Chih-liang Eric Cheng, Cadence Design Systems, Inc., San Jose, CA
The prevalence of netlist synthesis tools raises great concern about the
routability of cell placements created with state-of-the-art placement
techniques. In this
paper, an accurate and efficient placement routability modeling technique is
proposed and incorporated into the prevailing simulated annealing approach. This
accurate and efficient modeling is based on the supply versus demand analysis of
routing resource over an array of regions on a chip. Vertical and horizontal
routability are analyzed separately due to the bias of routing resources in
multiple-metal-layer ASIC designs. A special technique for net bounding box
partitioning is also proposed; it is critical to the accuracy of this modeling
in the presence of mega cells, which tend to cause local routing congestion. By
incorporating this efficient modeling into the cost function of simulated
annealing, experiments conducted on small to large industrial designs indicate
that placement routability evaluated with a global router is greatly improved
as a result of the proposed accurate modeling.
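The supply-versus-demand analysis can be pictured with a toy sketch (an illustration of the general idea, not RISA's actual model): each net's bounding box adds expected routing demand to the grid regions it covers, and regions whose demand exceeds the available track supply are flagged as congested:

```python
# Toy congestion map: one unit of demand per net per covered region;
# regions where demand exceeds the track supply are reported.

def congestion(nets, grid_w, grid_h, supply):
    demand = [[0] * grid_w for _ in range(grid_h)]
    for (x1, y1, x2, y2) in nets:       # net bounding boxes in region coords
        for y in range(min(y1, y2), max(y1, y2) + 1):
            for x in range(min(x1, x2), max(x1, x2) + 1):
                demand[y][x] += 1
    return [(x, y) for y in range(grid_h) for x in range(grid_w)
            if demand[y][x] > supply]

# Three overlapping nets on a 3x3 grid with 2 tracks per region:
nets = [(0, 0, 2, 1), (0, 0, 1, 2), (0, 0, 2, 2)]
hot = congestion(nets, 3, 3, supply=2)
assert hot == [(0, 0), (1, 0), (0, 1), (1, 1)]
```

A placement cost function can then penalize moves that create such over-demanded regions, which is where the simulated annealing integration comes in.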
Moderators: Fabio Somenzi, Univ. of Colorado, Boulder, CO
Srinivas Devadas, Massachusetts Inst. of Tech., Cambridge, MA
11A.1 A New Approach for Factorizing FSM's
C. Rama Mohan, Cadence Design Systems Pvt. Ltd., Noida, India, P.P. Chakrabarti, Indian Inst. of Tech., Kharagpur, India
Exact factors, as defined in [2], if present in an FSM, can yield the most
effective factorization. However, it has been found that most FSMs are not
exactly factorizable. In this paper, we suggest a method for making FSMs
exactly factorizable through minor changes in the next-state space while
maintaining the functionality of the FSM. We have also developed a new
combined state assignment algorithm for state encoding of the factored and
factoring FSMs. Experimental results on MCNC benchmark examples, after running
MISII on the original FSM, factored FSM, and factoring FSM, have shown a
reduction of 40% in the worst-case signal delay through the circuit in a
multilevel implementation. The total number of literals, on average, is the
same after factorization as that obtained by running MISII on the original
FSM. For two-level implementation, our method has been able to factorize
benchmark FSMs with a 14% average increase in overall area, while the areas of
the combinational components of the factored and factoring FSMs have been
found to be significantly less than the area of the combinational component of
the original FSM.
11A.2 Boolean Constrained Encoding: A New Formulation and a Case Study
N.L.V. Calazans, PUCRS, Porto Alegre, RS, Brazil
This paper provides a new, generalized approach to the problem of encoding
information as vectors of binary digits. We furnish a formal definition for
the Boolean constrained encoding problem, and show that this definition
encompasses many particular encoding problems found in VLSI design, at various
description abstraction levels. Our approach can capture equivalence and/or
compatibility classes in the original symbol set to encode, by allowing symbol
codes to be cubes of a Boolean space instead of the usual minterms. Moreover,
we introduce a unified framework to represent encoding constraints which is more
general than previous efforts. The framework is based upon a new definition of
the pseudo-dichotomy concept, and is adequate to guide the solution of encoding
problems through the satisfaction of constraints extracted from the original
problem statement. An encoding problem case study is presented, the state
assignment of synchronous finite state machines with the simultaneous
consideration of state minimization. The practical comparison with
well-established approaches to solve this problem in two separate steps, shows
that our solution is competitive with other published results. However, the
case study is primarily intended to show the feasibility of the Boolean
constrained encoding problem formulation.
11A.3 Optimizing Hierarchical Designs Using Partitioning and Resynthesis
Heinz-Josef Eikerling, Ralf Hunstock, Univ. of Paderborn, Paderborn, Germany
Raul Camposano, Synopsys, Inc., Mountain View, CA
This paper explores the influence of optimization along the boundary between
hierarchically described components. A novel technique called repartitioning
combines partitioning and sequential resynthesis of the design under various
quality measures. It is applied to various digital circuits which consist of a
controller and a datapath. The outcome of this effort is a versatile,
parametrizable resynthesis tool which preserves this hierarchy. Depending on
the cost measures, an average improvement of 5% to 15% was obtained.
Moderators: Prab Varma, CrossCheck Technology, Inc., San Jose, CA
Tom Niermann, Sunrise Test Systems, Sunnyvale, CA
11B.1 HyHOPE: A Fast Fault Simulator with Efficient Simulation of Hypertrophic Faults
Chen-Pin Kung, Chen-Shang Lin, National Taiwan Univ., Taiwan, ROC
In sequential circuit fault simulation, the hypertrophic faults, which result
from lengthened initialization sequence in the faulty circuits, usually produce
a large number of fault events during simulation and require excessive gate
evaluations. These faults degrade the performance of fault simulators attempting
to simulate them exactly. In this paper, an exact simulation algorithm is
developed to identify the hypertrophic faults and to minimize their effects
during the fault simulation. The simulator HyHOPE based on this algorithm shows
that the average speedup ratio over HOPE 1.1 is 1.57 for ISCAS89 benchmark
circuits. Furthermore, the result indicates the performance of HyHOPE is close
to the approximate simulator in which faults are simply dropped when they
become potentially detected.
11B.2 Fast Timing Simulation of Transient Faults in Digital Circuits
A. Dharchoudhury, Sung Mo Kang, Hungse Cha, Janak H. Patel, Univ. of Illinois, Urbana, IL
Transient fault simulation is an important verification activity for circuits
used in critical applications since such faults account for over 80% of all
system failures. This paper presents a timing level transient fault simulator
that bridges the gap between electrical and gate-level transient fault
simulators. A generic MOS circuit primitive and analytical solutions of node
differential equations are used to perform transistor level simulation with
accurate MOS-FET models. The transient fault is modeled by a piece-wise
quadratic injected current waveform; this retains the electrical nature of the
transient fault and provides SPICE-like accuracy. Detailed comparisons with
SPICE3 show the accuracy of this technique and speedups of two orders of
magnitude are observed for circuits containing up to 2000 transistors. Latched
error distributions of the benchmark circuits are also provided.
11B.3 A Fast and Memory-Efficient Diagnostic Fault Simulation for Sequential Circuits
Jer Min Jou, Shung-Chih Chen, National Cheng Kung Univ., Tainan, Taiwan, ROC
In this paper, a fast and memory-efficient diagnostic fault simulator for
sequential circuits is proposed. In it, a two-level optimization technique is
developed and used to improve the processing speed. At the first (high) level,
an efficient list, which stores the faults so far indistinguishable from each
fault during simulation, and its maintenance algorithm are applied, so that
the number of diagnostic comparisons is minimized. At the second (low) level,
a bit-parallel comparison is developed to speed up the comparing process.
Therefore, the various diagnostic measure reports for a given test set can be
generated very quickly. In addition, the simulator is extended to diagnose
single stuck-at device faults. Experimental results show that this diagnostic
simulator achieves a significant speedup compared to previous methods.
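The bit-parallel trick can be sketched generically (an illustration of the technique, not the simulator's implementation): pack one response bit per fault into a machine word, so a single XOR compares the good-circuit response against many faulty responses at once:

```python
# Bit-parallel fault-response comparison: bit i of faulty_word is the
# output of the circuit under fault i; one XOR distinguishes all faults
# whose response differs from the fault-free circuit.

def distinguish(good_bit, faulty_word, nfaults):
    """Indices of faults whose response differs from the good circuit's."""
    mask = (1 << nfaults) - 1
    good_word = -good_bit & mask        # replicate the good bit across the word
    diff = good_word ^ faulty_word      # one XOR for all nfaults faults
    return [i for i in range(nfaults) if (diff >> i) & 1]

# Good circuit outputs 1; faults 0..3 respond 1,0,1,0 (word 0b0101):
assert distinguish(1, 0b0101, 4) == [1, 3]
```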
Moderators: Tom Szymanski, AT&T Bell Labs., Murray Hill, NJ
Sharad Malik, Princeton Univ., Princeton, NJ
11C.1 Timing Uncertainty Analysis for Time-of-Flight Systems
John R. Feehrer, Harry F. Jordan, Univ. of Colorado, Boulder, CO
Time-of-flight synchronization is a new digital design methodology that
eliminates all latching devices, allowing higher clock rates than alternative
timing schemes. Synchronization is accomplished by precisely balancing
connection delays. Many effective pipeline stages are created by pipelining
combinational logic, similar in concept to wave pipelining but differing in
several respects. Due to the unique flow-through nature of circuits and to the
need for pulse-mode operation, time-of-flight design exposes interesting new
areas for CAD timing analysis. This paper discusses how static propagation
delay uncertainty limits the clock
period for time-of-flight circuits built with opto-electronic devices. We
present algorithms for placing a minimum set of clock gates to restore timing in
feedback loops that implement memory and for propagating delay uncertainty
through a circuit graph. A mixed integer program determining the minimum
feasible clock period subject to pulse width and arrival time constraints is
discussed. Algorithms are implemented in XHatch, a time-of-flight CAD package.
11C.2 Provably Correct High-Level Timing Analysis Without Path Sensitization
S. Bhattacharya, Duke Univ., Durham, NC, Sujit Dey, C&C Research Labs., NEC USA, Inc., Princeton, NJ
Franc Brglez, North Carolina State Univ., Raleigh, NC
This paper addresses the problem of true delay estimation during high level
design. The existing delay estimation techniques either estimate the topological
delay of the circuit which may be pessimistic, or use gate-level timing analysis
for calculating the true delay, which may be prohibitively expensive.
We show that the paths in the implementation of a behavioral specification can
be partitioned into two sets, SP and UP. While the paths in SP can affect the
delay of the circuit, the paths in UP cannot. Consequently, the true delay of
the resulting circuit can be computed by just measuring the topological delay
of the paths in SP, eliminating the need for the computationally intensive
process of path sensitization. Experimental results show that high-level true
delay estimation can be done very fast, even when gate-level true delay
estimation becomes computationally infeasible. The high-level delay estimates
are verified by comparing with delay estimates obtained by gate-level timing
analysis on the actual implementation.
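For context, the topological delay the abstract refers to is the longest path through the circuit DAG, computable in one pass over a topological order; true delay additionally asks whether that path can be sensitized. A minimal sketch with made-up gate names and delays:

```python
# Topological delay: arrival time of each gate is its own delay plus the
# latest arrival among its fanins, computed in topological order. This is
# the cheap but possibly pessimistic bound; true delay requires showing
# the critical path is actually sensitizable.

def topological_delay(gates, delay):
    """gates: list of (name, [fanin names]) in topological order;
    delay: dict of per-gate delays. Returns arrival time per gate."""
    arrival = {}
    for name, fanins in gates:
        arrival[name] = delay[name] + max(
            (arrival[f] for f in fanins), default=0)
    return arrival

gates = [("a", []), ("b", []), ("g1", ["a", "b"]), ("g2", ["g1", "b"])]
delay = {"a": 0, "b": 0, "g1": 2, "g2": 3}
assert topological_delay(gates, delay)["g2"] == 5
```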
11C.3 A Timing Analysis Algorithm for Circuits with Level-Sensitive Latches
Jin-fuw Lee, Donald T. Tang, C.K. Wong, IBM Corp., Yorktown Heights, NY
For a logic design with level-sensitive latches, we need to validate the timing
of signal paths which may flush through several latches. We developed efficient
algorithms based on the modified shortest and longest path method. The
computational complexity of our algorithm is generally better than that of known
algorithms in the literature. The implementation (CYCLOPSS) has been applied to
an industrial chip to verify the clock schedules.
Moderators: Margarida F. Jacome, Carnegie Mellon Univ., Pittsburgh, PA
Pieter van der Wolf, Delft Univ. of Tech., Delft, The Netherlands
11D.1 An Object-Oriented Cell Library Manager
Naresh K. Sehgal, Intel Corp., Santa Clara, CA, C.Y. Roger Chen, Syracuse Univ., Syracuse, NY, John M. Acken, Intel Corp., Santa Clara, CA
New techniques are proposed to obtain better estimates and optimizations at
higher levels of design abstraction, which are then used for library cell
selection. A single object-oriented database repository is used during all
phases of VLSI design to enhance the early design estimates. As compared to a
relational database using sorted tables of attribute values, the proposed
object-oriented cell library manager reduces search time for an appropriate
cell, with m constraints among n cells, from O(nm) to O(m log n). The proposed
method also reduces design cycle time by reducing the number of iterations due
to mismatched performance estimates done in the earlier phases of a design.
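The O(m log n) versus O(nm) comparison comes from replacing a linear scan per constraint with binary search over a sorted per-attribute index. A sketch under assumed data (the cell names, attributes, and index layout are illustrative, not the paper's schema):

```python
from bisect import bisect_left

def cells_meeting(index, constraints):
    """index: {attr: sorted list of (value, cell)};
    constraints: {attr: minimum acceptable value}.
    Intersects the qualifying cell sets, one binary search per constraint."""
    result = None
    for attr, lo in constraints.items():
        entries = index[attr]
        pos = bisect_left(entries, (lo,))       # O(log n) per constraint
        ok = {cell for _, cell in entries[pos:]}
        result = ok if result is None else result & ok
    return result or set()

index = {"speed": [(1, "c1"), (3, "c2"), (5, "c3")],
         "drive": [(2, "c2"), (4, "c1"), (6, "c3")]}
assert cells_meeting(index, {"speed": 3, "drive": 4}) == {"c3"}
```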
11D.2 Reuse of Design Objects in CAD Frameworks
Joachim Altmeyer, Stefan Ohnsorge, Bernd Schürmann, Univ. of Kaiserslautern, Kaiserslautern, Germany
The reuse of well-tested and optimized design objects is an important aspect for
decreasing design times, increasing design quality, and improving the
predictability of designs. Reuse spans from selecting cells from a library to
adapting already designed objects. In this paper, we present a new model
for reusing design objects in CAD frameworks. Based on experiences in other
disciplines, mainly in software engineering and case-based reasoning, we
developed a feature-based model to describe design objects and their
similarities. Our model considers generic modules as well as multi-functional
units. We discuss the relationships of the model to the design process and to
the configuration hierarchy of complex design objects. We examined our model
with the prototype system RODEO.
11D.3 Towards Support for Design Description Languages in EDA Frameworks
Olav Schetler, Susanne Heymann, GMD, Sankt Augustin, Germany
We report on a new framework service for design tool encapsulation, based on an
information model for design management. The new service uses generated language
processors that perform import and export of design files to and from a design
management database with the support of nested syntax specifications and
extension language scripts. Our prototype design environment is based on the
Nelsis CAD Framework and several tools from the Synopsys high-level synthesis
and simulation tool suite.