ICCAD 2001 Abstracts
Moderators: Jay Lawrence, Cadence Design Systems, Inc., Chelmsford, MA
Kunle Olokotun, Stanford University, Stanford, CA
-
1A.1 Static Scheduling of Multi-Domain Memories For Functional Verification [p. 2]
-
Murali Kudlugi, Charles Selvidge, Russell Tessier
Over the past decade both the quantity and complexity of available
on-chip memory resources have increased dramatically. In order to
ensure accurate ASIC behavior, both logic functions and memory
resources must be successfully verified before fabrication. Often,
the functional verification of contemporary ASIC memory is complicated
by the presence of multiple design clocks that operate
asynchronously to each other. The presence of multiple clock
domains presents significant challenges for large parallel verification
systems such as parallel simulators and logic emulators that
model both design logic and memory. Specifically, multiple asynchronous
design clocks make it difficult to verify that design hold
times are met during memory model execution and that causality along
memory data/control paths is preserved during signal communication.
In this paper, we describe new scheduling heuristics for memory-based
designs with multiple asynchronous clock domains that
are mapped to parallel verification systems. Our scheduling
approach scales to an unlimited number of clock domains and converges
quickly to a feasible solution if one exists. It is shown that
when our technique is applied to an FPGA-based emulator containing
48MB of SRAM, evaluation fidelity is maintained and
increased verification performance is achieved for large, memory-intensive
circuits with multiple asynchronous clock domains.
-
1A.2 A Simulation-Based Method for the Verification of Shared Memory in Multiprocessor
Systems [p. 10]
-
Scott A. Taylor, Carl Ramey, Craig Barner, David Asher
As processor architectural complexity increases, greater effort must be
focused on functional verification of the chip as a component of the system.
Multiprocessor verification presents a particular challenge in terms of both
difficulty and importance. While formal methods have made significant progress
in the validation of coherence protocols, these methods are not always
practical to apply to the structural implementation of a complex microprocessor.
This paper describes a simulation-based approach to modeling and checking the shared-memory properties
of the Alpha architecture by using a directed acyclic graph to represent
memory-access orderings. The resulting tool is integrated with a simulation
model of an Alpha implementation, allowing the user to verify aspects of the
implementation with respect to the overall architectural specification. Both
an implementation-independent and an implementation-specific version of the tool
are discussed.
Keywords:
Microprocessor, Functional Verification, Shared Memory, Checking
-
1A.3 Predicting the Performance of Synchronous Discrete Event Simulation Systems [p. 18]
-
Jinsheng Xu, Moon Jung Chung
In this paper we propose a model to predict the performance of
synchronous discrete event simulation. The model considers
parameters including the number of active objects per cycle, event
execution granularity and communication cost. We derive a single
formula that predicts the performance of synchronous simulation.
We have benchmarked several VHDL circuits on an SGI Origin 2000.
The benchmark results show that the prediction model explains more
than 90% of parallel simulation execution time. We also measure the
effect of computation granularity on performance. The benchmark
results show that although higher granularity can have better speedup
because of dominance of computation over communication, the
computational granularity cannot overshadow the inherent
synchronization cost. This model can be used to predict the speed-up
expected for synchronous simulation, and to decide whether it is
worthwhile to use synchronous simulation before actually
implementing it.
Keywords:
Parallel Discrete Event Simulation, Synchronous Simulation,
Synchronization Cost, Communication Cost, Granularity,
Performance.
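The abstract does not reproduce the derived formula, but its general shape can be illustrated. Below is a minimal sketch of a per-cycle cost model for synchronous simulation; the functional form, parameter names, and numbers are assumptions made purely for illustration, not the paper's actual formula.

```python
# Illustrative cost model for one cycle of synchronous parallel discrete
# event simulation: a balanced compute term plus per-cycle communication
# and synchronization terms. All parameters are hypothetical.

def cycle_time(active_objs, granularity, n_procs, msgs, msg_cost, sync_cost):
    compute = (active_objs / n_procs) * granularity  # ideally balanced work
    communicate = msgs * msg_cost                    # inter-processor events
    return compute + communicate + sync_cost         # barrier each cycle

def speedup(active_objs, granularity, n_procs, msgs, msg_cost, sync_cost):
    serial = active_objs * granularity
    return serial / cycle_time(active_objs, granularity, n_procs,
                               msgs, msg_cost, sync_cost)

# Example: 2000 active objects/cycle, 5 us events, 8 processors.
print(speedup(2000, 5e-6, 8, 300, 1e-6, 2e-5))
```

As the abstract notes, the synchronization term does not shrink with granularity, which is why high granularity alone cannot hide the inherent synchronization cost.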
Moderators: Yanbing Li, Synopsys, Inc., Mountain View, CA
Xiaobo (Sharon) Hu, University of Notre Dame, Notre Dame, IN
-
1B.1 System-Level Exploration for Pareto-Optimal Configurations in Parameterized
Systems-on-a-Chip [p. 25]
-
Tony D. Givargis, Frank Vahid, Joerg Henkel
In this work, we provide a technique for efficiently exploring the
configuration space of a parameterized system-on-a-chip (SOC)
architecture to find all Pareto-optimal configurations. These
configurations represent the range of meaningful power and
performance tradeoffs that are obtainable by adjusting parameter
values for a fixed application mapped onto the SOC architecture. Our
approach extensively prunes the potentially large configuration space
by taking advantage of parameter dependencies. We have successfully
incorporated our technique into the parameterized SOC tuning
environment (Platune) and applied it to a number of applications.
Keywords:
System-on-a-chip, parameterized architectures, configurable
platforms, embedded systems, system-level exploration, low-power
system design, platform tuning.
-
1B.2 System Level Design with Spade: an M-JPEG Case Study [p. 31]
-
Paul Lieverse, Todor Stefanov, Pieter van der Wolf, Ed F. Deprettere
In this paper we present and evaluate the SPADE (System level
Performance Analysis and Design space Exploration) methodology
through an illustrative case study. SPADE is a method and
tool for architecture exploration of heterogeneous signal processing
systems. In this case study we start from an M-JPEG application
and use SPADE to evaluate alternative multiprocessor
architectures for implementing this application. SPADE follows
the Y-chart paradigm for system level design; application and architecture
are modeled separately and mapped onto each other
in an explicit design step. SPADE permits architectures to be
modeled at an abstract level using a library of generic building
blocks, thereby reducing the cost of model construction and simulation.
The case study shows that SPADE supports efficient exploration
of candidate architectures; models can be easily constructed,
modified and simulated in order to quickly evaluate alternative
system implementations.
Keywords:
system level design, design space exploration, application modeling,
architecture modeling, M-JPEG
-
1B.3 NetBench: A Benchmarking Suite for Network Processors [p. 39]
-
Gokhan Memik, William H. Mangione-Smith, Wendong Hu
In this study we introduce NetBench, a benchmarking
suite for network processors. NetBench contains a total of 9 applications
that are representative of commercial applications for network processors.
These applications come from all levels of packet processing: small,
low-level code fragments as well as large application-level programs are
included in the suite.
Using the SimpleScalar simulator we study the NetBench programs in detail
and characterize the network processor workloads. We also compare
key characteristics such as instructions per cycle, instruction distribution,
branch prediction accuracy, and cache behavior with the programs
from MediaBench. Although the MediaBench and NetBench suites target
similar architectures, we show that these workloads have
significantly different characteristics. Hence a separate benchmarking
suite for network processors is a necessity. Finally, we present performance
measurements from Intel IXP1200 Network Processor to show
how NetBench can be utilized.
Moderators: Robi Dutta, Synopsys, Inc., Mountain View, CA
Chung-Kuan Cheng, University of California at San Diego, La Jolla, CA
-
1C.1 Analysis of Substrate Thermal Gradient Effects on Optimal Buffer Insertion [p. 44]
-
Amir H. Ajami, Kaustav Banerjee, Massoud Pedram
This paper studies the effects of substrate thermal gradients
on buffer insertion techniques. Using a non-uniform temperature-dependent
distributed RC interconnect delay model, the buffer insertion
problem is analyzed and design guidelines are provided to ensure the near-optimality
of the signal performance in the presence of the thermal
gradients. In addition, the effect of temperature-dependent driver resistance
on the buffer insertion is studied. Experimental results show that neglecting
thermal gradients in the substrate and the interconnect lines can result in
non-optimal solutions when using standard buffer insertion techniques and
that these effects intensify with technology scaling.
-
1C.2 A New Algorithm for Routing Tree Construction with Buffer Insertion and Wire Sizing under Obstacle Constraints [p. 49]
-
Xiaoping Tang, Ruiqi Tian, Hua Xiang, D.F. Wong
Buffer insertion and wire sizing are critical in deep submicron VLSI design.
This paper studies the problem of constructing routing trees with simultaneous
buffer insertion and wire sizing in the presence of routing and buffer obstacles. No previous algorithm considers all these factors simultaneously. A previous
dynamic-programming-based algorithm is first extended to solve the problem.
However, as the size of the routing graph grows and wire sizing is taken
into account, the time and space requirements increase enormously. Then a new
approach is proposed to formulate the problem as a series of graph problems.
The routing tree solution is obtained by finding shortest paths in a series of
graphs. In the new approach, wire sizing can be handled almost without any
additional time and space requirement. Moreover, the time and space
requirement is only polynomial in the size of the routing graph. Our
algorithm differs from traditional dynamic programming, and is capable
of addressing the problem of inverter insertion and sink polarity. Both
theoretical and experimental results show that the graph-based algorithm
outperforms the DP-based algorithm by a large margin. We also propose a
hierarchical approach to construct routing trees for a large number of sinks.
-
1C.3 Bus Encoding to Prevent Crosstalk Delay [p. 57]
-
Bret M. Victor, Kurt Keutzer
The propagation delay across long on-chip buses
is increasingly becoming a limiting factor in high-speed
designs. Crosstalk between adjacent wires on the bus may
create a significant portion of this delay. Placing a shield
wire between each signal wire alleviates the crosstalk
problem but doubles the area used by the bus, an
unacceptable consequence when the bus is routed using
scarce top-level metal resources. Instead, we propose to
employ data encoding to eliminate crosstalk delay within a
bus. This paper presents a rigorous analysis of the theory
behind "self-shielding codes", and gives the fundamental
theoretical limits on the performance of codes with and
without memory. Specifically, we find that a 32-bit bus can
be encoded with 40 wires using a code with memory or 46
wires with a memoryless code, in comparison to the 63 wires
required with simple shielding.
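To make the codebook-size numbers concrete, here is a small brute-force sketch of the memoryless case. It assumes, as an illustration of the self-shielding idea rather than the paper's construction, that a codebook is valid when no transition between any two codewords switches adjacent wires in opposite directions; the greedy search only lower-bounds the achievable codebook size.

```python
# Greedy construction of a memoryless "self-shielding" codebook: every pair
# of codewords must avoid the worst-case crosstalk pattern, i.e. adjacent
# wires switching in opposite directions during a transition.
import math

def opposing_adjacent(a, b, n):
    diff = a ^ b
    for i in range(n - 1):
        if (diff >> i) & 1 and (diff >> (i + 1)) & 1:
            # wires i and i+1 both switch; directions oppose iff their
            # old values differ
            if ((a >> i) & 1) != ((a >> (i + 1)) & 1):
                return True
    return False

def greedy_codebook(n):
    code = []
    for w in range(1 << n):
        if all(not opposing_adjacent(w, c, n) for c in code):
            code.append(w)
    return code

for n in range(2, 11):
    k = len(greedy_codebook(n))
    print(f"{n} wires: {k} codewords -> {int(math.log2(k))} data bits")
```

The output shows fewer than one data bit per wire, consistent with the abstract's figure of 46 wires for a 32-bit bus in the memoryless case.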
Moderators: Rob A. Rutenbar, Carnegie Mellon University, Pittsburgh, PA
Henry Chang, Cadence Design Systems, Inc., San Jose, CA
-
1D.1 Behavioral Modeling of Analog Circuits by Wavelet Collocation Method [p. 65]
-
Xin Li, Xuan Zeng, Dian Zhou, Xieting Ling
In this paper, we develop a wavelet collocation method with nonlinear companding
for behavioral modeling of analog circuits. To construct the behavioral
models, the circuit is first partitioned into building blocks and the
input-output function of each block is then approximated by wavelets. As the
blocks are mathematically represented by sets of simple wavelet basis functions,
the computation cost for the behavioral simulation is significantly reduced.
The proposed method offers several merits compared with conventional
techniques. First, the algorithm for expanding input-output functions by
wavelets is a general-purpose approach, which can be applied to automatic
modeling of different analog circuit blocks with different structures. Second,
both the small signal effect and the large signal effect are modeled in a
unified formulation, which eases the process of modeling and simulation. Third,
a nonlinear companding method is developed to control the modeling error
distribution. To demonstrate the promising features of the proposed method, a
4th order switched-current filter is employed to build the behavioral model.
-
1D.2 Simulation-Based Automatic Generation of Signomial and Posynomial
Performance Models for Analog Integrated Circuit Sizing [p. 70]
-
Walter Daems, Georges Gielen, Willy Sansen
This paper presents a method to automatically generate posynomial response
surface models for the performance parameters of analog integrated
circuits. The posynomial models enable the use of efficient geometric
programming techniques for circuit sizing and optimization. To
avoid manual derivation of approximate symbolic equations and subsequent
casting to posynomial format, techniques from design of experiments
and response surface modeling in combination with SPICE simulations
are used to generate signomial and posynomial models in an automatic
way. Attention is paid to estimating the relative "goodness-of-fit" of
the generated models. Experimental results allow assessment of both the quality
of the generated models and the strengths and limitations of
the presented approach.
Keywords:
Analog Circuit Modeling, Posynomial and Signomial Response Surface Modeling,
Geometric Programming, Design of Experiments
-
1D.3 Power Grid Transient Simulation in Linear Time Based on
Transmission-Line-Modeling Alternating-Direction-Implicit Method [p. 75]
-
Yu-Min Lee, Charlie Chung-Ping Chen
The soaring clocking frequency and integration density demand
robust and stable power delivery to support tens of
millions of transistors switching. To ensure the design quality of
power delivery, extensive transient power grid simulations need to
be performed during the design process. However, traditional circuit
simulation engines do not scale with the complexity of
power delivery; as a result, simulating even a medium-size power grid
circuit often requires a long runtime and a huge amount of memory.
In this paper, we develop and present a new efficient transient
simulation algorithm for power distribution. The proposed
algorithm, TLM-ADI, (Transmission-Line-Modeling Alternating-Direction-Implicit),
first models the power delivery structure as
transmission line mesh structure, then solves the transient MNA
matrices by the alternating-direction-implicit method. The proposed
algorithm, with linear runtime and memory requirement, is also
unconditionally stable, which ensures that the time step is not limited
by any stability requirement. Extensive experimental results show
that the proposed algorithm is not only orders of magnitude faster
than SPICE but also extremely memory saving and accurate.
Moderator: Ellen M. Sentovich, Cadence Berkeley Labs., Berkeley, CA
Moderator: Mattan Kamon, Coventor, Inc., Cambridge, MA
Moderators: Hamid Savoj, Magma Design Automation, Inc., Cupertino, CA
Diana Marculescu, Carnegie Mellon University, Pittsburgh, PA
-
3A.1 Sequential SPFDs [p. 84]
-
Subarnarekha Sinha, Andreas Kuehlmann, Robert K. Brayton
SPFDs are a mechanism to express flexibility in Boolean networks.
Introduced by Yamashita et al. in the context of FPGA
synthesis [4], they were extended later to general combinational
networks [2]. We introduce the concept of sequential SPFDs
and provide an algorithm to compute them based on a partition
of the state bits. The SPFDs of each component in the partition
are used to generate equivalence classes of states. We provide a
formal relation between the resulting state classification and the
equivalence classes produced by classical state minimization of
completely specified machines [6]. The SPFDs associated with
the state bits can be applied for re-encoding the state space. For
this, we give an algorithm to re-synthesize the sequential circuit
using sequential SPFDs and the new state re-encoding.
-
3A.2 On the Optimization Power of Redundancy Addition and Removal Techniques for
Sequential Circuits [p. 91]
-
Enrique San Millán, Luis Entrena, José Alberto Espejo
This paper attempts to determine the capabilities of
existing Sequential Redundancy Addition and Removal (SRAR)
techniques for logic optimization of sequential circuits. To
this purpose, we compare this method with the Retiming
and Resynthesis (RaR) techniques. For the RaR case the
set of possible transformations has been established by
relating them to STG transformations by other authors.
Following these works, we first formally demonstrate that
logic transformations provided by RaR are covered by
SRAR as well. Then we also show that SRAR is able to
identify transformations that cannot be found by RaR. This
way we prove the higher potential of the Sequential
Redundancy Addition and Removal over the Retiming and
Resynthesis techniques.
-
3A.3 Placement Driven Retiming with a Coupled Edge Timing Model [p. 95]
-
Ingmar Neumann, Wolfgang Kunz
Retiming is a widely investigated technique for performance
optimization. It performs powerful modifications on a circuit
netlist. However, it is often not clear whether the predicted
performance improvement will still be valid after placement
has been performed. This paper presents a new retiming
algorithm using a highly accurate timing model taking into
account the effect of retiming on capacitive loads of single
wires as well as fanout systems. We propose the integration
of retiming into a timing-driven standard cell placement
environment based on simulated annealing. Retiming is used
as an optimization technique throughout the whole placement
process. The experimental results show the benefit of the
proposed approach. In comparison with the conventional
design flow based on standard FEAS our approach achieved
an improvement in cycle time of up to 34% and 17% on the
average.
-
3A.4 Solution of Parallel Language Equations for Logic Synthesis [p. 103]
-
Nina Yevtushenko, Tiziano Villa, Robert K. Brayton, Alex Petrenko, Alberto L.
Sangiovanni-Vincentelli
The problem of designing a component that, combined with a
known part of a system, conforms to a given overall specification
arises in several applications ranging from logic synthesis
to the design of discrete controllers. We cast the problem as
solving abstract equations over languages. Language equations
can be defined with respect to several language composition operators
such as synchronous composition and parallel composition;
conformity can be checked by language containment.
In this paper we address parallel language equations.
Parallel composition arises in the context of modeling delay-insensitive
processes and their environments. The parallel composition
operator models an exchange protocol by which an input
is followed by an output after a finite exchange of internal
signals. It abstracts a system with two components with a single
message in transit, such that at each instance either the components
exchange messages or one of them communicates with
its environment, which submits the next external input to the
system only after the system has produced an external output
in response to the previous input.
We study the most general solutions of the language equation
A ◊ X ⊆ C, where ◊ denotes parallel composition, and define the
language operators needed to express them. Then we specialize such
equations to languages associated with important classes of automata
used for modeling systems, e.g., regular languages and FSM languages.
In particular, for A ◊ X ⊆ C, we give algorithms for computing:
the largest FSM language solution, the largest complete solution,
and the largest solution whose composition with A yields a complete
FSM language. We also solve FSM equations under bounded parallel
composition.
In this paper, we give concrete algorithms for computing
such solutions, and state and prove their correctness.
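For orientation, the largest language solution in this framework has a classical closed form, well known from this line of work (stated here as background, with ◊ denoting parallel composition and the overbar denoting complementation; the paper's contribution lies in specializing it to FSM languages and making it algorithmic):

```latex
S_{\max} \;=\; \overline{A \diamond \overline{C}}
```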
Moderators: Radu Marculescu, Carnegie Mellon University, Pittsburgh, PA
Grant Martin, Cadence Design Systems, Inc., San Jose, CA
-
3B.1 CALiBeR: A Software Pipelining Algorithm for Clustered Embedded VLIW Processors [p. 112]
-
Cagdas Akturan, Margarida F. Jacome
In this paper we describe a software pipelining framework,
CALiBeR (Cluster Aware Load Balancing Retiming
Algorithm), suitable for compilers targeting clustered embedded
VLIW processors. CALiBeR can be effectively used by
embedded system designers to explore different code
optimization alternatives, i.e., can assist the generation of high-quality
customized retiming solutions for desired program
memory size and throughput requirements, while minimizing
register pressure. An extensive set of experimental results is
presented, considering several representative benchmark loop
kernels and a wide variety of clustered datapath configurations,
demonstrating that our algorithm compares favorably with one of
the best state-of-the-art algorithms, achieving up to 50%
improvement in performance and up to 47% improvement in
register requirements.
-
3B.2 Software-Assisted Cache Replacement Mechanisms for Embedded Systems [p. 119]
-
Prabhat Jain, Srinivas Devadas, Daniel Engels, Larry Rudolph
We address the problem of improving cache predictability and
performance in embedded systems through the use of software-assisted
replacement mechanisms. These mechanisms require
additional software controlled state information that affects the
cache replacement decision. Software instructions allow a program
to kill a particular cache element, i.e., effectively make the
element the least recently used element, or keep that cache element,
i.e., the element will never be evicted.
We prove basic theorems that provide conditions under which
kill and keep instructions can be inserted into program code, such
that the resulting performance is guaranteed to be as good as or
better than the original program run using the standard LRU policy.
We developed a compiler algorithm based on the theoretical
results that, given an arbitrary program, determines when to perform
software-assisted replacement, i.e., when to insert either a
kill or keep instruction. Empirical evidence is provided that shows
that performance and predictability (worst-case performance) can
be improved for many programs.
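A minimal sketch of the mechanism described above: an LRU cache extended with kill and keep hints. The interface and policy details (a pinned flag per line, kill demoting a line to LRU position) are illustrative assumptions; the paper defines the precise semantics and the conditions under which the hints preserve performance.

```python
from collections import OrderedDict

class HintedLRUCache:
    """LRU cache with software 'kill' and 'keep' hints (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> pinned flag; order = LRU order

    def access(self, addr):
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)      # becomes most recently used
        else:
            self._evict_if_full()
            self.lines[addr] = False
        return hit

    def kill(self, addr):
        # make the line the preferred eviction victim (as if least recently used)
        if addr in self.lines:
            self.lines.move_to_end(addr, last=False)
            self.lines[addr] = False

    def keep(self, addr):
        # pin the line so it is not chosen for eviction
        if addr in self.lines:
            self.lines[addr] = True

    def _evict_if_full(self):
        if len(self.lines) < self.capacity:
            return
        for a, pinned in list(self.lines.items()):   # LRU-first order
            if not pinned:
                del self.lines[a]
                return
        self.lines.popitem(last=False)   # all pinned: fall back to true LRU
```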
-
3B.3 Instruction Generation for Hybrid Reconfigurable Systems [p. 127]
-
Ryan Kastner, Seda Ogrenci-Memik, Elaheh Bozorgzadeh, Majid Sarrafzadeh
In this work, we present an algorithm for simultaneous template
generation and matching. The algorithm profiles the graph and
iteratively contracts edges to create the templates. The algorithm is
general and can be applied to any type of graph, including directed
graphs and hypergraphs. We discuss how to target the algorithm
towards the novel problem of instruction generation and selection for
hybrid (re)configurable systems. In particular, we target the
Strategically Programmable System, which embeds complex
computational units like ALUs, IP blocks, etc. into a configurable
fabric. We argue that an essential compilation step for these systems is
instruction generation, as it is needed to specify the functionality of the
embedded computational units. Additionally, instruction generation
can be used to create soft macros: tightly sequenced, pre-specified
operations placed in the configurable fabric.
Moderators: Majid Sarrafzadeh, University of California, Los Angeles, CA
Rajeev Jayaraman, Xilinx, Inc., San Jose, CA
-
3C.1 Interconnect Resource-Aware Placement for Hierarchical FPGAs [p. 132]
-
Amit Singh, Ganapathy Parthasarathy, Malgorzata Marek-Sadowska
In this paper, we utilize Rent's rule as an empirical measure for efficient
clustering and placement of circuits on hierarchical FPGAs. We show
that careful matching of design complexity and architecture resources
of hierarchical FPGAs can have a positive impact on the overall device
area. We propose a circuit placement algorithm based on Rent's parameter
and show that our clustering and placement techniques can improve
the overall device routing area by as much as 21% for the same array
size, when compared to a state-of-the-art FPGA placement and routing tool.
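Rent's rule states that a block of G gates needs about T = t·G^p external terminals, where t is the average number of terminals per gate and p is the Rent exponent. A small sketch of how such an estimate might be matched against a hierarchical FPGA's per-level pin budget follows; the parameter values and budgets are invented for illustration.

```python
# Rent's-rule terminal estimate: T = t * G**p.
def rent_terminals(gates, t=3.5, p=0.65):
    return t * gates ** p

# A hypothetical 3-level hierarchy: (cluster size, pins available per block).
for level, (size, pins) in enumerate([(16, 24), (256, 96), (4096, 384)], 1):
    need = rent_terminals(size)
    ok = "fits" if need <= pins else "overflows"
    print(f"level {level}: ~{need:.0f} terminals needed, {pins} available ({ok})")
```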
-
3C.2 A Router for Symmetrical FPGAs Based on Exact Routing Density Evaluation [p. 137]
-
Nak-Woong Eum, Taewhan Kim, Chong Min Kyung
This paper presents a new performance and routability driven routing
algorithm for symmetrical array based field-programmable gate
arrays (FPGAs). A key contribution of our work is to overcome
one essential limitation of the previous routing algorithms: inaccurate
estimations of routing density which were too general for
symmetrical FPGAs. To this end, we derive an exact routing density
calculation that is based on a precise analysis of the structure
(switch block) of symmetrical FPGAs, and utilize it consistently in
global and detailed routings. With an introduction of the proposed
accurate routing metrics, we design a new routing algorithm called
a cost-effective net-decomposition based routing which is fast, and
yet produces remarkable routing results in terms of both routability
and path/net delays. We performed an extensive experiment to
show the effectiveness of our algorithm based on the proposed cost
metrics.
-
3C.3 A Search-Based Bump-and-Refit Approach to Incremental Routing for ECO Applications in FPGAs [p. 144]
-
Vinay Verma, Shantanu S. Dutt
Incremental physical CAD is encountered frequently in the so-called
engineering change order (ECO) process in which design changes
are made typically late in the design process in order to correct logical
and/or technological problems in the circuit. As far as routing is concerned,
in order to capitalize on the enormous resources and time already
spent on routing the circuit, and to meet time-to-market requirements, it
is desirable to re-route only the ECO-affected portion of the circuit, while
minimizing any routing changes in the much larger unaffected part of
the circuit. Incremental re-routing also needs to be fast and to effectively
use available routing resources. In this paper, we develop a complete
incremental routing methodology for FPGAs using a novel approach called
bump and refit (B&R); B&R was initially proposed in [4] in the much
simpler context of extending some nets by a segment (for the purpose of
fault tolerance) for FPGAs with simple i-to-i switchboxes. Here we significantly
extend this concept to global and detailed incremental routing for
FPGAs with complex switchboxes such as those in Lucent's ORCA and
Xilinx's Virtex series. We also introduce new concepts such as B&R cost
estimation during global routing, and determination of the optimal subnet
set to bump for each bumped net, which we obtain using an efficient
dynamic programming formulation. The basic B&R idea in our algorithms
is to re-arrange some portions of some existing nets on other tracks
within their current channels to find valid routings for the incrementally
changed circuit without requiring any extra routing resources (i.e., completely
unused tracks), and with little effect on the electrical properties of
existing nets.
We have developed optimal and near-optimal algorithms (called Sub-sec
B&R and Subnet B&R, respectively) to find incremental routing solutions
using the B&R paradigm in complex FPGAs. We implemented
these algorithms for Lucent's ORCA-2C FPGA, and compared our algorithms
to two recent incremental routing techniques, Standard and
Rip-up&Reroute, and to Lucent's A PAR routing tool. Experimental
results show that our incremental routers perform very well for ECO applications.
Firstly, B&R is 10 to 20 times faster than complete re-routing
using A PAR. Further, the B&R incremental routers are nearly 27% faster
and the new nets have nearly 10% smaller lengths than in previous incremental
techniques. Also, the B&R routers do not change either the
lengths or topologies of existing nets, a significant advantage in ECO applications,
in contrast to Rip-up&Reroute which increases the length
of ripped up nets by an average of 8.75% to 13.6%.
Moderators: David D. Ling, IBM Corp. TJ Watson Research Center,
Yorktown Heights, NY
Ken Kundert, Cadence Design Systems, Inc., San Jose, CA
-
3D.1 Area Minimization of Power Distribution Network Using Efficient Nonlinear
Programming Techniques [p. 153]
-
Xiaohai Wu, Xianlong Hong, Yici Cai, C.K. Cheng, Jun Gu, Wayne Dai
This paper deals with area minimization of
power distribution networks for VLSI circuits. A new algorithm based
on efficient nonlinear programming techniques is presented to
solve this problem. Experimental results show that the
algorithm achieves its objective of minimizing the area of
power/ground networks at high speed.
-
3D.2 Coupled Analysis of Electromigration Reliability and Performance in ULSI Signal Nets [p. 158]
-
Kaustav Banerjee, Amit Mehrotra
In deep submicron VLSI circuits, interconnect reliability due to electromigration
and thermal effects is fast becoming a serious design issue
particularly for long signal lines. This paper presents, for the first time, a
rigorous coupled analysis of the AC electromigration prevalent in signal
lines and the thermal effects arising from Joule heating of the wires. The
analysis is applied to study the effect of technology scaling using ITRS
data, wherein the effects of interconnect (Cu) resistivity increasing with
shrinking line dimensions and the effect of a finite barrier metal thickness have been
included. Finally, we have also quantified the reliability implications for
minimum sized vias in optimally buffered signal nets. Our analysis suggests
that for the optimally buffered interconnects, while the maximum
current density in the line remains limited by the performance, the current
density in the vias exceeds the reliability limits and therefore requires
careful consideration in the physical design process flow.
-
3D.3 Compact Modeling and SPICE-Based Simulation for Electrothermal
Analysis of Multilevel ULSI Interconnects [p. 165]
-
TingYen Chiang, Kaustav Banerjee, Krishna C. Saraswat
This paper presents both compact analytical models and a fast
SPICE-based 3-D electrothermal simulation methodology to
characterize thermal effects due to Joule heating in high
performance Cu/low-k interconnects under steady-state and
transient stress conditions. The results demonstrate excellent
agreement with experimental data and with Finite Element
(FE) thermal simulations (ANSYS). The effect of vias, as
additional heat sinking paths to alleviate the temperature rise in
the metal wires, is included in our analysis to provide more
accurate and realistic thermal diagnosis. It shows that the
effectiveness of vias in reducing the temperature rise in
interconnects is highly dependent on the via separation and the
dielectric materials used. The analytical model is then applied to
estimate the temperature distribution in multi-level interconnects.
In addition, we discuss the possibility that, under the impact of
thermal effects, the performance improvement expected from the
use of low-k dielectric materials may be degraded. Furthermore,
thermal coupling between wires is evaluated and found to be
significant. Finally, the impact of metal wire aspect ratio on
interconnect thermal characteristics is discussed.
Moderator: Andreas Kuehlmann, Cadence Berkeley Laboratories, Berkeley, CA
Panelists: Robert W. Dutton, Paul Franzon, Seth C. Goldstein,
Philip Luekes, Eric Parker, Thomas N. Theis
-
Will Nanotechnology Change the Way We Design and Verify Systems? [p. 174]
-
During the past few decades advances in semiconductor technology have had only modest implications for
the design and verification of integrated systems. While "deep-submicron" effects created new challenges
for guaranteeing electrical integrity and necessitated the bundling of traditionally separated design steps
such as logic synthesis and physical layout, the general integrated circuit design flow did not change. But
what will happen when semiconductor technology hits the wall? Which of the currently futuristic
nanotechnologies will step in and how will they change the way we design and verify systems? Will we
just replace the implementation of the NAND-gate and D-flipflop and then do business as usual? Or do
these technologies force us to abandon convenient hierarchical modeling of electrical, logical, and system
levels? The panel will discuss the exciting opportunities of nanotechnologies and their implication for the
design and verification of future systems.
Moderators: Masahiro Fujita, University of Tokyo, Tokyo, Japan
Vigyan Singhal, Tempus Fugit, Berkeley, CA
-
4A.1 Min-Area Retiming on Dynamic Circuit Structures [p. 176]
-
Jason Baumgartner, Andreas Kuehlmann
In this paper we present two techniques for improving min-area
retiming that combine the actual register minimization with combinational
optimization. First, we discuss an on-the-fly retiming approach
based on a sequential AND/INVERTER/REGISTER graph.
With this method the circuit structure is sequentially compacted
using a combination of register "dragging" and AND vertex hashing.
Second, we present an extension of the classical retiming formulation
that allows an optimal sharing of fanin registers of AND
clusters, similar to traditional fanout register sharing. The combination
of both techniques is capable of minimizing the circuit size
beyond that possible with a standard Leiserson and Saxe retiming
approach on a static netlist structure. Our work is primarily aimed
at optimizing the performance of reachability-based verification
methods. However, the presented techniques are equally applicable
to sequential redundancy removal in technology-independent
logic synthesis. A large set of experiments using benchmark and
industrial circuits demonstrate the effectiveness of the described
techniques.
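A minimal sketch of the structural-hashing ingredient on an AND/INVERTER graph, for readers unfamiliar with the representation: AND vertices are hash-consed so structurally identical nodes are shared, and inversion rides on the edges. The sequential register-dragging and extended retiming machinery of the paper is not modeled here.

```python
# AND/INVERTER graph with structural hashing. Edges are integers whose
# least significant bit carries inversion; edges 0/1 are constant false/true.

class AIG:
    def __init__(self):
        self.table = {}    # (fanin0, fanin1) -> AND node edge
        self.next = 2

    def new_input(self):
        e, self.next = self.next, self.next + 2
        return e

    def AND(self, a, b):
        if a > b:
            a, b = b, a            # canonical fanin order
        if a == 0 or a ^ b == 1:   # x & 0 = 0,  x & !x = 0
            return 0
        if a == 1 or a == b:       # 1 & x = x,  x & x = x
            return b
        if (a, b) not in self.table:           # hash-consing: reuse structure
            self.table[(a, b)] = self.next
            self.next += 2
        return self.table[(a, b)]

def NOT(e):
    return e ^ 1

g = AIG()
x, y = g.new_input(), g.new_input()
assert g.AND(x, y) == g.AND(y, x)   # structurally shared
assert g.AND(x, NOT(x)) == 0        # local simplification
```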
-
4A.2 Verification of Integer Multipliers on the Arithmetic Bit Level [p. 183]
-
Dominik A. Stoffel, Wolfgang Kunz
One of the most severe shortcomings of currently available equivalence
checkers is their inability to verify integer multipliers. In this
paper, we present a bit level reverse-engineering technique that can
be integrated into standard equivalence checking flows. We propose
a Boolean mapping algorithm that extracts a network of half adders
from the gate netlist of an addition circuit. Once the arithmetic bit
level representation of the circuit is obtained, equivalence checking
can be performed using simple arithmetic operations. Experimental
results show the promise of our approach.
-
4A.3 Induction-Based Gate-Level Verification of Multipliers [p. 190]
-
Ying-Tsai Chang, Kwang-Ting(Tim) Cheng
We propose a method based on unrolling the inductive definition
of binary number multiplication to verify gate-level implementations
of multipliers. The induction steps successively reduce
the size of the multiplier under verification. Through induction, the
verification of an n-bit multiplier is decomposed into n equivalence
checking problems. The resulting equivalence checking problems
could be significantly sped up by simple structural analysis. This
method could be generalized to the verification of more general
arithmetic circuits and the equivalence checking of complex datapaths.
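The inductive definition being unrolled can be stated over integers: an n-bit product equals a shifted (n-1)-bit product plus one partial product, so each unrolling step yields one equivalence check between adjacent sizes. A sketch of the recursion (the paper applies it to gate-level netlists, not integers):

```python
# Inductive definition of binary multiplication:
#   x * y = 2 * ((x >> 1) * y) + (x & 1) * y
def mult(x, y, n):
    if n == 0:
        return 0
    return (mult(x >> 1, y, n - 1) << 1) + (x & 1) * y

# sanity check for 4-bit operands
assert all(mult(x, y, 4) == x * y for x in range(16) for y in range(16))
```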
Moderators: Wayne Wolf, Mediaworks Technology, Schaumburg, IL
Preeti Panda, Synopsys, Inc., Mountain View, CA
-
4B.1 An Assembly-Level Execution-Time Model for Pipelined Architectures [p. 195]
-
Giovanni Beltrame, Carlo Brandolese, William Fornaciari, Fabio Salice, Donatella Sciuto, Vito Trianni
The aim of this work is to provide an elegant and accurate
static execution timing model for 32-bit microprocessor instruction
sets, covering also interöinstruction effects. Such
effects depend on the processor state and the pipeline behavior,
and are related to the dynamic execution of assembly
code. The paper proposes a mathematical model of the
delays deriving from instruction dependencies and gives a
statistical characterization of such timing overheads. The
model has been validated on a commercial architecture, the
Intel486, by means of timing analysis of a set of benchmarks,
obtaining an error within 5%. This model can be seamlessly
integrated with a static energy consumption model in order
to obtain precise software power and energy estimations.
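The shape of such a model can be sketched as a base cost per instruction plus pairwise inter-instruction overheads. The opcode costs and the single stall rule below are invented for illustration; the paper characterizes these overheads statistically for the Intel486.

```python
# Basic-block time = sum of base costs + pairwise dependency overheads.
BASE = {'mov': 1, 'add': 1, 'load': 1, 'mul': 13}   # hypothetical cycle counts

def overhead(prev, cur):
    # hypothetical rule: a load followed by an ALU consumer stalls one cycle
    return 1 if prev == 'load' and cur in ('add', 'mul') else 0

def block_cycles(instrs):
    total = sum(BASE[i] for i in instrs)
    total += sum(overhead(a, b) for a, b in zip(instrs, instrs[1:]))
    return total

print(block_cycles(['load', 'add', 'mul', 'mov']))   # 16 base + 1 stall = 17
```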
-
4B.2 Improving Memory Energy Using Access Pattern Classification [p. 201]
-
Mahmut T. Kandemir, Ugur Sezer, Victor Delaluz
In this paper, we propose a data-driven strategy to optimize the
memory energy consumption in a banked memory system. Our
compiler-based strategy modifies the original execution order
of loop iterations in array-dominated applications to increase
the length of the time period(s) in which memory banks are
idle (i.e., not accessed by any loop iteration). To achieve this,
it first classifies loop iterations according to their bank access
patterns and then, with the help of a polyhedral tool, tries
to bring the iterations with similar bank access patterns close
together. Increasing the idle periods of memory banks brings
two major benefits; first, it allows us to place more memory
banks into low-power operating modes, and second, it enables
us to use a more aggressive (i.e., more energy saving) operating
mode for a given bank. Our strategy has been evaluated using
seven array-dominated applications on both a cacheless system
and a system with cache memory. Our results indicate that
the strategy is very successful in reducing the memory system
energy, improving memory energy by as much as 34%
on average.
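The classification step can be pictured as grouping loop iterations by the set of banks they touch, so that the remaining banks see long idle windows. The sketch below shows only that grouping; the polyhedral machinery that proves the reordering legal is the paper's contribution and is not modeled.

```python
from collections import defaultdict

def group_by_bank_signature(iterations, banks_touched):
    """iterations: iterable of iteration ids.
    banks_touched: function mapping an iteration to the set of banks it accesses."""
    groups = defaultdict(list)
    for it in iterations:
        groups[frozenset(banks_touched(it))].append(it)
    # Emit iterations signature by signature: banks outside the active
    # signature stay idle for the whole group and can enter low-power modes.
    order = []
    for sig in sorted(groups, key=lambda s: sorted(s)):
        order.extend(groups[sig])
    return order

# e.g. 8 iterations alternating between bank 0 and bank 1
print(group_by_bank_signature(range(8), lambda i: {i % 2}))
# -> [0, 2, 4, 6, 1, 3, 5, 7]: each bank is idle for half the schedule
```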
-
4B.3 System-Level Power/Performance Analysis of Portable Multimedia Systems
Communicating over Wireless Channels [p. 207]
-
Radu Marculescu, Amit Nandi, Luciano Lavagno, Alberto Sangiovanni-Vincentelli
This paper presents a new methodology for system-level
power and performance analysis of wireless multimedia systems.
More precisely, we introduce an analytical approach
based on concurrent processes modeled as Stochastic
Automata Networks (SANs) that can be effectively used to
integrate power and performance metrics in system-level
design. We show that 1) under various input traces and wireless
channel conditions, the average-case behavior of a multimedia
system consisting of a video encoder/decoder pair is
characterized by very different probability distributions and
power consumption values and 2) in order to identify the
best trade-off between power and performance figures, one
must take into consideration the entire environment (i.e.,
encoder, decoder, and channel) for which the system is being
designed. Compared to using simulation, our analytical
technique reduces the time needed to find the steady-state
behavior by orders of magnitude, with some limited loss in
accuracy compared to the exact solution. We illustrate the
potential of our methodology using the MPEG-2 video as the
driver application.
Moderators: Andrew B. Kahng, University of California at San Diego,
La Jolla, CA
Malgorzata Marek-Sadowska, University of California, Santa Barbara, CA
-
4C.1 Congestion Aware Layout Driven Logic Synthesis [p. 216]
-
Thomas Kutzschebauch, Leon Stok
In this paper, we present novel algorithms that
effectively combine physical layout and early logic
synthesis to improve overall design quality. In addition, we
employ partitioning and clustering algorithms to achieve
faster turn around times.
With the increasing complexity of designs, the
traditional separation of logic and physical design leads to
sub-optimal results as the cost functions employed during
logic synthesis do not accurately represent physical design
information. While this problem has been addressed
extensively, the existing solutions apply only simple
synthesis transforms during physical layout and are
generally unable to reverse decisions made during logic
minimization and technology mapping that have a major
negative impact on circuit structure.
In our novel approach, we propose congestion aware
algorithms for layout driven decomposition and technology
mapping, two of the steps that affect congestion the most
during logic synthesis, to effectively decrease wire length
and improve congestion. In addition, to improve design
turn-around-time and handle large designs, we present an
approach in which synthesis partitioning and placement
clustering co-exist, reflecting the different characteristics
of logical and physical domain.
-
4C.2 Addressing the Timing Closure Problem by Integrating Logic Optimization and Placement [p. 224]
-
Wilsin Gosti, Sunil Khatri, Alberto Sangiovanni-Vincentelli
Timing closure problems occur when timing estimates computed during
logic synthesis do not match with timing estimates computed from
the layout of the circuit. In such a situation, logic synthesis and layout
synthesis are iterated until the estimates match. The number of such
iterations is becoming larger as technology scales. Timing closure
problems occur mainly due to the difficulty in accurately predicting
interconnect delay during logic synthesis.
In this paper, we present an algorithm that integrates logic synthesis
and global placement to address the timing closure problem. We introduce
technology independent algorithms as well as technology dependent
algorithms. Our technology independent algorithms are based
on the notion of "wire-planning". All these algorithms interleave their
logic operations with local and incremental/full global placement, in
order to maintain a consistent placement while the algorithm is run.
We show that by integrating logic synthesis and placement, we avoid
the need to predict interconnect delay during logic synthesis. We
demonstrate that our scheme significantly enhances the predictability
of wire delays, thereby solving the timing closure problem. This is
the main result of our paper. Our results also show that our algorithms
result in a significant reduction in total circuit delay. In addition, our
technology independent algorithms result in a significant circuit area
reduction.
-
4C.3 An Algorithm for Simultaneous Pin Assignment and Routing [p. 232]
-
Hua Xiang, Xiaoping Tang, D.F. Wong
Macro-block pin assignment and routing are important tasks in physical
design planning. Existing algorithms for these problems can be classified
into two categories: 1) a two-step approach where pin assignment is followed
by routing, and 2) a net-by-net approach where pin assignment and routing
for a single net are performed simultaneously. None of the existing algorithms
is "exact" in the sense that the algorithm may fail to route all nets even
though a feasible solution exists. This remains to be true even if only 2-pin
nets between two blocks are concerned. In this paper, we present the first
polynomial-time exact algorithm for simultaneous pin assignment and routing
for 2-pin nets from one block (source block) to all other blocks. In addition
to finding a feasible solution whenever one exists, it guarantees to find a
pin-assignment/routing solution with minimum cost α·W + β·V, where W
is the total wirelength and V is the total number of vias. Our algorithm has
various applications: 1) It is suitable in ECO (Engineering Change Order)
situations where a designer wants to incrementally modify the existing solutions
instead of redoing everything after a design change. 2) Given any pin
assignment and routing solution obtained by any existing method, our algorithm
can be used to increase the number of routed nets and reduce the routing cost.
Furthermore, it provides an efficient algorithm for the pin assignment and
routing problem of all blocks. The method is applicable to both global and
detailed routing with arbitrary routing obstacles on multiple layers.
Experimental results demonstrate its efficiency and effectiveness.
Moderators: Eli Chiprout, Intel Corp., Chandler, AZ
Andreas C. Cangellaris, University of Illinois, Urbana, IL
-
4D.1 Techniques for Including Dielectrics when Extracting Passive
Low-Order Models of High Speed Interconnect [p. 240]
-
Luca Daniel, Alberto Sangiovanni-Vincentelli, Jacob K. White
Interconnect structures including dielectrics can be modeled by an
integral equation method using volume currents and surface charges
for the conductors, and volume polarization currents and surface
charges for the dielectrics. In this paper we describe a mesh analysis
approach for computing the discretized currents in both the
conductors and the dielectrics. We then show that this fully mesh-based
formulation can be cast into a form using provably positive
semidefinite matrices, making for easy application of Krylov-subspace
based model-reduction schemes to generate accurate guaranteed
passive reduced-order models. Several printed circuit board
examples are given to demonstrate the effectiveness of the strategy.
-
4D.2 A Convex Programming Approach to Positive Real Rational Approximation [p. 245]
-
Carlos P. Coelho, Joel R. Phillips, Luis M. Silveira
As system integration evolves and tighter design constraints must
be met, it becomes necessary to account for the non-ideal behavior
of all the elements in a system. Certain devices common in high-frequency
integrated circuit applications, such as spiral inductors,
SAW filters, etc., are often described and studied in the frequency
domain. Models take the form of frequency domain data obtained
through measurement or through physical simulation. Usually the
available data is sampled, incomplete, noisy, and covers only a finite
range of the spectrum.
In this paper we present a methodology for generating guaranteed
passive time-domain models of frequency-described subsystems.
The methodology presented is based on convex programming
based algorithms for fixed denominator system identification. The
algorithm is guaranteed to produce a passive system model that is
optimal in the sense of having minimum weighted square error in
the frequency band of interest over all models with a prescribed set
of system poles. An incremental-fitting reformulation of the problem
is also introduced that trades optimality for efficiency while
still guaranteeing passivity. Results of the application of the proposed
methodologies to the modeling of a variety of subsystems
are presented and discussed.
-
4D.3 A Trajectory Piecewise-Linear Approach to Model Order Reduction and
Fast Simulation of Nonlinear Circuits and Micromachined Devices [p. 252]
-
Michal Rewienski, Jacob K. White
In this paper we present an approach to the nonlinear model reduction
based on representing the nonlinear system with a piecewise-linear system
and then reducing each of the pieces with a Krylov projection. However,
rather than approximating the individual components as piecewise-linear
and then composing hundreds of components to make a system
with exponentially many different linear regions, we instead generate a
small set of linearizations about the state trajectory which is the response
to a "training input". Computational results and performance data are
presented for a nonlinear circuit and a micromachined fixed-fixed beam
example. These examples demonstrate that the macromodels obtained
with the proposed reduction algorithm are significantly more accurate
than models obtained with linear or the recently developed quadratic
reduction techniques. Finally, it is shown that the proposed technique
is computationally inexpensive, and that the models can be constructed
"on-the-fly", to accelerate simulation of the system response.
Moderators: Rolf Ernst, Technical University of Braunschweig,
Braunschweig, Germany
Francky Catthoor, IMEC, Leuven, Belgium
-
5A.1 Low Power System Scheduling and Synthesis [p. 259]
-
Niraj K. Jha
Many scheduling techniques have been presented recently
which exploit dynamic voltage scaling (DVS) and
dynamic power management (DPM) for both uniprocessors
and distributed systems, as well as both real-time
and non-real-time systems. While such techniques are
power-aware and aim at extending battery lifetimes for
portable systems, they need to be augmented to make
them battery-aware as well. We will survey such power-aware
and battery-aware scheduling algorithms. Also,
system synthesis algorithms for real-time systems-on-a-chip
(SOCs), distributed and wireless client-server embedded
systems, etc., have begun optimizing power consumption
in addition to system price. We will survey
such algorithms as well, and point out some open problems.
-
5A.3 Integral Design Representations for Embedded Systems [p. 264]
-
Lothar Thiele
Modern embedded computing systems tend to be heterogeneous assemblages
of concurrent subsystems, typically described in different
languages and semantics which are well established in the various
application fields. For example, the specification of the functional
and timing behavior necessitates a mixture of different basic models
of computation and communication which come from transformative
or reactive domains. There is little hope that a single language
will replace this heterogeneous set of languages. In fact, experiments
with system specification languages show that there is not a
unique universal specification language to support the whole life cycle.
A similar problem occurs when reused components must be
integrated, possibly described in another language and incompletely
documented. Examples would be components of other companies
or "legacy code". The lack of coherency of the different languages,
methods and tools is a substantial obstacle on the way to higher design
productivity and design quality. A design process must be able
to bridge the semantic differences for verification and synthesis and
should account for limited knowledge of system properties.
The embedded tutorial will describe approaches which allow for
the common representation of different languages and incomplete
specifications. In particular, recent results on the use of internal design
representations targeted to scheduling and design space exploration
will be reviewed. Here, the information useful for methods
like allocation of resources, partitioning the design and scheduling
must be estimated or extracted from the various input languages and
mapped onto internal representations which describe properties of
the subsystems and their coordination. Design methods work on
these internal representations and eventually refine them by adding
components and reducing non-determinism.
-
5A.4 Optimisation Problems for Dynamic Concurrent Task-Based Systems [p. 265]
-
D. Verkest, P. Yang, C. Wong, P. Marchal
One of the most critical bottlenecks in many novel multi-media applications is their very dynamic
concurrent behaviour. This is especially true because of the quality-of-service (QoS) aspects of these
applications. Prominent examples of this can be found in the recent MPEG4 and JPEG2000 standards
and especially the new MPEG21 standard. In order to deal with these dynamic issues where tasks and
complex data types are created and deleted at run-time based on non-deterministic events, a novel system
design paradigm is required. Because of the high computation and communication requirements of
these kinds of applications, multi-processor SoCs (systems on chip) are more and more accepted as a solution
(e.g., TriMedia), a trend also supported by advances in processing technology. Such a platform differs
from traditional general-purpose computers because it is application specific and embedded,
which means costs like energy consumption are of major concern. Task scheduling on such
multiprocessor embedded multimedia systems is a real challenge.
This part of the tutorial will focus on the new requirements in system-level synthesis. In particular a
"task concurrency management" problem formulation will be proposed, with special focus on the formal
definition of the most crucial optimization problems. The concept of Pareto curve based exploration is
crucial in these formulations.
Moderator: Georges Gielen, Katholieke Universiteit Leuven, Leuven, Belgium
Domine Leenaerts, Rob A. Rutenbar, Georges Gielen
-
-
This tutorial paper addresses the problems and solutions that are
posed by the design of mixed-signal integrated systems on chip
(SoC). These include problems in mixed-signal design methodologies
and flows, problems in analog design productivity, as
well as open problems in analog, mixed-signal and RF design,
modeling and verification tools. The tutorial explains the problems
that are posed by these mixed-signal/RF SoC designs, describes
the solutions and their underlying methods that exist
today and outlines the challenges that still remain to be solved
at present. In the first part the design of analog and mixed-signal
circuits is addressed, while the second part focuses on the specific
problems raised by RF wireless circuits.
Moderators: Alan J. Hu, University of British Columbia, Vancouver, BC, Canada
Rajeev Ranjan, Real Intent, Santa Clara, CA
-
6A.1 Efficient Conflict Driven Learning in Boolean Satisfiability Solver [p. 279]
-
Lintao Zhang, Conor Madigan, Matthew Moskewicz, Sharad Malik
One of the most important features of current state-of-the-art
SAT solvers is the use of conflict based backtracking and
learning techniques. In this paper, we generalize various
conflict driven learning strategies in terms of different
partitioning schemes of the implication graph. We re-examine
the learning techniques used in various SAT solvers and
propose an array of new learning schemes. Extensive
experiments with real world examples show that the best
performing new learning scheme has at least a 2X speedup
compared with learning schemes employed in state-of-the-art
SAT solvers.
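The best-known instance of such a partitioning scheme is the first-UIP cut of the implication graph. A simplified sketch of first-UIP clause learning follows; data structures are idealized (explicit level and reason maps) and refinements such as clause minimization are omitted.

```python
# First-UIP learning: resolve the conflict clause backward along the
# implication trail until exactly one literal assigned at the current
# decision level remains; that literal's negation joins the learned clause.

def first_uip(conflict, trail, level, reason, cur_level):
    """conflict: literals of the falsified clause; trail: assigned literals,
    newest last; level[v]: decision level of variable v; reason[v]:
    antecedent clause of v (None for decisions)."""
    learned = {l for l in conflict if 0 < level[abs(l)] < cur_level}
    pending = {abs(l) for l in conflict if level[abs(l)] == cur_level}
    for lit in reversed(trail):
        v = abs(lit)
        if v not in pending:
            continue
        pending.discard(v)
        if not pending:                 # v is the first UIP: stop here
            learned.add(-lit)
            return learned
        for l in reason[v]:             # resolve on v with its antecedent
            u = abs(l)
            if u == v:
                continue
            if level[u] == cur_level:
                pending.add(u)
            elif level[u] > 0:
                learned.add(l)
    return learned
```

Different cuts of the same implication graph yield the different learning schemes the paper compares.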
-
6A.2 Partition-Based Decision Heuristics for Image Computation Using SAT and BDDs [p. 286]
-
Aarti Gupta, Zijiang Yang, Pranav N. Ashar, Lintao Zhang, Sharad Malik
Methods based on Boolean satisfiability (SAT) typically use a Conjunctive
Normal Form (CNF) representation of the Boolean formula,
and exploit the structure of the given problem through use
of various decision heuristics and implication methods. In this paper,
we propose a new decision heuristic based on separator-set induced
partitioning of the underlying CNF graph. It targets those
variables whose choice generates clause partitions with disjoint
variable supports. This can potentially improve performance of
SAT applications by decomposing the problem dynamically within
the search. In the context of a recently proposed image computation
method combining SAT and BDDs, this results in simpler
BDD subproblems. We provide algorithms for CNF partitioning,
one based on a clause-variable dependency matrix and another
based on standard hypergraph partitioning techniques, and also for
the use of partitioning information in decision heuristics for SAT.
We demonstrate the effectiveness of our proposed partition-based
heuristic with practical results for reachability analysis of benchmark
sequential circuits.
-
6A.3 Non-linear Quantification Scheduling in Image Computation [p. 293]
-
Pankaj Chauhan, Edmund M. Clarke, Somesh Jha, Jim Kukula, Tom Shiple,
Helmut Veith, Dong Wang
Computing the set of states reachable in one step from a given set of
states, i.e. image computation, is a crucial step in several symbolic
verification algorithms, including model checking and reachability
analysis. So far, the best methods for quantification scheduling in
image computation, with a conjunctively partitioned transition relation,
have been restricted to a linear schedule. This results in a
loss of flexibility during image computation. We view image computation
as a problem of constructing an optimal parse tree for the
image set. The optimality of a parse tree is defined by the largest
BDD that is encountered during the computation of the tree. We
present dynamic and static versions of a new algorithm, VarScore,
which exploits the flexibility offered by the parse tree approach to
the image computation. We show by extensive experimentation that
our techniques outperform the best known techniques so far.
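The core scheduling idea can be sketched with conjuncts abstracted to their variable supports: a variable is quantified out as soon as it is local to a single conjunct, and the order in which conjuncts are merged forms the parse tree. The greedy scoring below is a simplification made for illustration, not the actual VarScore heuristic.

```python
# Conjuncts are variable-support sets; the size of a merged support stands
# in for BDD size. Returns the peak support size seen along the schedule.

def schedule(supports, quantify):
    supports = [set(s) for s in supports]
    peak = 0
    while len(supports) > 1:
        # quantify out variables now local to a single conjunct
        for v in list(quantify):
            holders = [s for s in supports if v in s]
            if len(holders) == 1:
                holders[0].discard(v)
                quantify.discard(v)
        # greedily merge the pair with the smallest merged support
        i, j = min(((i, j) for i in range(len(supports))
                    for j in range(i + 1, len(supports))),
                   key=lambda ij: len(supports[ij[0]] | supports[ij[1]]))
        merged = supports[i] | supports[j]
        peak = max(peak, len(merged))
        supports = [s for k, s in enumerate(supports) if k not in (i, j)]
        supports.append(merged)
    return peak

print(schedule([{'x0', 'x1', 'i0'}, {'x1', 'x2'}, {'x2', 'x0', 'i1'}],
               quantify={'i0', 'i1', 'x0', 'x1', 'x2'}))
```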
Moderators: Sandeep Shukla, University of California, Irvine, CA
Francky Catthoor, IMEC, Leuven, Belgium
-
6B.1 Symbolic Algebra and Timing Driven Data-flow Synthesis [p. 300]
-
Armita Peymandoust, Giovanni De Micheli
The growing market of multi-media applications has
required the development of complex ASICs with
significant data-path portions. Unfortunately, most high-level
synthesis tools and methods cannot automatically
synthesize data paths such that complex arithmetic library
blocks are intelligently used. Symbolic computer algebra
has been previously used to automate mapping data flow
into a minimal set of complex arithmetic components. In
this paper, we present extensions to the previous methods
in order to find the minimal critical path delay (CPD)
mapping. A new algorithm is proposed that incorporates
symbolic manipulations such as tree-height-reduction,
factorization, expansion, and Horner transformation.
Such manipulations are used as guidelines in initial
library element selection. Furthermore, we demonstrate
how substitution can be used for multi-expression
component sharing and critical path delay optimization.
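As a small illustration of the kind of symbolic manipulation involved (using sympy here, not the authors' tool), a Horner transformation trades a polynomial's expanded form for a nested form with fewer operations, which changes which arithmetic library components can implement it:

    from sympy import symbols, horner, count_ops

    x = symbols('x')
    poly = x**3 + 6*x**2 + 11*x + 6
    nested = horner(poly)                 # x*(x*(x + 6) + 11) + 6
    print(nested, count_ops(poly), count_ops(nested))

The nested form needs only multiply-add steps, a natural fit for MAC-style datapath blocks.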
-
6B.2 Application-Driven Processor Design Exploration for Power-Performance
Trade-off Analysis [p. 306]
-
Diana Marculescu, Anoop Iyer
This paper presents an efficient design exploration
environment for high-end core processors. The heart of the proposed
design exploration framework is a two-level simulation engine that
combines detailed simulation for critical portions of the code with
fast profiling for the rest. Our two-level simulation methodology
relies on the inherent clustered structure of application programs
and is completely general and applicable to any microarchitectural
power/performance simulation engine. The proposed simulation
methodology is 3-17X faster, while being sufficiently accurate
(within 5%) when compared to the fully detailed simulator. The
design exploration environment is able to vary different
microarchitectural configurations and find, in a matter of minutes, the
optimal one with respect to the energy-delay product. The
parameters found to drastically affect the core processor
power/performance metrics are issue width, instruction window size,
and pipeline depth, along with correlated clock frequency. For very
high-end configurations for which balanced pipelining may not be
possible, opportunities for running faster stages at lower voltage
exist. In such cases, by using up to 3 voltage levels, the energy-delay
product is reduced by 23-30% when compared to the single-voltage
implementation.
-
6B.3 A System for Synthesizing Optimized FPGA Hardware from MATLAB [p. 314]
-
Malay Haldar, Anshuman Nayak, Alok Choudhary, Prith Banerjee
Efficient high-level design tools that can map behavioral descriptions to
FPGA architectures are one of the key requirements to fully leverage FPGAs
for high-throughput computations and meet time-to-market pressures. We
present a compiler that takes as input algorithms described in MATLAB
and generates RTL VHDL. The RTL VHDL then can be mapped to FPGAs
using existing commercial tools. The input application is mapped to multiple
FPGAs by parallelizing the application and embedding communication
and synchronization primitives automatically. Our compiler infers the minimum
number of bits required to represent each variable through a precision
analysis framework. The compiler can leverage optimized IP cores to enhance
the hardware generated. The compiler also exploits parallelism in
the input algorithm by pipelining in the presence of resource constraints.
We demonstrate the utility of the compiler by synthesizing hardware for
a couple of signal/image processing algorithms and comparing them with
manually designed hardware.
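A toy sketch of the range-based precision inference such a compiler performs (the variable ranges below are illustrative assumptions, not from the paper):

    import math

    def min_bits(lo, hi):
        """Bits for an integer variable known to lie in [lo, hi]."""
        if lo >= 0:                                # unsigned
            return max(1, math.ceil(math.log2(hi + 1)))
        magnitude = max(hi + 1, -lo)               # two's complement
        return 1 + max(1, math.ceil(math.log2(magnitude)))

    ranges = {'pixel': (0, 255), 'diff': (-255, 255), 'acc': (0, 255 * 64)}
    for name, (lo, hi) in ranges.items():
        print(name, min_bits(lo, hi))              # 8, 9, 14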
-
6B.4 Behavior-to-Placed RTL Synthesis with Performance-Driven Placement [p. 320]
-
Daehong Kim, Jinyong Jung, Sunghyun Lee, Jinhwan Jeon, Kiyoung Choi
Interconnect delay should be considered together with computation
delay during architectural synthesis in order to achieve timing
closure in deep submicrometer technology. In this paper, we propose
an architectural synthesis technique for distributed-register
architecture, which separates interconnect delay for data transfer
from component delay for computation. The technique incorporates
performance-driven placement into the architectural synthesis
to minimize performance overhead due to interconnect delay.
Experimental results show that our methodology achieves performance
improvements of up to 60%, and of 22% on average.
Moderators: Cheng-Kok Koh, Purdue University, West Lafayette, IN
Tong Gao, Monterey Design Systems, Inc., Sunnyvale, CA
-
6C.1 Formulae and Applications of Interconnect Estimation Considering
Shield Insertion and Net Ordering [p. 327]
-
James D. Z. Ma, Lei He
It has been shown recently that simultaneous shield insertion and
net ordering (called SINO/R as only random shields are used) provides
an area-efficient solution to reduce the RLC noise. In this
paper, we first develop simple formulae with errors less than 10%
to estimate the number of shields in the min-area SINO/R solution.
In order to accommodate pre-routed P/G wires that also serve as
shields, we then formulate two new SINO problems: SINO/SPR
and SINO/UPG, and propose effective and efficient two-phase algorithms
to solve them. Compared to the existing dense wiring
fabric scheme, the resulting SINO/SPR and SINO/UPG schemes
maintain the regularity of the P/G structure, have negligible penalty
on noise and delay variation, and reduce the total routing area by
up to 42% and 36%, respectively. Further, we develop various
pre-layout estimation formulae for shielding areas and optimal P/G
structures under different routing styles. These formulae can be
readily used to guide global routing and high-level design decisions.
-
6C.2 Hybrid Structured Clock Network Construction [p. 333]
-
Haihua Su, Sachin Sapatnekar
This paper hierarchically constructs a hybrid mesh/tree
clock network structure consisting of overlying zero-skew clock
meshes, with underlying zero-skew clock trees originating from
the mesh nodes. We propose a mesh construction procedure,
which guarantees zero skew under the Elmore delay model,
using a simple and efficient linear programming formulation.
Buffers are inserted to reduce the transition time (or rise time).
As a post-processing step, wire width optimization under an
accurate higher-order delay metric is performed to further
minimize the transition time and propagation delay/skew.
Experimental results show that the hybrid mesh/tree construction
scheme can provide smaller propagation delay and transition
time than a comparable clock tree.
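For reference, the Elmore model under which the mesh is tuned (the standard expression, not restated in the abstract) gives the delay to sink $i$ as

    $t_i = \sum_{e \in \mathrm{path}(s,i)} R_e \, C_{\mathrm{down}}(e)$

where $R_e$ is the resistance of edge $e$ and $C_{\mathrm{down}}(e)$ is the total capacitance downstream of $e$; zero skew requires $t_i = t_j$ for every pair of sinks, which is the constraint enforced by the paper's linear programming formulation.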
-
6C.3 CASh: A Novel "Clock as Shield" Design Methodology for Noise Immune
Precharge-Evaluate Logic [p. 337]
-
Yonghee Im, Kaushik Roy
In gigascale integrated circuits (GSI), interconnects
are expected to play a more dominant role in circuit performance
than transistor cells. The circuit performance is affected by signal
integrity as cross-talk becomes more significant with the scaling
of feature sizes. Many attempts have been made to improve
noise immunity, but all require the sacrifice of speed as a trade-off,
especially in dynamic circuits. Avoiding noise problems while
maintaining the desired speed would involve increased wire spacing
or extensive shielding, both of which are unfavorable due to
demands for high density and a relatively higher cost of wires
in current process technologies. We propose a novel methodology
in which clock lines are used as shielding wires to reduce
cross-talk effects in domino circuits, thereby minimizing the possibility
of functional failures. In addition, this method provides
another benefit: a small buffer size is required for driving a long
interconnect for iso-noise immunity. Since clock lines, which are
always required in domino circuits, are used to shield signal lines,
the speed penalty and area overhead that are drawbacks of previous
work can be avoided. This design methodology, CASh (Clock As
Shield), demonstrates its superiority over conventional methods.
HSPICE simulations on a 2-input domino AND gate and on 4- and
8-bit full adders designed with CASh show higher noise immunity
than conventional designs.
Moderators: Koen Lampaert, Mindspeed Technology, Newport Beach, CA
Henry Chang, Cadence Design Systems, Inc., San Jose, CA
-
6D.1 The Sizing Rules Method for Analog Integrated Circuit Design [p. 343]
-
Helmut E. Graeb, Stephan Zizala, Josef Eckmueller, Kurt Antreich
This paper presents the sizing rules method for analog CMOS circuit
design that consists of: first, the development of a hierarchical
library of transistor pair groups as basic building blocks for analog
CMOS circuits; second, the derivation of a hierarchical generic
list of constraints that must be satisfied to guarantee the function
of each block and its reliability with respect to physical effects;
and third, the development of an automatic recognition of building
blocks in a circuit schematic. The sizing rules method efficiently
captures design knowledge on the technology-specific level of transistor
pair groups. This reduces the preparatory modeling effort for
analog circuit synthesis. Results of industrial applications to circuit
sizing, design centering, response surface modeling and analog
placement show the significance of the sizing rules method. Sizing
rules especially make sure that automatic circuit sizing and design
centering lead to technically meaningful and robust results.
-
6D.2 ASF: A Practical Simulation-Based Methodology for the Synthesis of Custom Analog Circuits [p. 350]
-
Michael J. Krasnicki, Rodney Phelps, James R. Hellums, Mark McClung,
Rob A. Rutenbar, L. Richard Carley
This paper describes ASF, a novel cell-level analog
synthesis framework that can size and bias a given circuit topology
subject to a set of performance objectives and a manufacturing
process. To manage complexity and time-to-market, SoC designs
require a high level of automation and reuse. Digital methodologies
are inapplicable to analog IP, which relies on tight control of low-level
device and circuit properties that vary widely across
manufacturing processes. This analog synthesis solution automates
these tedious, technology specific aspects of analog design. Unlike
previously proposed approaches, ASF extends the prevalent
"schematic and SPICE" methodology used to design analog and
mixed-signal circuits. ASF is topology and technology independent
and can be easily integrated into a commercial schematic capture
design environment. Furthermore, ASF employs a novel numerical
optimization formulation that incorporates classical downhill
techniques into stochastic search. ASF consistently produces results
comparable to expert manual design with 10x fewer candidate
solution evaluations than previously published approaches that rely
on traditional stochastic optimization methods.
-
6D.3 A Layout-Aware Synthesis Methodology for RF Circuits [p. 358]
-
Peter J. Vancorenland, Geert Van der Plas, Michiel Steyaert, Georges Gielen, Willy Sansen
In this paper a layout-aware RF synthesis methodology is presented.
The methodology combines the power of a differential evolution algorithm
with cost function response modeling and integrated layout generation to
synthesize RF circuits efficiently, taking into account all layout
parasitics during the circuit optimization. The proposed approach has
successfully been applied to the design of a high-performance
downconverter mixer circuit, proving the effectiveness of the
implemented design methodology.
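For context, a minimal sketch of the differential evolution step at the core of such a loop; the cost function below is a stand-in for the paper's layout-aware, response-model-based cost, and all names are illustrative.

    import random

    def differential_evolution(pop, cost, F=0.8, CR=0.9, generations=100):
        dim = len(pop[0])
        for _ in range(generations):
            for i in range(len(pop)):
                a, b, c = random.sample(
                    [p for j, p in enumerate(pop) if j != i], 3)
                j_rand = random.randrange(dim)
                trial = [a[j] + F * (b[j] - c[j])
                         if (random.random() < CR or j == j_rand) else pop[i][j]
                         for j in range(dim)]
                if cost(trial) <= cost(pop[i]):    # greedy selection
                    pop[i] = trial
        return min(pop, key=cost)

    pop = [[random.uniform(-5, 5) for _ in range(4)] for _ in range(20)]
    print(differential_evolution(pop, lambda v: sum((x - 1) ** 2 for x in v)))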
Moderators: Sujit Dey, University of California at San Diego, La Jolla, CA
Yervant Zorian, LogicVision, Inc., San Jose, CA
-
7A.1 On Identifying Don't Care Inputs of Test Patterns for Combinational
Circuits [p. 364]
-
Seiji Kajihara, Kohei Miyase
Given a test set for stuck-at faults, some of the primary input values
may be changed to the opposite logic values without losing fault coverage.
We can regard such input values as don't care (X). In this
paper, we propose a method for identifying X inputs of test vectors
in a given test set. While there are generally many combinations of X
inputs in the test set, the proposed method finds one that includes
as many X inputs as possible, by using fault simulation
and procedures similar to implication and justification of ATPG
algorithms. Experimental results for ISCAS benchmark circuits
show that approximately 66% of the inputs of uncompacted test sets
could be X on average. Even for compacted test sets, the method
found that approximately 47% of inputs are X. Finally, we discuss
how logic values are reassigned to the identified X inputs in
several applications, to make test vectors more desirable.
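A naive sketch of the underlying question (the paper's implication/justification-based method is far more efficient and handles combinations of X inputs): an input position is a candidate X if flipping it does not reduce fault coverage. The coverage callback here is a hypothetical stand-in for a fault simulator.

    def identify_x_inputs(vector, coverage):
        baseline = coverage(vector)
        xs = []
        for i in range(len(vector)):
            flipped = (vector[:i] + ('1' if vector[i] == '0' else '0')
                       + vector[i + 1:])
            if coverage(flipped) >= baseline:
                xs.append(i)
        return xs

    # Toy stand-in: coverage depends only on the first two inputs.
    toy = lambda v: 1 if v[0] == v[1] == '1' else 0
    print(identify_x_inputs('1101', toy))          # positions 2 and 3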
-
7A.2 REDI: An Efficient Fault Oriented Procedure to Identify Redundant
Faults in Combinational Logic Circuits [p. 370]
-
Chen Wang, Irith Pomeranz, Sudhakar M. Reddy
In this work, a new and effective procedure, called REDI, to
efficiently identify redundant single stuck-at faults in
combinational logic circuits is proposed. The method is fault
oriented and uses sensitizability of partial paths to determine
redundant faults. It uses only implications and hence may not
determine all the redundant faults of a circuit. However,
experimental results presented on benchmark circuits show
that the procedure identifies nearly all the redundant faults in
most of the benchmark circuits. The key features of REDI that
make it efficient are: partial path sensitization, blockage
learning, dynamic branch ordering and fault grouping.
Experimental results on benchmark circuits demonstrate the
efficiency of the proposed procedure in identifying redundant
faults in combinational logic circuits.
-
7A.3 Crosstalk Fault Detection by Dynamic Idd [p. 375]
-
Xiaoyun Sun, Seonki Kim, Bapiraju Vinnakota
Undesired capacitive crosstalk between signals is expected to be a significant
concern in deep submicron circuits. New test techniques are needed for these
crosstalk faults since they may cause unacceptable performance degradation.
We analyze the impact of crosstalk faults on a circuit's power dissipation.
Crosstalk faults can be detected by monitoring the dynamic supply current.
The test method is based on a recently developed dynamic Idd test metric, the
energy consumption ratio (ECR). ECR-based test has been shown to be effective
at tolerating the impact of process variations. In this paper, we apply an
ECR-based test method called ECR-VDD test to detect the crosstalk faults. The
effectiveness of the method is demonstrated by simulation results.
Moderators: Miodrag Potkonjak, University of California, Los Angeles, CA
Kazutoshi Wakabayashi, NEC Corp., Kawasaki, Japan
-
7B.1 Color Permutation: An Iterative Algorithm for Memory Packing [p. 380]
-
Jianwen Zhu
It is predicted that 70% of the silicon real estate will be occupied
by memories in future systems-on-chip. The minimization
of on-chip memory hence becomes increasingly important
for cost, performance and energy consumption. In this
paper, we present a reasonably fast algorithm based on iterative
improvement, which packs a large number of memory
blocks into a minimum-size address space. The efficiency of
the algorithm is achieved by two new techniques. First, in
order to evaluate each solution in linear time, we propose a
new algorithm based on the acyclic orientation of the memory
conflict graph. Second, we propose a novel representation
of the solution which effectively compresses the potentially
infinite solution space to a finite value of n!, where n
is the number of vertices in the memory conflict graph. Furthermore,
if a near-optimal solution is satisfactory, this value
can be dramatically reduced to c!, where c is the chromatic
number of the memory conflict graph. Experiments show
that a consistent improvement of 30% over the scalar method can
be achieved.
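A sketch of the linear-time evaluation idea described above: fixing a vertex order induces an acyclic orientation of the memory conflict graph, and the packed address-space size is the longest weighted path. The block sizes and conflicts below are toy assumptions.

    def address_space(order, sizes, conflicts):
        pos = {v: i for i, v in enumerate(order)}
        offset = {}
        for v in order:                  # topological w.r.t. the orientation
            offset[v] = max((offset[u] + sizes[u]
                             for u in conflicts[v] if pos[u] < pos[v]),
                            default=0)
        return max(offset[v] + sizes[v] for v in order)

    sizes = {'a': 4, 'b': 4, 'c': 2}
    conflicts = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}  # lifetime overlaps
    print(address_space(['a', 'c', 'b'], sizes, conflicts))  # 8: a, c share
    print(address_space(['a', 'b', 'c'], sizes, conflicts))  # 10: worse order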
-
7B.2 Constraint Satisfaction for Relative Location Assignment and Scheduling [p. 384]
-
Carlos Alba-Pinto, Bart Mesman, Jochen Jess
Tight data- and timing constraints are imposed by communication
and multimedia applications. The architecture
of the embedded processor implies resource constraints. Instead
of random-access registers, relative location storages
or rotating register files are used to exploit the available
parallelism of resources by means of reducing the initiation
interval in pipelined schedules. Therefore, the compiler or
synthesis tool must deal with the difficult tasks of scheduling
of operations and location assignment of values while respecting
all the constraints including the storage file capacity.
This paper presents a method that handles constraints of
relative location storages during scheduling together with
timing and resource constraints. The characteristics of the
coloring of conflict graphs, representing the relative overlap
of value instances, are analyzed in order to identify the
bottlenecks for location assignment with the aim of serializing
their lifetimes. This is done with pairs of loop instances
of values until it can be guaranteed that all constraints will
be satisfied. Experiments show that high quality schedules
for kernels and inner loops can be efficiently obtained.
-
7B.3 A Super-Scheduler for Embedded Reconfigurable Systems [p. 391]
-
S. Ogrenci Memik, E. Bozorgzadeh, R. Kastner, M. Sarrafzadeh
Emerging reconfigurable systems attain high performance
with optimized embedded cores. For mapping designs onto such special
architectures, synthesis tools that are aware of the special capabilities of
the underlying architecture are necessary. In this paper we propose
an algorithm that performs simultaneous scheduling and binding, targeting
embedded reconfigurable systems. Our algorithm differs from
traditional scheduling methods in its capability of efficiently utilizing
embedded blocks within the reconfigurable system. Our algorithm can
be used to implement several other scheduling techniques, such as ASAP,
ALAP, and list scheduling. Hence we refer to it as a super-scheduler. Our
algorithm is a path-based scheduling algorithm. At each step, an individual
path from the input DFG is scheduled. Our experiments with several
DFGs extracted from the MediaBench suite indicate promising results.
Our scheduler is able to trade off between maximally
utilizing the high-performance embedded blocks and exploiting
parallelism in the schedule.
Moderators: Jo Dale Carothers, University of Arizona, Tucson, AZ
Amir H. Farrahi, IBM Corp. TJ Watson Research Center, Yorktown Heights, NY
-
7C.1 Multilevel Approach to Full-Chip Gridless Routing [p. 396]
-
Jason Cong, Jie Fang, Yan Zhang
This paper presents a novel gridless detailed routing approach based on
multilevel optimization. The multilevel framework with recursive coarsening
and refinement in a "V-shaped" flow allows efficient scaling of our gridless
detailed router to very large designs. The downward pass of recursive coarsening
builds the representations of routing regions at different levels, while the
upward pass of iterative refinement allows a gradual convergence to a globally
optimized solution. The use of a multicommodity flow-based routing algorithm
for the initial routing at the coarsest level and a modified maze algorithm
for the refinement at each level considerably improves the quality of
gridless routing results. Compared with the recently published gridless
detailed routing algorithm using wire planning [1], our multilevel gridless
routing algorithm is 3x to 75x faster. We also compared our multilevel
framework with a recently developed three-level routing approach [1] and
a traditional hierarchical routing approach. Our multilevel algorithm
generates better detailed routing results with higher completion rates. To
our knowledge, this is the first time that multilevel optimization has been
applied to IC routing.
-
7C.2 A Force-Directed Maze Router [p. 404]
-
Fan Mo, Abdallah Tabbara, Robert K. Brayton
A new routing algorithm is presented. It is based on a multiple
star net model, force-directed placement and maze searching techniques.
The algorithm inherits the power of maze routing in that it is able to route
complex layouts with various obstructions. The large memory requirement
of the conventional maze algorithm is alleviated through successive net
refinement, which constrains the maze searching to small regions. The
algorithm shows advantages in routing designs with complicated layout
obstructions.
-
7C.3 Minimum-Buffered Routing of Non-Critical Nets for Slew Rate
and Reliability Control [p. 408]
-
Charles Alpert, Andrew B. Kahng, Bao Liu, Ion Mandoiu, Alexander Zelikovsky
In high-speed digital VLSI design, bounding the load capacitance at
gate outputs is a well-known methodology to improve coupling noise
immunity, reduce degradation of signal transition edges, and reduce delay
uncertainty due to coupling noise. Bounding load capacitance also
improves reliability with respect to hot-carrier oxide breakdown and
AC self-heating in interconnects, and guarantees bounded input rise/fall
times at buffers and sinks.
This paper introduces a new minimum-buffer routing problem
(MBRP) formulation which requires that the capacitive load of each
buffer, and of the source driver, be upper-bounded by a given constant.
Our contributions include the following.
- We give linear-time algorithms for optimal buffering of a given
routing tree with a single (inverting or non-inverting) buffer type.
- For simultaneous routing and buffering with a single non-inverting
buffer type, we give a factor 2(1+ε) approximation algorithm and
prove that no algorithm can guarantee a factor smaller than 2 unless
P=NP. For the case of a single inverting buffer type, we give a factor
4(1+ε) approximation algorithm.
- We give local-improvement and clustering based MBRP heuristics
with improved practical performance, and present a comprehensive
experimental study comparing the runtime/quality tradeoffs of
the proposed MBRP heuristics on test cases extracted from recent
industrial designs.
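To make the single-buffer-type idea concrete, a hedged sketch (a simplification, not the paper's optimal algorithm) of bottom-up tree buffering under a load cap: walk the routing tree once and decouple a branch with a buffer whenever the accumulated downstream capacitance would exceed the bound. The tree, capacitances, and buffer cap are illustrative assumptions.

    CAP_BUF = 1                                  # assumed buffer input cap

    def buffer_tree(children, cap, bound, node=0):
        """Return (load seen at node, list of buffer insertion edges)."""
        load, buffers = cap[node], []
        for child in children.get(node, []):
            child_load, child_bufs = buffer_tree(children, cap, bound, child)
            buffers += child_bufs
            if load + child_load > bound:        # decouple this branch
                buffers.append((node, child))
                child_load = CAP_BUF
            load += child_load
        return load, buffers

    children = {0: [1, 2], 2: [3, 4]}            # source 0; sinks 1, 3, 4
    cap = {0: 0, 1: 3, 2: 1, 3: 3, 4: 3}
    print(buffer_tree(children, cap, bound=6))   # (4, [(2, 4), (0, 2)])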
Moderators: Mattan Kamon, Coventor, Inc., Cambridge, MA
Kenneth L. Shepard, Columbia University, New York, NY
-
7D.1 Highly Accurate Fast Methods for Extraction and Sparsification
of Substrate Coupling Based on Low-Rank Approximation
[p. 417]
-
Joe Kanapka, Jacob White
More aggressive design practices have created renewed interest in
techniques for analyzing substrate coupling problems. Most previous
work has focused primarily on faster techniques for extracting
coupling resistances, but has offered little help for reducing the resulting
resistance matrix, whose number of nonzero entries grows
quadratically with the number of contacts. Wavelet-like methods
have been applied to sparsifying the resistance matrix representing
the substrate coupling, but the accuracy of the method is very
sensitive to the particulars of the contact layout. In this paper we
show that for the substrate problem it is possible to improve considerably
on the wavelet-like methods by making use of the algorithmic
structure common to the fast multipole and wavelet-like
algorithms, but making judicious use of low-rank approximations.
The approach, motivated by the hierarchical SVD algorithm, can
achieve more than an order of magnitude better accuracy for commensurate
sparsity, or can achieve much better sparsity at commensurate
accuracy, when compared to the wavelet-like algorithm.
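A numpy sketch of the compression step at the heart of such methods: replace a smooth far-field coupling block by a truncated SVD whose rank meets a tolerance. The synthetic kernel below stands in for a block of the extracted substrate coupling matrix.

    import numpy as np

    def low_rank(block, tol=1e-3):
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        k = int(np.searchsorted(-s, -s[0] * tol))  # keep sigma_i > tol*sigma_0
        return U[:, :k] * s[:k], Vt[:k, :]         # store two thin factors

    x, y = np.arange(50.0), np.arange(50.0) + 200.0
    block = 1.0 / np.abs(x[:, None] - y[None, :])  # smooth far-field kernel
    A, B = low_rank(block)
    print(A.shape, B.shape, np.max(np.abs(block - A @ B)))

Storing the two thin factors instead of the dense block is what turns the quadratic memory growth into something close to linear.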
-
7D.2 Fast 3-D Inductance Extraction in Lossy Multi-Layer Substrate [p. 424]
-
Minqing Liu, Tiejun Yu, Wayne W.-M. Dai
A mixed potential integral equation (MPIE) technique combined with
fast multi-layer Green's functions and Gaussian Jacobi high order techniques
is used to compute the 3-D frequency-dependent inductances and
resistances in a lossy multi-layered substrate. Compared to FastHenry,
a multipole-accelerated 3-D inductance extraction program, the algorithm
presented here is more accurate and faster for lossy multilayer
structures for two reasons. First, the loss and coupling effects of the
substrate and optional ground planes are efficiently modeled by multilayer
Green's functions, which are in turn calculated efficiently
via a window-based acceleration technique. Second, Gaussian Jacobi-based
high order techniques are used to capture the singularity of the
current distribution at metal edges, leading to significant reduction of
problem size and speed-up of computation.
-
7D.3 Simulation Approaches for Strongly Coupled Interconnect Systems [p. 430]
-
Joel R. Phillips, L. Miguel Silveira
Shrinking feature sizes and increasing speeds of operation make
interconnect-related effects very relevant for current circuit verification
methodologies. Reliable and accurate system verification
requires the full analysis of circuits together with the environment
that surrounds them, including the common substrate, the packaging
structures, and perhaps even board information. In this paper
we discuss circuit-level simulation algorithms that enable the analysis
of the impact of strongly coupled interconnect structures on
nonlinear circuit operation, so as to allow reliable and accurate
system verification.
Moderators: Olivier Coudert, Monterey Design Systems, Inc., Sunnyvale, CA
Tiziano Villa, Parades Labs., Rome, Italy
-
8A.1 BOOM - A Heuristic Boolean Minimizer [p. 439]
-
Jan Hlavicka, Petr Fiser
We present a two-level Boolean minimization tool (BOOM)
based on a new implicant generation paradigm. In contrast to all
previous minimization methods, where the implicants are
generated bottom-up, the proposed method uses a top-down
approach. Thus instead of increasing the dimensionality of
implicants by omitting literals from their terms, the dimension of
a term is gradually decreased by adding new literals. Unlike
most other minimization tools like ESPRESSO, BOOM doesn't
use the definition of the function to be minimized as a basis for
the solution, thus the original coverage influences the solution
only indirectly through the number of literals used.
Most minimization methods use two basic phases introduced
by Quine-McCluskey, known as prime implicant (PI) generation
and the covering problem solution. Some more modern methods,
like ESPRESSO, combine these two phases, reducing the number
of PIs to be processed. This approach is also used in BOOM; the
search for new literals to be included in a term aims at
maximum coverage of the output function.
The function to be minimized is defined by its on-set and off-set,
listed in a truth table. Thus the don't care set, often
representing the dominant part of the truth table, need not be
specified explicitly. The proposed minimization method is
efficient above all for functions with a large number of input
variables of which only a few care terms are defined.
The minimization procedure is very fast, hence if the first
solution does not meet the requirements, it can be improved in an
iterative manner. The method has been tested on several different
kinds of problems, like the MCNC standard benchmarks or
larger problems generated randomly.
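A toy sketch of the top-down paradigm (a simplification of BOOM's implicant generation, with illustrative on/off-sets): start from the universal cube and repeatedly add the literal that retains the most on-set coverage, until no off-set minterm is covered.

    def covers(term, minterm):
        return all(minterm[v] == val for v, val in term.items())

    def top_down_implicant(on_set, off_set, variables):
        term = {}
        while any(covers(term, m) for m in off_set):
            v, b = max(((v, b) for v in variables if v not in term
                        for b in (0, 1)),
                       key=lambda lit: sum(covers({**term, lit[0]: lit[1]}, m)
                                           for m in on_set))
            term[v] = b                 # add the most on-set-friendly literal
        return term

    on_set = [{'a': 1, 'b': 1, 'c': 0}, {'a': 1, 'b': 1, 'c': 1}]
    off_set = [{'a': 0, 'b': 0, 'c': 0}, {'a': 0, 'b': 1, 'c': 1}]
    print(top_down_implicant(on_set, off_set, 'abc'))   # {'a': 1}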
-
8A.2 Faster SAT and Smaller BDDs via Common Function Structure [p. 443]
-
Fadi A. Aloul, Igor L. Markov, Karem A. Sakallah
The increasing popularity of SAT and BDD techniques in verification
and synthesis encourages the search for additional
speed-ups. Since typical SAT and BDD algorithms are exponential
in the worst-case, the structure of real-world instances is a
natural source of improvements. While SAT and BDD techniques
are often presented as mutually exclusive alternatives, our work
points out that both can be improved via the use of the same
structural properties of instances. Our proposed methods are
based on efficient problem partitioning and can be easily applied
as pre-processing with arbitrary SAT solvers and BDD packages
without source code modifications.
Finding a better variable-ordering is a well recognized problem
for both SAT solvers and BDD packages. Currently, all leading
edge variable-ordering algorithms are dynamic, in the sense
that they are invoked many times in the course of the "host"
algorithm that solves SAT or manipulates BDDs. Examples
include the DLCS ordering for SAT solvers and variable-sifting
during BDD manipulations. In this work we propose a universal
variable-ordering MINCE (MIN Cut Etc.) that preprocesses a
given Boolean formula in CNF. MINCE is completely independent
of target algorithms and outperforms both DLCS for SAT
and variable sifting for BDDs. We argue that MINCE tends to
capture structural properties of Boolean functions arising from
real-world applications. Our contribution is validated on the
ISCAS circuits and the DIMACS benchmarks. Empirically, our
technique often outperforms existing techniques by a factor of
two or more. Our results motivate search for stronger dynamic
ordering heuristics and combined static/dynamic techniques.
-
8A.3 Recursive Bipartitioning of BDDs for Performance Driven Synthesis of Pass
Transistor Logic Circuits
[p. 449]
-
Rupesh S. Shelar, Sachin S. Sapatnekar
In this paper, we address the problem of performance oriented synthesis
of pass transistor logic (PTL) circuits using a binary decision diagram (BDD)
decomposition technique. We transform the BDD decomposition problem into
a recursive bipartitioning problem and solve the latter using a max-flow
min-cut technique. We use the area and delay cost of the PTL implementation
of the logic function to guide the bipartitioning scheme. Using recursive
bipartitioning and a one-hot multiplexer circuit, we show that our PTL
implementation has logarithmic delay in the number of inputs, under certain
assumptions. The experimental results on benchmark circuits are promising,
since they show significant delay reductions with little or no area
overhead as compared to previous approaches.
-
8A.4 A Probabilistic Constructive Approach to Optimization Problems [p. 453]
-
Jennifer L. Wong, Farinaz Koushanfar, Seapahn Meguerdichian,
Miodrag Potkonjak
We propose a new optimization paradigm for solving intractable
combinatorial problems. The technique, named Probabilistic Constructive
(PC), combines the advantages of both constructive and probabilistic
algorithms. The constructive aspect provides relatively short runtime and
makes the technique amenable for the inclusion of insights through heuristic
rules. The probabilistic nature facilitates a flexible trade-off between
runtime and the quality of solution.
In addition to presenting the generic technique, we apply it to the
Maximal Independent Set problem. Extensive experimentation indicates
that the new approach provides very attractive trade-offs between the quality
of the solution and runtime, often outperforming the best previously
published approaches.
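A compact sketch of what a probabilistic constructive heuristic can look like for Maximal Independent Set (an illustration of the paradigm, not the paper's exact rules): each restart builds a solution greedily but samples candidates with probability biased toward low degree, and the best construction is kept.

    import random

    def pc_mis(adj, restarts=200):
        best = set()
        for _ in range(restarts):
            candidates, chosen = set(adj), set()
            while candidates:
                cands = list(candidates)
                weights = [1.0 / (len(adj[v]) + 1) for v in cands]  # bias
                v = random.choices(cands, weights)[0]
                chosen.add(v)
                candidates -= adj[v] | {v}       # drop v and its neighbors
            if len(chosen) > len(best):
                best = chosen
        return best

    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
    print(pc_mis(adj))             # a maximum independent set here has size 2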
Moderators: Pai Chou, University of California, Irvine, CA
Luciano Lavagno, Cadence Berkeley Labs., Berkeley, CA
-
8B.1 Energy Efficient Real-Time Scheduling [p. 458]
-
Amit Sinha, Anantha P. Chandrakasan
Real-time scheduling on processors that support
dynamic voltage and frequency scaling is analyzed. The Slacked
Earliest Deadline First (SEDF) algorithm is proposed, and it is shown
that the algorithm is optimal in minimizing processor energy consumption
and maximum lateness. An upper bound on the processor
energy savings is also derived. Real-time scheduling of periodic tasks
is also analyzed, and an optimal voltage and frequency allocation for a
given task set is determined that guarantees schedulability and minimizes
energy consumption.
-
8B.2 Efficient Performance Estimation for General Real-Time Task Systems [p. 464]
-
Hongchao (Stephanie) Liu, Xiaobo (Sharon) Hu
The paper presents a novel approach to compute tight upper bounds on
the processor utilization independent of the implementation for general
real-time systems where tasks are composed of subtasks and precedence
constraints may exist among subtasks of the same task. We formulate
the problem as a set of linear programming (LP) problems. Observations
are made to reduce the number of LP problem instances that must be
solved, which greatly improves the computation time of the utilization
bounds. Furthermore, additional constraints may be included
under certain circumstances to improve the quality of the bounds.
-
8B.3 Stars in VCC: Complementing Simulation with Worst-Case Analysis [p. 471]
-
Felice Balarin
STARS is a methodology for worst-case analysis of embedded systems.
STARS manipulates abstract representations of system components
to obtain upper bounds on the number of various events
in the system, as well as a bound on the response time. VCC is
a commercial discrete event simulator that can be used for both
functional and performance verification. We describe an extension
of VCC to facilitate STARS. The extension allows the user to specify
abstract representations of VCC modules. These abstractions
are used by STARS, but their validity can also be checked by VCC
simulation. We also propose a mostly automatic procedure to generate
these abstractions. Finally, we illustrate on an example how
STARS can be combined with simulation to find bugs that would
be hard to find by simulation alone.
Moderators: Anirudh Devgan, IBM Corp., Austin, TX
Carlo Guardiani, PDF Solutions, San Jose, CA
-
8C.1 Multigrid-Like Technique for Power Grid Analysis [p. 480]
-
Joseph N. Kozhaya, Sani R. Nassif, Farid N. Najm
Modern sub-micron VLSI designs include huge power grids
that are required to distribute large amounts of current, at increasingly
lower voltages. The resulting voltage drop on the
grid reduces noise margin and increases gate delay, resulting
in a serious performance impact. Checking the integrity of the
supply voltage using traditional circuit simulation is not practical,
for reasons of time and memory complexity. We propose a
novel multigrid-like technique for the analysis of power grids.
The grid is reduced to a coarser structure, and the solution is
mapped back to the original grid. Experimental results show
that the proposed method is very efficient as well as suitable for
both DC and transient analysis of power grids.
-
8C.2 An Analytical High-Level Battery Model for Use in Energy Management
of Portable Electronic Systems
[p. 488]
-
Daler N. Rakhmatov, Sarma B.K. Vrudhula
Once the battery becomes fully discharged, a battery-powered
portable electronic system goes off-line. Therefore, it is important
to take the battery behavior into account. A system designer
needs an adequate high-level model in order to make battery-aware
decisions that target maximization of the system's lifetime on-line.
We propose such a model: it allows a designer to predict the battery
time-to-failure for a given load and provides a cost metric for life-time
optimization algorithms. Our model also allows for a tradeoff
between the accuracy and the amount of computation performed.
The quality of the proposed model is evaluated using a detailed
low-level simulation of a lithium-ion electrochemical cell.
-
8C.3 Power-Delay Modeling of Dynamic CMOS Gates for Circuit Optimization [p. 494]
-
J.L. Rossello, Jaume Segura
We present an accurate analytical expression to compute power and delay
of domino CMOS circuits from a detailed description of internal capacitor
switching and discharging currents. The expression obtained accounts for
the main effects in complex sub-micron gates like velocity saturation effects,
body effect, device sizes and coupling capacitors. The energy-delay product
is also evaluated and analyzed. Results are compared to HSPICE simulations
(level 50) for a 0.18 um CMOS technology.
Moderators: Tim Burks, Magma Design Automation, Inc., Cupertino, CA
Florentin Dartu, Intel Corp., Hillsboro, OR
-
8D.1 A Symbolic Simulation-Based Methodology for Generating Black-Box
Timing Models of Custom Macrocells
[p. 501]
-
Clayton McDonald, Randal E. Bryant
We present a methodology for generating black-box timing models
for full-custom transistor-level CMOS circuits. Our approach
utilizes transistor-level ternary symbolic timing simulation to explore
the input arrival time space and determine the input arrival
time windows that result in proper operation. This approach integrates
symbolic timing simulation into existing static timing analysis
flows and allows automated modelling of the timing behavior
of aggressive full-custom circuit design styles.
-
8D.2 On the Signal Bounding Problem in Timing Analysis [p. 507]
-
Jin-Fuw Lee, D.L. Ostapko, Jeffery Soreff, C.K. Wong
In this paper, we study the propagation of slew dependent
bounding signals and the corresponding slew problem in static
timing analysis. The selection of slew from the latest arriving
signal, a commonly used strategy, may violate the rule of
monotonic delay. Several methods for generating bounding
signals to overcome this difficulty are described. The accuracy
and monotonicity of each method are analyzed. These methods
can be easily implemented in a static timer to improve the
accuracy.
-
8D.3 False-Noise Analysis using Logic Implications [p. 515]
-
Alexey Glebov, Sergey Gavrilov, David Blaauw, Supamas Sirichotiyakul,
Chanhee Oh, Vladimir Zolotov
Cross-coupled noise analysis has become a critical concern in
today's VLSI designs. Typically, noise analysis makes an assumption
that all aggressing nets can simultaneously switch in the same
direction. This creates a worst-case noise pulse on the victim net
that often leads to false noise violations. In this paper, we present a
new approach that uses logic implications to identify the maximum
set of aggressor nets that can inject noise simultaneously under the
logic constraints of the circuit. We propose an approach to efficiently
generate logic implications from a transistor-level description
and propagate them in the circuit using ROBDD
representations and a newly proposed lateral propagation method.
We then show that the problem of finding the worst case logically
feasible noise can be represented as a maximum weighted independent
set problem and show how to efficiently solve it. Initially,
we restrict our discussion to zero-delay implications, which are
valid for glitch-free circuits and then extend our approach to timed
implications. The proposed approaches were implemented in an
industrial noise analysis tool and results are shown for a number of
industrial test cases. We demonstrate that a significant reduction in
the number of noise failures can be obtained from considering the
logic implications as proposed in this paper, underscoring the need
for false-noise analysis.
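A brute-force sketch of the final selection step (the tool solves this efficiently; exhaustive search is only viable for a handful of aggressors, and the data below is illustrative): among aggressors whose simultaneous switching is ruled out pairwise by implications, pick the feasible set injecting maximum total noise.

    from itertools import combinations

    def worst_feasible_noise(noise, infeasible_pairs):
        best, best_set = 0.0, set()
        aggs = list(noise)
        for r in range(len(aggs), 0, -1):
            for subset in combinations(aggs, r):
                if any(frozenset(p) in infeasible_pairs
                       for p in combinations(subset, 2)):
                    continue                 # violates a logic implication
                total = sum(noise[a] for a in subset)
                if total > best:
                    best, best_set = total, set(subset)
        return best, best_set

    noise = {'a': 0.30, 'b': 0.25, 'c': 0.20}
    infeasible_pairs = {frozenset(('a', 'b'))}   # a, b can't switch together
    print(worst_feasible_noise(noise, infeasible_pairs))  # (0.5, {'a', 'c'})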
Moderators: Bapi Vinnakota, University of Minnesota, Minneapolis, MN
Seiji Kajihara, Kyushu Institute of Technology, Iizuka, Japan
-
9A.1 The Design and Optimization of SOC Test Solutions [p. 523]
-
Erik Larsson, Zebo Peng, Gunnar Carlsson
We propose an integrated technique for extensive
optimization of the final test solution for System-on-Chip
using Simulated Annealing. The technique produces a
minimized test schedule that respects test
conflicts under test power constraints, together with an optimized
design of the test access mechanism. We have implemented
the proposed algorithm and performed experiments with
several benchmarks and industrial designs to show the
usefulness and efficiency of our technique.
-
9A.2 Accurate CMOS Bridge Fault Modeling with Neural Network-Based
VHDL Saboteurs [p. 531]
-
Don Shaw, Dhamin Al-Khalili, Côme Rozon
This paper presents a new bridge fault model that is based on
a multiple layer feedforward neural network and implemented
within the framework of a VHDL saboteur cell. Empirical evidence
and experimental results show that it satisfies a prescribed
set of bridge fault model criteria better than existing approaches.
The new model computes exact bridged node voltages and propagation
delay times with due attention to surrounding circuit elements.
This is significant since, with the exception of full analog
simulation, no other technique attempts to model the delay effects
of bridge defects. Yet, compared to these analog simulations, the
new approach is orders of magnitude faster and achieves reasonable
accuracy; computing bridged node voltages with an average
error near 0.006 volts and propagation delay times with an average
error near 14 ps.
Keywords:
Bridge Defects, Fault Models, Neural Networks, VHDL, CMOS
ICs, Fault Simulation
-
9A.3 Algorithm Level Re-Computing - A Register Transfer Level
Concurrent Error Detection Technique [p. 537]
-
Kaijie Wu, Ramesh Karri
In this paper we propose two algorithm-level time redundancy based
Concurrent Error Detection (CED) schemes that exploit diversity in
a Register Transfer (RT) level implementation. RT level diversity
can be achieved either by changing the operation-to-operator allocation
(allocation diversity) or by shifting the operands before re-computation
(data diversity). By enabling a fault to affect the normal result and the
re-computed result in two different ways, RT level diversity yields good
CED capability with low area overhead. We used the Synopsys Behavioral Compiler
(BC) to implement the technique.
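A tiny sketch of the data-diversity idea for an adder, in the spirit of classic recomputing-with-shifted-operands (the fault model and shift amount are illustrative assumptions): recompute with operands shifted left, unshift, and compare; a fault strikes the two computations differently and is exposed.

    def add_with_ced(a, b, add):
        normal = add(a, b)
        recomputed = add(a << 1, b << 1) >> 1     # shifted recomputation
        if normal != recomputed:
            raise RuntimeError('concurrent error detected')
        return normal

    print(add_with_ced(1, 2, lambda x, y: x + y))  # fault-free: returns 3
    stuck = lambda x, y: (x + y) & ~0b100          # stuck-at-0 on result bit 2
    add_with_ced(3, 1, stuck)                      # 3+1 hits bit 2: raises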
Moderators: Sridevan Parameswaran, The University of New
South Wales, Kensington, Australia
Nikil Dutt, University of California, Irvine, CA
-
9B.1 Transient Power Management Through High Level Synthesis [p. 545]
-
Vijay Raghunathan, Srivaths Ravi, Anand Raghunathan, Ganesh Lakshminarayana
The use of nanometer technologies is making it increasingly important
to consider transient characteristics of a circuit's power dissipation (e.g.,
peak power, and power gradient or differential) in addition to its average
power consumption. Current transient power analysis and reduction approaches
are mostly at the transistor- and logic-levels. We argue that, as
was the case with average power minimization, architectural solutions
to transient power problems can complement and significantly extend
the scope of lower-level techniques.
In this work, we present a high-level synthesis approach to transient
power management. We demonstrate how high-level synthesis can impact
the cycle-by-cycle peak power and peak power differential for the
synthesized implementation. Further, we demonstrate that it is necessary
to consider transient power metrics judiciously in order to minimize
or avoid area and performance overheads. In order to alleviate the
limits on parallelism imposed by peak power constraints, we propose a
novel technique based on the selective insertion of data monitor operations
in the behavioral description. We present enhanced scheduling
algorithms that can accept constraints on transient power characteristics
(in addition to the conventional resource and performance constraints).
Experimental results on several example designs obtained using a
state-of-the-art commercial design flow and technology library indicate that
high-level synthesis with transient power management results in significant
benefits: peak power reductions of up to 32% (average of 25%),
and peak power differential reductions of up to 58% (average of 42%),
with minimal performance overheads.
-
9B.2 An Integrated Data Path Optimization for Low Power Based
on Network Flow Method
[p. 553]
-
Chun-Gi Lyuh, Taewhan Kim, C.L. Liu
We propose an effective algorithm for power optimization
in behavioral synthesis. In previous work, it has been shown
that several hardware allocation/binding problems for power optimization
can be formulated as network flow problems and solved
optimally. However, in these formulations, a fixed schedule was
assumed. In this context, one key problem is: given an optimal
network flow solution to a hardware allocation/binding problem for
a schedule, how to generate a new optimal network flow solution
rapidly for a local change of the schedule. To this end, from a comprehensive
analysis of the relation between network structure and
flow computation, we devise a two-step procedure: (Step 1) max-flow
computation step which finds a valid (maximum) flow solution
while retaining the previous (maximum flow of minimum cost)
solution as much as possible; (Step 2) min-cost computation step
which incrementally refines the flow solution obtained in Step 1,
using the concept of finding a negative cost cycle in the residual
graph for the flow. The proposed algorithm can be applied effectively
to several important high-level data path optimization problems
(e.g., allocations/bindings of functional units, registers, buses,
and memory ports) when we have the freedom to choose a schedule
that will minimize power consumption. Experimental results
(for bus synthesis) on benchmark problems show that our designs
are 5.2% more power-efficient than the best known results, which
is due to (a) exploitation of the effect of scheduling and (b) optimal
binding for every schedule instance. Furthermore, our algorithm is
about 2.8 times faster in run time than the full network-flow-based
(optimal) bus synthesis algorithm, which is due to (c) our novel
(two-step) mechanism which utilizes the previous flow solution to reduce
redundant flow computations.
-
9B.3 What is the Limit of Energy Saving by Dynamic Voltage Scaling? [p. 560]
-
Gang Qu
Dynamic voltage scaling (DVS) is a technique that varies the supply
voltage and clock frequency based on the computation load to
provide desired performance with the minimal amount of energy
consumption. It has been demonstrated as one of the most effective
low power system design techniques, in particular for real-time
systems. Previous work addresses both ends of the DVS spectrum:
the ideal variable voltage system, which can change its voltage
with no physical constraints, and the multiple voltage system,
which has a number of discrete voltages available simultaneously.
In this paper, we study the DVS systems between these two extreme
cases. We consider systems that can vary the operating voltage dynamically
under various real-life physical constraints. Based on
the system's different behavior during voltage transition, we define
the feasible DVS system and the practical DVS system. We build a
mathematical model to analyze the potential of DVS for energy saving
for these different systems. Finally, we simulate the behavior
of a secure wireless communication network with DVS systems.
The results show that DVS achieves energy reductions from 36% to
79%, and that real-life DVS systems can come very close to the ideal
system in energy saving.
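For context, the standard first-order relations that make DVS pay off (textbook expressions, not restated in the abstract): dynamic energy per cycle scales as

    $E_{\mathrm{cycle}} \propto C_{\mathrm{eff}} V_{dd}^{2}$, while $f_{\max} \propto (V_{dd} - V_{t})^{2} / V_{dd}$,

so stretching execution to just meet the deadline and lowering $V_{dd}$ accordingly reduces energy superlinearly; with $V_t \ll V_{dd}$, halving both speed and voltage cuts the energy of a fixed workload roughly fourfold. The paper's models quantify how much of this ideal gain survives real transition-time and voltage-level constraints.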
Moderators: Jason Cong, University of California, Los Angeles, CA
Patrick Groeneveld, Magma Design Automation, Inc., Cupertino, CA
-
9C.1 Local Search for Final Placement in VLSI Design [p. 565]
-
Oluf Faroe, David Pisinger, Martin Zachariasen
A new heuristic is presented for the general cell placement
problem where the objective is to minimize total bounding box
netlength. The heuristic is based on the Guided Local Search
(GLS) metaheuristic. GLS modifies the objective function in a
constructive way to escape local minima. Previous attempts to
use local search on final (or detailed) placement problems have
often failed, as the neighborhood quickly becomes excessively large
for large circuits. Nevertheless, by combining GLS with
Fast Local Search it is possible to focus the search on appropriate
sub-neighborhoods, thus reducing the time complexity
considerably.
Comprehensive computational experiments with the developed
algorithm are reported on small, medium and large industrial
circuits, and for standard cell and general cell variants
of the problem. The experiments demonstrate that the developed
algorithm is able to improve the estimated routing length
of large-sized general cell layouts by as much as 20 percent.
The general nature of the proposed method makes it easy
to incorporate other goals, such as routability and timing constraints,
into the objective function. Current layout algorithms
use a feedback approach in which a placement is evaluated
by performing (partial) routing and timing analysis; the output
of this analysis is then used to construct an improved placement.
This iterative nature of the design process calls for
placement algorithms that take an existing placement and construct
an improved placement that resembles the original one,
but in which the information from the routing/timing analysis
is taken into account.
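A skeletal sketch of the GLS mechanism (generic metaheuristic machinery, not the placer's internals; feature extraction from a placement is elided): when local search stalls, penalize the currently used features of maximum utility, reshaping the objective so the search can escape.

    def augmented_cost(base_cost, used_features, penalties, lam):
        """GLS objective g = f + lam * sum of penalties over used features."""
        return base_cost + lam * sum(penalties[f] for f in used_features)

    def escape_local_minimum(feature_costs, penalties):
        """Penalize the feature maximizing utility c_i / (1 + p_i)."""
        worst = max(feature_costs,
                    key=lambda f: feature_costs[f] / (1 + penalties[f]))
        penalties[worst] += 1
        return worst

    # Toy usage: features could be the long nets of the current placement.
    penalties = {'net1': 0, 'net2': 0}
    print(escape_local_minimum({'net1': 120.0, 'net2': 80.0}, penalties))
    print(augmented_cost(200.0, ['net1', 'net2'], penalties, lam=5.0))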
-
9C.2 Congestion Reduction During Placement Based on Integer Programming [p. 573]
-
Xiaojian Yang, Ryan Kastner, Majid Sarrafzadeh
This paper presents a novel method to reduce routing congestion
during the placement stage. The proposed approach is used as a post-processing
step in placement. Congestion reduction is based on local
improvement on the existing layout. However, the approach has
a global view of the congestion over the entire design. It uses integer
linear programming (ILP) to formulate the conflicts between
multiple congested regions, and performs local improvement according
to the solution of ILP. Experiments show that the proposed
approach can effectively reduce the total overflow of global routing
result. The short running time of the algorithm indicates good
scalability on large designs.
-
9C.3 Direct Transistor-Level Layout for Digital Blocks [p. 577]
-
Prakash Gopalakrishnan, Rob A. Rutenbar
We present a complete transistor-level layout
flow, from logic netlist to final shapes, for blocks of combinational
logic up to a few thousand transistors in size. The direct
transistor-level attack easily accommodates the demands for
careful custom sizing necessary in high-speed design, and is also
significantly denser than a comparable cell-based layout. The
key algorithmic innovations are (a) early identification of essential
diffusion-merged MOS device groups called clusters, and (b)
deferred binding of clusters to a specific shape-level layout until
the very end of a multi-phase placement strategy. A global
placer arranges uncommitted clusters; a detailed placer optimizes
clusters at shape level for density and for overall routability.
A commercial router completes the flow. Experiments
comparing to a commercial standard cell-level layout flow show
that, when flattened to transistors, our tool consistently achieves
100% routed layouts that average 23% less area.
Moderators: Dennis M. Sylvester, University of Michigan, Ann Arbor, MI
Mustafa Celik, Monterey Design Systems, Inc., Sunnyvale, CA
-
9D.1 Model Reduction of Variable-Geometry Interconnects using
Variational Spectrally-Weighted Balanced Truncation [p. 586]
-
Payam Heydari, Massoud Pedram
This paper presents a spectrally-weighted balanced truncation
technique for RLC interconnects, a technique needed when the interconnect
circuit parameters change as a result of variations in the manufacturing
process. The salient features of this algorithm are the inclusion of
parameter variations in the RLC interconnect, the guaranteed stability of
the reduced transfer function, and the availability of provable
frequency-weighted
error bounds for the reduced-order system. This paper shows
that the balanced truncation technique is an effective model-order reduction
technique when variations in the circuit parameters are taken into
consideration. Experimental results show that the new variational spectrally-weighted
balanced truncation attains, on average, 20% more accuracy
than the variational Krylov-subspace-based model-order reduction
techniques while the run-time is also, on average, 5% faster.
-
9D.2 Improving the Robustness of a Surface Integral Formulation for
Wideband Impedance Extraction of 3D Structures
[p. 592]
-
Zhenhai Zhu, Jingfang Huang, Ben Song, Jacob White
In order for parasitic extraction of high-speed integrated circuit interconnect
to be sufficiently efficient, and fit with model-order reduction techniques,
a robust wideband surface integral formulation is essential. One
recently developed surface integral formulation has shown promise, but
was plagued with numerical difficulties of poorly understood origin. In
this paper we show that one of that formulation's difficulties was related
to inaccuracy in the approach used to evaluate integrals over discretization
panels, and we present an accurate approach based on an adapted
piecewise quadrature scheme. We also show that the condition number
of the original system of integral equations can be reduced by differentiating
one of the integral equations. Computational results on a ring and
a spiral inductor are used to show that the new quadrature scheme and
the differentiated integral formulation improve accuracy and accelerate
the convergence of iterative solution methods.
-
9D.3 Practical Considerations in RLCK Crosstalk Analysis for Digital
Integrated Circuits
[p. 598]
-
Steven C. Chan, K.L. Shepard
Inductance and inductive crosstalk have become an important
new concern for on-chip wires in deep-submicron integrated circuits.
Recent advances in extractors to include inductance make
possible the extraction of coupled RLCK interconnect networks
from large, complex on-chip layouts. In this paper, we describe
the techniques we use in a commercial static noise analysis tool
to analyze crosstalk noise due to fully-coupled RLCK networks
extracted from layout. Notable are the approaches we use to filter
and lump aggressor couplings, as well as the techniques used
to handle degeneracies in the modified nodal analysis (MNA) formulation.
Furthermore, the nonmonotonicity of interconnect responses
in the presence of inductance requires additional "sensitizations"
in searching the possible switching events inducing the
worst-case noise. Comparisons with silicon indicate the need to
include the substrate in the extracted models in certain cases.
Moderators: Yuji Kukimoto, Silicon Perspective Corp., Santa Clara, CA
Prabhakar Kudva, IBM Corp. TJ Watson Research
Center, Yorktown Heights, NY
-
10A.1 Single-Pass Redundancy Addition and Removal [p. 606]
-
Chih-Wei (Jim) Chang, Malgorzata Marek-Sadowska
Redundancy-addition-and-removal is a rewiring technique which for a
given target wire w_t finds a redundant alternative wire
w_a. Addition of w_a makes w_t redundant
and hence removable without changing the overall
circuit functionality. Incremental logic restructuring based on this
technique has been used in many applications. However, the search for
valid alternative wires requires trial-and-error redundancy testing of a
potentially large set of candidate wires. In this paper, we study the
fundamental
theory behind this technique and propose a new reasoning
scheme which directly identifies alternative wires without performing
trial-and-error tests. Experimental results show up to 15 times speedup
in comparison to the best techniques in the literature.
-
10A.2 Efficient Canonical Form for Boolean Matching of Complex Functions in
Large Libraries [p. 610]
-
Jovanka Ciric, Carl Sechen
A new algorithm is developed which transforms the truth
table or implicant table of a Boolean function into a canonical
form under any permutation of inputs. The algorithm is used for
Boolean matching for large libraries that contain cells with large
numbers of inputs and implicants. The minimum cost canonical
form is used as a unique identifier for searching for the cell in
the library. The search time is nearly constant if a hash table is
used for storing the cells' canonical representations in the
library. Experimental results on more than 100,000 gates
confirm the validity and feasible run-time of the algorithm.
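A toy sketch of permutation-canonical matching (brute force over permutations; the paper's contribution is doing this efficiently for large cells, and real matchers also handle input/output negation): the canonical form is the minimum truth table over all input permutations, usable directly as a hash key.

    from itertools import permutations

    def canonical(tt, n):
        """tt: truth table as an integer whose bit m is f(minterm m)."""
        best = None
        for perm in permutations(range(n)):
            t = 0
            for m in range(1 << n):
                pm = sum(((m >> i) & 1) << perm[i] for i in range(n))
                t |= ((tt >> pm) & 1) << m       # permuted truth table
            best = t if best is None else min(best, t)
        return best

    library = {canonical(0b0010, 2): 'ANDN2'}    # f(a,b) = a AND (NOT b)
    print(library[canonical(0b0100, 2)])         # same cell, inputs swapped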
-
10A.3 Compatible Observability Don't Cares Revisited [p. 618]
-
R.K. Brayton
CODCs stand for compatible observability don't cares. We first examine the
definition of compatibility and when a set of CODCs is compatible. We then
discuss Savoj's CODC computation for propagating CODCs from a node's output to
its fanins, and show by example, that the results can depend on the current
implementation of the node. Then we generalize the computation so that the
result is independent of the implementation at the node. The CODCs propagated
by this computation are proved to be maximal in some sense. Local don't cares
(LDCs) are CODCs of a node, pre-imaged to the primary inputs and then imaged
and projected to the local fanins of the node. LDCs combine CODCs with SDCs
(satisfiability don't cares), but only the CODC part is propagated to the fanin
network. Another form of local don't cares propagates both the CODC and SDC
but conservative. We give a method for updating both kinds of local don't
cares incrementally when other nodes in the network are changed.
Moderators: Preeti Panda, Synopsys, Inc., Mountain View, CA
Tony Givargis, University of California, Irvine, CA
-
10B.1 A Methodology for the Design of Application Specific Instruction
Set Processors (ASIP) using the Machine Description Language LISA
[p. 625]
-
Andreas Hoffmann, Oliver Schliebusch, Achim Nohl, Gunnar Braun,
Oliver Wahlen, Heinrich Meyr
The development of application specific instruction set processors
(ASIP) is currently the exclusive domain of the semiconductor
houses and core vendors. This is due to the fact
that building such an architecture is a difficult task that requires
expert knowledge in different domains: application
software development tools, processor hardware implementation,
and system integration and verification. This paper
presents a retargetable framework for ASIP design which is
based on machine descriptions in the LISA language. From
that, software development tools can be automatically generated,
including an HLL C-compiler, assembler, linker, simulator and debugger
frontend. Moreover, synthesizable HDL
code can be derived which can then be processed by standard
synthesis tools. Implementation results for a low-power
ASIP for DVB-T acquisition and tracking algorithms designed
with the presented methodology will be given.
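The sketch below does not reproduce LISA syntax; it only illustrates, in Python and with made-up instruction names, the retargetability idea at the core of such frameworks: a single machine description from which several tools (here a toy assembler and instruction-set simulator) are derived.

    # One entry per instruction: (opcode, behavior) -- the single source
    # from which both tools below are generated.
    ISA = {
        "LDI": (0x0, lambda R, d, s: R.__setitem__(d, s)),
        "ADD": (0x1, lambda R, d, s: R.__setitem__(d, (R[d] + R[s]) & 0xFF)),
    }

    def assemble(src):
        # Assembler derived from the description: mnemonic -> opcode word.
        words = []
        for line in src.strip().splitlines():
            op, d, s = line.split()
            words.append((ISA[op][0] << 8) | (int(d) << 4) | int(s))
        return words

    def simulate(code, nregs=16):
        # Instruction-set simulator derived from the same description.
        R = [0] * nregs
        decode = {enc: beh for enc, beh in ISA.values()}
        for w in code:
            decode[w >> 8](R, (w >> 4) & 0xF, w & 0xF)
        return R

    print(simulate(assemble("LDI 1 5\nLDI 2 3\nADD 1 2"))[1])   # 8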
-
10B.2 Area and Power Reduction of Embedded DSP Systems using
Instruction Compression and Re-Configurable Encoding
[p. 631]
-
Subash G. Chandar, Mahesh Mehendale, R. Govindarajan
In this paper, we propose a reconfiguration mechanism
that allows multiple instruction compression, reducing
both code size, which in turn reduces cost, and instruction-fetch
power, which extends battery lifetime: two key considerations
in embedded DSP systems. We enhance the Texas Instruments
DSP core TMS320C27x to incorporate this mechanism
and evaluate the improvements in code size and instruction-fetch
energy using real-life embedded control application programs.
We show that, even with minimal hardware overhead, we can reduce
code size by over 10% and instruction-fetch energy by over
40%.
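The abstract does not spell out the encoding, so the sketch below only illustrates the general idea behind such schemes, with parameters of our choosing: the most frequent instruction words are fetched as short codewords and expanded through a small (reconfigurable) decode table, shrinking both the stored code and the bits fetched per instruction, which is what drives the fetch-energy saving.

    from collections import Counter

    def compress(program, table_size=16, word_bits=16, code_bits=8):
        # Put the most frequent instruction words into the decode table;
        # everything else is fetched as an escape code plus the full word.
        table = [w for w, _ in Counter(program).most_common(table_size)]
        hits = set(table)
        orig = len(program) * word_bits
        comp = sum(code_bits if w in hits else code_bits + word_bits
                   for w in program)
        return table, orig, comp

    program = [0x1234, 0x1234, 0xBEEF, 0x1234, 0x0042, 0xBEEF] * 100
    _, orig, comp = compress(program)
    print(f"{orig} -> {comp} bits ({100 * (1 - comp / orig):.0f}% fewer)")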
-
10B.3 I-CoPES: Fast Instruction Code Placement for Embedded Systems
to Improve Performance and Energy Efficiency
[p. 635]
-
Sri Parameswaran, Jörg Henkel
The ratio of cache hits to cache misses in a computer system is,
to a large extent, responsible for its characteristics such as energy
consumption and performance. In recent years energy efficiency
has become one of the dominating design constraints,
due to the rapid growth in market share for mobile computing/
communication/internet devices.
In this paper we present a novel, fast constructive technique that relocates
the instruction code in main memory in such a manner
that the cache is utilized more efficiently. The technique is applied
as a pre-processing step, i.e., before the code is executed. It
is applicable to embedded systems where the number and characteristics
of the tasks running on the system are known a priori. The technique
does not impose any computational overhead on the system.
As a result of applying our technique to a variety of real-world
applications, we measured (through simulation) that the number of
cache misses drops significantly. This in turn reduces the energy
consumption of the whole system (CPU, caches, buses, main memory)
by up to 65%, at the cost of only a slight increase in memory size,
13% on average.
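As a rough sketch of what cache-conscious placement looks like (our simplification, assuming a direct-mapped instruction cache and a hotness ordering known a priori; the padding it introduces corresponds to the modest memory-size increase reported above):

    def place(funcs, cache_bytes=1024, line=32):
        # funcs: (name, size_bytes) pairs, hottest first. Greedily pick the
        # starting cache set that conflicts least with hotter code already
        # placed, padding addresses so each function maps to its chosen set.
        nsets = cache_bytes // line
        taken, layout, addr = set(), {}, 0
        for name, size in funcs:
            nlines = -(-size // line)            # ceiling division
            best = min(range(nsets), key=lambda s: len(
                {(s + i) % nsets for i in range(nlines)} & taken))
            while (addr // line) % nsets != best:
                addr += line                     # pad up to the chosen set
            layout[name] = addr
            taken |= {(best + i) % nsets for i in range(nlines)}
            addr += nlines * line
        return layout

    print(place([("isr", 96), ("filter", 256), ("decoder", 512)]))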
Moderator: Farid Najm, University of Toronto, Toronto, ON, Canada
-
10C.1 IC Power Distribution Challenges [p. 643]
-
Sudhakar Bobba, Tyler Thorp, Kathirgamar Aingaran, Dean Liu
With each technology generation, delivering a time-varying
current with reduced nominal supply voltage
variation is becoming more difficult due to increasing
current and power requirements. The power delivery
network design becomes much more complex and requires
accurate analysis and optimizations at all levels of
abstraction in order to meet the specifications. In this paper,
we describe techniques for estimation of the supply voltage
variations that can be used in the design of the power
delivery network. We also describe the decoupling capacitor
hierarchy that provides a low impedance to the increasing
high-frequency current demand and limits the supply
voltage variations. Techniques for high-level power
estimation that can be used for performance vs. power
trade-offs to reduce the current and power requirements of
the circuit are also presented.
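Two first-order relations behind such analyses, shown below with toy numbers of our choosing, are the classic target-impedance budget and the resistive-plus-inductive droop estimate:

    def target_impedance(vdd, ripple_fraction, i_max):
        # The grid plus decap hierarchy should stay below this impedance
        # over the frequency range of the current demand.
        return vdd * ripple_fraction / i_max

    def supply_droop(i, r_grid, l_loop, di_dt):
        # First-order supply variation: IR drop plus L*di/dt noise.
        return i * r_grid + l_loop * di_dt

    print(target_impedance(1.5, 0.05, 30))      # 0.0025 ohm budget
    print(supply_droop(10, 2e-3, 1e-12, 1e10))  # 0.02 V IR + 0.01 V Ldi/dt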
-
10C.2 Challenges in Power-Ground Integrity [p. 651]
-
Shen Lin, Norman Chang
With the advance of semiconductor manufacturing,
EDA, and VLSI design technologies, circuits with
increasingly higher speed are being integrated at an
increasingly higher density. This trend causes correspondingly
larger voltage fluctuations in the on-chip
power distribution network due to IR-drop, L di/dt noise,
or LC resonance. Therefore, Power-Ground integrity
becomes a serious challenge in designing future high-performance
circuits. In this paper, we introduce Power-Ground
integrity, addressing its importance, verification
methodology, and possible solutions.
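Of the three noise mechanisms named above, the resonance term has a simple closed form worth recalling: the package inductance and on-chip decoupling capacitance form a tank resonating at f = 1/(2*pi*sqrt(L*C)), and current demand concentrated near f excites the largest supply ripple. A quick check with toy values of our choosing:

    import math

    def lc_resonance_hz(l_henry, c_farad):
        # Resonant frequency of the package-L / on-chip-decap-C tank.
        return 1.0 / (2 * math.pi * math.sqrt(l_henry * c_farad))

    # e.g. 1 nH effective package loop against 100 nF of on-chip decap:
    print(lc_resonance_hz(1e-9, 100e-9) / 1e6)   # ~15.9 MHz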
Moderator: Rob A. Rutenbar, Carnegie Mellon University, Pittsburgh, PA
Panelists: Olivier Coudert, Patrick Groeneveld, Juergen Koehl,
Scott Peterson, Vivek Raghavan, Naresh Soni
-
11A.1 Automatic Hierarchical Design: Fantasy or Reality? [p. 656]
-
The debate regarding the design of integrated systems, flat versus
hierarchical, is almost as old as VLSI design itself. Through the years,
experts have predicted that future generations of design automation tools
would have to adopt a strictly hierarchical scheme for mastering algorithmic
and logistic complexity. However, the steady improvement of tools and
algorithms and the desire to push integrated circuit performance to the
ultimate extremes have pushed the flat design style into the multi-million
gate domain. Will this development continue, or will hierarchy finally win?
The panel will discuss the different views on this topic and explore possible
options for future design and tool flows. In particular, the panel will
address questions related to verification, system design, logic synthesis, and
physical design.