DAC'99 ABSTRACTS
Chair: Joel R. Phillips
Organizers: Hidetoshi Onodera, Joel Phillips
-
1.1 An Efficient Lyapunov Equation-Based Approach for Generating Reduced-Order
Models of Interconnect [p. 1]
- Jing-Rebecca Li, Frank Wang, Jacob K. White
In this paper we present a new algorithm for computing
reduced-order models of interconnect which utilizes the
dominant controllable subspace of the system. The dominant
controllable modes are computed via a new iterative
Lyapunov equation solver, Vector ADI. This new algorithm
is as inexpensive as Krylov subspace-based moment
matching methods, and often produces a better approximation
over a wide frequency range. A spiral inductor
and a transmission line example show this new
method can be much more accurate than moment matching
via Arnoldi.
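As a rough illustration of the idea of projecting onto a dominant controllable subspace, the sketch below computes the controllability Gramian with a dense Lyapunov solver and keeps its leading eigenvectors; it is not the paper's Vector ADI iteration, and the matrices A, B, C are random stand-ins rather than an extracted interconnect model.

```python
# Minimal sketch (not the paper's Vector ADI iteration): compute the
# controllability Gramian with a dense Lyapunov solver, keep its dominant
# eigenvectors, and project the state-space model onto that subspace.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, eigh

def reduce_by_controllable_subspace(A, B, C, q):
    """Project (A, B, C) onto the q dominant controllable directions."""
    # Controllability Gramian P solves  A P + P A^T + B B^T = 0.
    P = solve_continuous_lyapunov(A, -B @ B.T)
    w, V = eigh(P)                                # ascending eigenvalues
    Vq = V[:, np.argsort(w)[::-1][:q]]            # n x q orthonormal basis
    return Vq.T @ A @ Vq, Vq.T @ B, C @ Vq        # projected system matrices

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    A = -2.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))  # toy stable system
    B = rng.standard_normal((n, 1))
    C = rng.standard_normal((1, n))
    Ar, Br, Cr = reduce_by_controllable_subspace(A, B, C, q=8)
    print(Ar.shape, Br.shape, Cr.shape)           # (8, 8) (8, 1) (1, 8)
```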
-
1.2 Error Bounded Padé Approximation via Bilinear Conformal
Transformation [p. 7]
- Chung-Ping Chen, D.F. Wong
Since Asymptotic Waveform Evaluation (AWE) was
introduced in [5], many interconnect model order reduction
methods via Pade approximation have been proposed.
Although the stability and precision of model
reduction methods have been greatly improved, the
following important question has not been answered:
"What is the error bound in the time domain?". This
problem is mainly caused by the "gap" between the frequency
domain and the time domain, i.e., a well-approximated transfer
function in the frequency domain may not be a good
approximation in the time domain. All of the existing methods
approximate the transfer function directly in the frequency
domain and hence cannot provide error bounds in the time domain. In
this paper, we present new moment matching methods
which can provide guaranteed error bounds in the time
domain. Our methods are based on the classic work by
Teasdale in [1] which performs Pade approximation in
a transformed domain by the bilinear conformal transformation
s = (1-z)/(1+z).
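For reference, the transformation written with explicit parentheses, together with its inverse; the half-plane/unit-disk correspondence stated here is a standard property of this Möbius map (not a claim from the paper) and is what ties approximation on the unit circle to behavior over all frequencies:

```latex
s = \frac{1 - z}{1 + z}, \qquad z = \frac{1 - s}{1 + s}, \qquad
|z| \le 1 \iff \operatorname{Re}(s) \ge 0 .
```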
-
1.3 Model-Reduction of Nonlinear Circuits Using Krylov-Space Techniques [p. 13]
- Pavan K. Gunupudi, Michel S. Nakhla
A new algorithm based on Krylov subspace
methods is proposed for model-reduction of
large nonlinear circuits. Reduction is obtained
by projecting the original system described by
nonlinear differential equations into a subspace
of a lower dimension. The reduced model can
be simulated using conventional numerical integration
techniques. Significant reduction in
computational expense is achieved as the size of
the reduced equations is much less than that of
the original system.
Keywords:
Model-reduction, nonlinear circuits, Krylov-subspace
-
1.4 ENOR: Model Order Reduction of RLC Circuits Using Nodal Equations for
Efficient Factorization [p. 17]
- Bernard N. Sheehan
ENOR is an innovative way to produce provably-passive, reciprocal, and
compact representations of RLC circuits. Beginning with the nodal equations,
ENOR formulates recurrence relations for the moments that involve
factorizing a symmetric, positive definite matrix; this contrasts with other
RLC order reduction algorithms that require expensive LU factorization.
It handles floating capacitors, inductor loops, and resistor links in a
uniform way. It distinguishes between active and passive ports, does
Gram-Schmidt orthogonalization on the fly, and controls error in the
time domain. ENOR is a superbly simple, flexible, and well-conditioned
algorithm for lightning-fast reduction of mega-sized RLC trees, meshes, and
coupled interconnects, all with excellent accuracy.
Chair: Fabio Somenzi
Organizers: Sharad Malik, Andrew Kahng
-
2.1 Why is ATPG easy? [p. 22]
- Mukul R. Prasad, Philip Chong, Kurt Keutzer
Empirical observation shows that practically encountered instances
of ATPG are efficiently solvable. However, it has been known for
more than two decades that ATPG is an NP-complete problem. This
work is one of the first attempts to reconcile these seemingly disparate
results. We introduce the concept of circuit cut-width and
characterize the complexity of ATPG in terms of this property. We
provide theoretical and empirical results to argue that an interestingly
large class of practical circuits have cut-width characteristics
which ensure a provably efficient solution of ATPG on them.
-
2.2 Using Lower Bounds during Dynamic BDD Minimization [p. 29]
- Rolf Drechsler, Wolfgang Günther
Ordered Binary Decision Diagrams (BDDs) are a data structure
for representation and manipulation of Boolean functions often
applied in VLSI CAD. The choice of the variable ordering largely
influences the size of the BDD; its size may vary from linear to
exponential. The most successful methods for finding good orderings
are based on dynamic variable reordering, i.e. exchanging of neighboring
variables. This basic operation has been used in various variants,
like sifting and window permutation. In this paper we show that
lower bounds computed during the minimization process can speed up
the computation significantly. First, lower bounds are studied from
a theoretical point of view. Then these techniques are incorporated in
dynamic minimization algorithms. By the computation of good
lower bounds large parts of the search space can be pruned
resulting in very fast computations. Experimental results are
given to demonstrate the efficiency of our approach.
-
2.3 Optimization-Intensive Watermarking Techniques for Decision Problems [p. 33]
- Gang Qu, Jennifer L. Wong, Miodrag Potkonjak
Recently, a number of watermarking-based intellectual property
protection techniques have been proposed. Although they have
been applied to different stages in the design process and have a
great variety of technical and theoretical features, all of them share
two common properties: they all have been applied solely to optimization
problems and do not involve any optimization during the
watermarking process.
In this paper, we propose the first set of optimization-intensive
watermarking techniques for decision problems. In particular, we
demonstrate how one can select a subset of superimposed watermarking
constraints so that the uniqueness of the signature and the
likelihood of satisfying an instance of the satisfiability problem are
simultaneously maximized. We have developed three watermarking
SAT techniques: adding clauses, deleting literals, push-out
and pull-back. Each technique targets different types of signature-induced
constraint superimposition on an instance of the SAT problem.
In addition to comprehensive experimental validation, we theoretically
analyze the potentials and limitations of the proposed
watermarking techniques. Furthermore, we analyze the three proposed
optimization-intensive watermarking SAT techniques in terms
of their suitability for copy detection.
-
2.4 Efficient Algorithms for Optimum Cycle Mean and Optimum Cost to Time
Ratio Problems [p. 37]
- Ali Dasdan, Sandy S. Irani, Rajesh K. Gupta
The goal of this paper is to identify the most efficient algorithms
for the optimum mean cycle and optimum cost
to time ratio problems and compare them with the popular
ones in the CAD community. These problems have numerous
important applications in CAD, graph theory, discrete
event system theory, and manufacturing systems. In
particular, they are fundamental to the performance analysis
of digital systems such as synchronous, asynchronous,
data flow, and embedded real-time systems. For instance,
algorithms for these problems are used to compute the cycle
period of any cyclic digital system. Without loss of generality,
we discuss these algorithms in the context of the
minimum mean cycle problem (MCMP). We performed a
comprehensive experimental study of ten leading algorithms
for MCMP. We programmed these algorithms uniformly and
efficiently. We systematically compared them on a test suite
composed of random graphs as well as benchmark circuits.
Above all, our results provide important insight into the performance
of these algorithms in practice. One of the most
surprising results of this paper is that Howard's algorithm,
known primarily in the stochastic control community, is by
far the fastest algorithm on our test suite although the only
known bound on its running time is exponential. We provide
two stronger bounds on its running time.
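As a point of reference for the problem itself (not an implementation of Howard's algorithm), the sketch below gives Karp's classical O(nm) computation of the minimum cycle mean; the toy graph and the assumption that every vertex is reachable from vertex 0 are illustrative.

```python
# Karp's classical minimum mean cycle algorithm (a reference point for the
# problem; not Howard's policy-iteration method discussed above).
import math

def min_mean_cycle(n, edges):
    """edges: list of (u, v, w) over vertices 0..n-1, all reachable from 0.
    Returns the minimum cycle mean, or None if the graph is acyclic."""
    INF = math.inf
    # D[k][v] = minimum weight of a walk with exactly k edges from vertex 0 to v.
    D = [[INF] * n for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] + w < D[k][v]:
                D[k][v] = D[k - 1][u] + w
    best = None
    for v in range(n):
        if D[n][v] == INF:
            continue
        worst = max((D[n][v] - D[k][v]) / (n - k)
                    for k in range(n) if D[k][v] < INF)
        best = worst if best is None else min(best, worst)
    return best

if __name__ == "__main__":
    # Cycle 0->1->2->0 has mean 2; cycle 0->3->0 has mean 5.
    print(min_mean_cycle(4, [(0, 1, 1), (1, 2, 2), (2, 0, 3),
                             (0, 3, 5), (3, 0, 5)]))   # prints 2.0
```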
Chair: David Ku
Organizers: Rajesh Gupta, David Ku
-
3.1 IP-Based Design Methodology [p. 43]
- Daniel D. Gajski
-
3.2 IPCHINOOK: An Integrated IP-based Design Framework for Distributed Embedded
Systems [p. 44]
- Pai Chou, Ross Ortega, Ken Hines, Kurt Partridge, Gaetano Borriello
IPCHINOOK is a design tool for distributed embedded systems. It
gains leverage from the use of a carefully chosen set of design
abstractions that raise the level of designer interaction during the
specification, synthesis, and simulation of the design. IPCHINOOK
focuses on a component-based approach to system building that
enhances the ability to reuse existing software modules. This is
accomplished through a new model for constructing components
that enables composition of control-flow as well as data-flow. The
designer then maps the elements of the specification to a target architecture:
a set of processing elements and communication channels.
IPCHINOOK synthesizes all of the detailed communication
and synchronization instructions. Designers get feedback via a co-simulation
engine that permits rapid evaluation. By shortening the
design cycle, designers are able to more completely explore the design
space of possible architectures and/or improve time-to-market.
IPCHINOOK is embodied in a system development environment that
supports the design methodology by integrating a user interface for
system specification, simulation, and synthesis tools. By raising
the level of abstraction of specifications above the low-level target-specific
implementation, and by automating the generation of these
difficult and error-prone details, IPCHINOOK lets designers focus
on global architectural and functionality decisions.
-
3.3 Virtual Simulation of Distributed IP-based Designs [p. 50]
- Marcello Dalpasso, Alessandro Bogliolo, Luca Benini
One key issue in design flows based on reuse of third-party
intellectual property (IP) components is the need to estimate
the impact of component instantiation within complex
designs. In this paper we introduce JavaCAD, an internet-based
EDA tool built on a secure client-server architecture
that enables designers to perform simulation and cost estimation
of circuits containing IP components without actually purchasing them.
At the same time, the tool ensures intellectual property protection
for the vendors of IP components, and for the IP-users as well.
Moreover, JavaCAD supports negotiation of the amount of information
and the accuracy of cost estimates, thereby providing seamless transition
between IP evaluation and purchase.
Chair: Tadahiro Kuroda
Organizers: Anantha Chandrakasan, Mojy Chian
-
4.1 Common-Case Computation: A High-Level Technique for Power and Performance
Optimization [p. 56]
- Ganesh Lakshminarayana, Anand Raghunathan, Kamal S. Khouri,
Niraj K. Jha, Sujit Dey
This paper presents a design methodology, called common-case computation
(CCC), and new design automation algorithms for optimizing
power consumption or performance. The proposed techniques
are applicable in conjunction with any high-level design methodology
where a structural register-transfer level (RTL) description and
its corresponding scheduled behavioral (cycle-accurate functional
RTL) description are available. It is a well-known fact that in behavioral
descriptions of hardware (also in software), a small set of
computations (CCCs) often accounts for most of the computational
complexity. However, in hardware implementations (structural RTL
or lower level), CCCs and the remaining computations are typically
treated alike. This paper shows that identifying and exploiting
CCCs during the design process can lead to implementations that
are much more efficient in terms of power consumption or performance.
We propose a CCC-based high-level design methodology
with the following steps: extraction of common-case behaviors and
execution conditions from the scheduled description, simplification
of the common-case behaviors in a stand-alone manner, synthesis
of common-case detection and execution circuits from the common-case
behaviors, and composition of the original design with the common-case
circuits, resulting in a CCC-optimized design. We demonstrate
that CCC-optimized designs reduce power consumption by up to
91.5%, or improve performance by up to 76.6% compared to designs
derived without special regard for CCCs.
-
4.2 Layout Techniques Supporting the Use of Dual Supply Voltages
for Cell-Based Designs [p. 62]
- Chingwei Yeh, Yin-Shuin Kang, Shan-Jih Shieh, Jinn-Shyan Wang
Gate-level voltage scaling is an approach that allows different supply
voltages for different gates in order to achieve power reduction.
Previous researches focused on determining the voltage level for
each gate and ascertaining the power saving capability of the approach
via logic-level power estimation. In this paper, we present
layout techniques that make the approach feasible in a cell-based design
environment. A new block layout style is proposed to support
the voltage scaling with conventional standard cell libraries. The
block layout can be automatically generated via a simulated annealing
based placement algorithm. In addition, we propose a new
cell layout style with built-in multiple supply rails. Using the cell
layout, gate-level voltage scaling can be immediately embedded in
a typical cell-based design flow. Experimental results show that
the proposed techniques produce very promising results.
-
4.3 Gate-Level Design Exploiting Dual Supply Voltages for Power-Driven
Applications [p. 68]
- Chingwei Yeh, Min-Cheng Chang, Shih-Chieh Chang, Wen-Bone Jone
The advent of portable and high-density devices has made power
consumption a critical design concern. In this paper, we address
the problem of reducing power consumption via gate-level voltage
scaling for those designs that are not under the strictest timing budget.
We first use a maximum-weighted independent set formulation
for voltage reduction on non-critical part of the circuit. Then, we
use a minimum-weighted separator set formulation to do gate sizing
and integrate the sizing procedure with a voltage scaling procedure
to enhance power saving on the whole circuit. The proposed
methods are evaluated using the MCNC benchmark circuits, and
an average of 19.12% power reduction over the circuits having only
one supply voltage has been achieved.
-
4.4 Synthesis of Low Power CMOS VLSI Circuits Using Dual Supply Voltages [p. 72]
- Vijay Sundararajan, Keshab K. Parhi
Dynamic power consumed in CMOS gates goes down quadratically with
the supply voltage. By maintaining a high supply voltage for gates on the
critical path and by using a low supply voltage for gates off the critical path
it is possible to dramatically reduce power consumption in CMOS VLSI
circuits without performance degradation. Interfacing gates operating under
multiple supply voltages, however, requires the use of level converters,
which makes the problem modeling difficult. In this paper we develop a
formal model and develop an efficient heuristic for addressing the use of
two supply voltages for low power CMOS VLSI circuits without performance
degradation. Power consumption savings up to 25% over and above
the best known existing heuristics are demonstrated for combinational circuits
in the ISCAS85 benchmark suite.
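For reference, the quadratic dependence the abstract builds on, with a purely illustrative rail pair (the 2.4 V and 3.3 V values are not taken from the paper): gates moved to the lower rail spend roughly half the dynamic power.

```latex
P_{\mathrm{dyn}} = \alpha\, C_L\, V_{DD}^{2}\, f_{\mathrm{clk}},
\qquad
\frac{P_{\mathrm{dyn}}(V_L)}{P_{\mathrm{dyn}}(V_H)}
= \left(\frac{V_L}{V_H}\right)^{2}
= \left(\frac{2.4}{3.3}\right)^{2} \approx 0.53 .
```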
Moderator: Kurt Keutzer
Organizers: Raul Camposano, Kurt Keutzer
Panel Members: Jerry Fiddler, Raul Camposano, Alberto Sangiovanni-Vincentelli,
Jim Lansford [p. 76]
-
The merging of hardware and software on a single integrated circuit is
causing many to rethink their approach to embedded system design, and some
are forecasting significant changes in the dynamics of the associated
electronic-design automation (EDA) and embedded software industries as well.
For some, the ocean of silicon ahead is sure to host a loveboat for
fully integrated hardware/software systems. In this scenario a unifying
system-design-environment provides implementation-independent modeling that
stands above the particulars of hardware and software implementation issues.
At the implementation level, embedded software design tools and
electronic-design automation tools work seamlessly together providing a
variety of architectural targets for the functionality of high-level
descriptions.
Others see a shipwreck on the horizon, as hardheaded hardware designers
and softheaded software developers clash. In this scenario the system-on-chip
becomes a battleground in which the two respective communities attempt to
dominate both the solution space, and the system-design budgets of their
common customer.
There's an old adage that the most boring scenario is the most likely.
Following this adage some predict that the future holds more 'no show' than
'showdown'. The holders of this view note that as severe power constraints
cause hardware-designers to eschew software solutions, and as the world
of ubiquitous computing significantly broadens the market for embedded
system software, then the hardware and software communities will be simply
chips passing in the night.
Chair: Patrick Groeneveld
Organizers: Patrick Groeneveld, Lou Scheffer
-
6.1 Reliability-Constrained Area Optimization of VLSI Power/Ground Networks Via
Sequence of Linear Programmings [p. 78]
- Xiang-Dong Tan, C.-J. Richard Shi, Dragos Lungeanu, Jyh-Chwen Lee,
Li-Pen Yuan
This paper presents a new method for determining the widths
of the power and ground routes in integrated circuits so that
the area required by the routes is minimized subject to the
reliability constraints. The basic idea is to transform the resulting
constrained nonlinear programming problem into a
sequence of linear programs. Theoretically, we show that
the sequence of linear programs always converges to the optimum
solution of the relaxed convex problem. Experimental
results demonstrate that the sequence-of-linear-programming
method is orders of magnitude faster than the best-known
method based on conjugate gradients, with constantly better
optimization solutions.
-
6.2 FAR-DS: Full-plane AWE Routing with Driver Sizing [p. 84]
- Jiang Hu, Sachin S. Sapatnekar
We propose a Full-plane AWE Routing with Driver Sizing (FAR-DS)
algorithm for performance driven routing in deep sub-micron
technology. We employ a fourth order AWE delay model in the
full plane, including both Hanan and non-Hanan points. Optimizing
the driver size simultaneously extends our work into a two-dimensional
space, enabling us to achieve the desired balance between
wire and driver cost reduction, while satisfying the timing
constraints. Compared to SERT, experimental results showed that
our algorithm can provide an average reduction of 23% in the wire
cost and 50% in the driver cost under stringent timing constraints.
-
6.3 Noise-Constrained Performance Optimization by Simultaneous Gate and Wire
Sizing Based on Lagrangian Relaxation [p. 90]
- Hui-Ru Jiang, Jing-Yang Jou, Yao-Wen Chang
Noise, as well as area, delay, and power, is one of the most important
concerns in the design of deep submicron ICs. Existing
algorithms cannot handle simultaneous switching conditions of
signals for noise minimization. In this paper, we model not only physical
coupling capacitance, but also simultaneous switching behavior
for noise optimization. Based on Lagrangian relaxation, we present
an algorithm that can optimally solve the simultaneous noise, area,
delay, and power optimization problem by sizing circuit components.
Our algorithm, with linear memory requirement overall and linear
runtime per iteration, is very effective and efficient. For example, for
a circuit of 6144 wires and 3512 gates, our algorithm solves the simultaneous
optimization problem using only 2.1 MB of memory and 47
minutes of runtime to achieve a precision within 1% error on a SUN
UltraSPARC-I workstation.
-
6.4 Simultaneous Routing and Buffer Insertion with Restrictions on Buffer
Locations [p. 96]
- Hai Zhou, D. F. Wong, I-Min Liu, Adnan Aziz
During the routing of global interconnects, macro blocks form
useful routing regions which allow wires to go through but forbid
buffers to be inserted. They give restrictions on buffer
locations. In this paper, we take these buffer location restrictions
into consideration and solve the simultaneous maze
routing and buffer insertion problem. Given a block placement
defining buffer location restrictions and a pair of pins
(a source and a sink), we give a polynomial time exact algorithm
to find a buffered route from the source to the sink with
minimum Elmore delay.
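As a self-contained illustration of why inserting a buffer along a route helps under the Elmore delay model, the sketch below compares an unbuffered uniform RC line with the same line split by one ideal buffer at its midpoint; every driver, buffer, and wire parameter is invented for the example, and a single midpoint buffer merely stands in for the algorithm's optimally placed buffers.

```python
# Elmore delay of a uniform RC line, unbuffered vs. split by one buffer.
# All numbers are illustrative only.

def elmore_rc_line(R_drv, r, c, L, C_load):
    """Elmore delay of a wire of length L (per-unit resistance r and
    capacitance c) driven by resistance R_drv and loaded by C_load."""
    return R_drv * (c * L + C_load) + r * L * (c * L / 2.0 + C_load)

# 10 mm line at 0.1 ohm/um and 0.2 fF/um, 100 ohm driver, 20 fF sink.
R_drv, r, c, L, C_load = 100.0, 0.1, 0.2e-15, 10000.0, 20e-15
R_buf, C_buf, T_buf = 100.0, 20e-15, 20e-12   # buffer output R, input C, delay

t_plain = elmore_rc_line(R_drv, r, c, L, C_load)
t_buffered = (elmore_rc_line(R_drv, r, c, L / 2, C_buf) + T_buf +
              elmore_rc_line(R_buf, r, c, L / 2, C_load))

print(f"unbuffered: {t_plain * 1e12:.0f} ps")     # ~1222 ps
print(f"buffered:   {t_buffered * 1e12:.0f} ps")  # ~744 ps
```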
-
6.5 Crosstalk Minimization using Wire Perturbations [p. 100]
- Prashant Saxena, C.L. Liu
We study the variation of the crosstalk in a net and its neighbors
when one of its trunks is perturbed, showing that the trunk's perturbation
range can be efficiently divided into subintervals having
monotonic or unimodal crosstalk variation. We can therefore determine
the optimum trunk location without solving any non-linear
equations. Using this, we construct and experimentally verify an
algorithm to minimize the peak net crosstalk in a gridless channel.
Chair: Prabhakar Kudva
Organizers: Timothy Kam, Luciano Lavagno
-
7.1 Practical Advances in Asynchronous Design and in Asynchronous/Synchronous
Interfaces [p. 104]
- Erik Brunvand, Steven Nowick, Kenneth Yun
Asynchronous systems are being viewed as an increasingly
viable alternative to purely synchronous systems. This
paper gives an overview of the current state of the art in
practical asynchronous circuit and system design in four areas:
controllers, datapaths, processors, and the design of
asynchronous/synchronous interfaces.
-
7.2 Automatic Synthesis and Optimization of Partially Specified Asynchronous
Systems [p. 110]
- Alex Kondratyev, Jordi Cortadella, Michael Kishinevsky, Luciano Lavagno,
Alexander Yakovlev
A method for automating the synthesis of asynchronous control
circuits from high level (CSP-like) and/or partial STG (involving
only functionally critical events) specifications is presented. The
method solves two key subtasks in this new, more flexible, design
flow: handshake expansion, i.e. inserting reset events with
maximum concurrency, and event reshuffling under interface and
concurrency constraints, by means of concurrency reduction. In
doing so, the algorithm optimizes the circuit both for size and performance.
Experimental results show a significant increase in the
solution space explored when compared to existing CSP-based or
STG-based synthesis tools.
-
7.3 CAD Directions for High Performance Asynchronous Circuits [p. 116]
- Ken Stevens, Shai Rotem, Steven M. Burns, Jordi Cortadella, Ran Ginosar,
Michael Kishinevsky, Marly Roncken
This paper describes a novel methodology for high performance
asynchronous design based on timed circuits and on
CAD support for their synthesis using Relative Timing. This
methodology was developed for a prototype iA32 instruction
length decoding and steering unit called RAPPID ("Revolving
Asynchronous Pentium(R) Processor Instruction Decoder")
that was fabricated and tested successfully. Silicon
results show significant advantages - in particular, performance
of 2.5-4.5 instructions per ns - with manageable risks
using this design technology. RAPPID achieves three times
the performance and half the latency of a comparable 400 MHz
clocked circuit, while dissipating only half the power at the
cost of a minor area penalty.
Relative Timing is based on user-defined and automatically
extracted relative timing assumptions between signal
transitions in a circuit and its environment. It supports the
specification, synthesis, and verification of high-performance
asynchronous circuits, such as pulse-mode circuits, that can
be derived from an initial speed-independent specification.
Relative timing presents a "middle-ground" between clocked
and asynchronous circuits, and is a fertile area for CAD development.
We discuss possible directions for future CAD
development.
Chair: Rajesh K. Gupta
Organizers: David Ku, Rajesh Gupta
-
8.1 A Low Power Hardware/Software Partitioning Approach for Core-based Embedded
Systems [p. 122]
- Jörg Henkel
We present a novel approach that minimizes the
power consumption of embedded core-based systems through hardware/
software partitioning. Our approach is based on the idea of
mapping clusters of operations/instructions to a core that yields
a high utilization rate of the involved resources (ALUs, multipliers,
shifters, ...) and thus minimizes power consumption. Our approach
is comprehensive since it takes into consideration the power
consumption of a whole embedded system comprising a microprocessor
core, application specific (ASIC) core(s), cache cores and
a memory core. We report high reductions of power consumption
between 35% and 94% at the cost of a relatively small additional
hardware overhead of less than 16k cells while maintaining or even
slightly increasing the performance compared to the initial design.
-
8.2 Synthesis of Low-Overhead Interfaces for Power-Efficient Communication
over Wide Buses [p. 128]
- L. Benini, A. Macii, E. Macii, M. Poncino, R. Scarsi
In this paper we present algorithms for the synthesis of encoding
and decoding interface logic that minimizes the average
number of transitions on heavily-loaded global bus lines. The
approach automatically constructs low-transition activity codes
and hardware implementation of encoders and decoders, given
information on word-level statistics. We present an accurate
method that is applicable to low-width buses, as well as approximate
methods that scale well with bus width. Furthermore,
we introduce an adaptive architecture that automatically adjusts
encoding to reduce transition activity on buses whose word-level
statistics are not known a priori. Experimental results demonstrate
that our approach clearly outperforms low-power encoding
schemes presented in the past.
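To make the goal concrete, the sketch below counts bus transitions for the classic bus-invert code, shown only as one simple instance of a transition-reducing encoding; the paper instead synthesizes encoders and decoders tailored to word-level statistics, which generally does better than this generic scheme.

```python
# Bus-invert coding: send each word as-is or inverted (signalled on one
# extra invert line), whichever toggles fewer bus wires.
import random

def hamming(a, b):
    return bin(a ^ b).count("1")

def bus_invert_transitions(words, width=32):
    mask = (1 << width) - 1
    prev_val, prev_inv, toggles = 0, 0, 0
    for w in words:
        plain, inverted = w & mask, (~w) & mask
        if hamming(prev_val, inverted) < hamming(prev_val, plain):
            val, inv = inverted, 1
        else:
            val, inv = plain, 0
        toggles += hamming(prev_val, val) + (inv != prev_inv)
        prev_val, prev_inv = val, inv
    return toggles

if __name__ == "__main__":
    random.seed(1)
    data = [random.getrandbits(32) for _ in range(1000)]
    raw = sum(hamming(a, b) for a, b in zip([0] + data, data))
    print("unencoded transitions:", raw)
    print("bus-invert transitions:", bus_invert_transitions(data))
```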
-
8.3 Power Conscious Fixed Priority Scheduling for Hard Real-Time Systems
[p. 134]
- Youngsoo Shin, Kiyoung Choi
Power efficient design of real-time systems based on programmable
processors becomes more important as system functionality is increasingly
realized through software. This paper presents a power-efficient
version of a widely used fixed priority scheduling method.
The method yields a power reduction by exploiting slack times,
both those inherent in the system schedule and those arising from
variations of execution times. The proposed run-time mechanism
is simple enough to be implemented in most kernels. Experimental
results show that the proposed scheduling method obtains a significant
power reduction across several kinds of applications.
-
8.4 Memory Exploration for Low Power, Embedded Systems [p. 140]
- Wen-Tsong Shiue, Chaitali Chakrabarti
In embedded system design, the designer has to choose an on-chip
memory configuration that is suitable for a specific
application. To aid in this design choice, we present a memory
exploration strategy based on three performance metrics,
namely, cache size, the number of processor cycles and the
energy consumption. We show how the performance is affected
by cache parameters such as cache size, line size, set
associativity and tiling, and the off-chip data organization. We
show the importance of including energy in the performance
metrics, since an increase in the cache line size, cache size, tiling
and set associativity reduces the number of cycles but does not
necessarily reduce the energy consumption. These performance
metrics help us find the minimum energy cache configuration if
time is the hard constraint, or the minimum time cache
configuration if energy is the hard constraint.
Keywords:
Design automation, Low power design, Memory hierarchy, Low
power embedded systems, Memory exploration and
optimization, Cache simulator, Off-chip data assignment.
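A toy calculation of the trade-off described above: growing the cache keeps cutting miss cycles, yet the energy-minimal configuration can be a smaller cache because per-access energy rises with size. Every number below (miss rates, per-access energies, miss penalty and cost) is invented for illustration.

```python
# Illustrative cache exploration: cycles keep falling with cache size,
# but energy bottoms out at an intermediate configuration.
configs = [            # (size in KB, miss rate, nJ per cache access)
    (1,  0.20, 0.10),
    (4,  0.08, 0.16),
    (16, 0.03, 0.30),
    (64, 0.01, 0.55),
]
ACCESSES, MISS_CYCLES, MISS_NJ = 1_000_000, 20, 2.0

for size, miss_rate, hit_nj in configs:
    misses = ACCESSES * miss_rate
    cycles = ACCESSES + misses * MISS_CYCLES
    energy_mj = (ACCESSES * hit_nj + misses * MISS_NJ) / 1e6
    print(f"{size:3d} KB: {cycles / 1e6:5.2f} Mcycles, {energy_mj:5.2f} mJ")
```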
Chair: Asawaree Kalavade
Organizers: Raul Camposano, Asawaree Kalavade
-
9.1 Distributed Application Development with Inferno [p. 146]
- Ravi Sharma
Distributed computing has taken on new importance in order to
meet the requirements of users demanding information "anytime,
anywhere." Inferno facilitates the creation and support of
distributed services in the new and emerging world of network
environments. These environments include a world of varied
terminals, network hardware, and protocols. The Namespace is a
critical Inferno concept that enables the participants in this
network environment to deliver resources to meet the many
needs of diverse users.
This paper discusses the elements of the Namespace technology.
Its simple programming model and network transparency are
demonstrated through the design of an application that can have
components in several different nodes in a network. The
simplicity and flexibility of the solution are highlighted.
Keywords:
Inferno, InfernoSpaces, distributed applications, Styx,
networking protocols.
-
9.2 Embedded Application Design Using a Real-Time OS [p. 151]
- David Stepner, Nagarajan Rajan, David Hui
You read about it everywhere: distributed computing is the next
revolution, perhaps relegating our desktop computers to the
museum. But in fact the age of distributed computing has been
around for quite a while. Every time we withdraw money from an
ATM, start our car, use our cell phone, or microwave our dinner,
microprocessors are at work performing dedicated functions.
These are examples of just a very few of the thousands of
"embedded systems."
Until recently the vast majority of these embedded systems used
8- and 16-bit microprocessors, requiring little in the way of
sophisticated software development tools, including an Operating
System (OS). But the breaking of the $5 threshold for 32-bit
processors is now driving an explosion in high-volume embedded
applications. And a new trend towards integrating a full system-on-a-chip
(SOC) promises a further dramatic expansion for 32-bit
embedded applications as we head into the 21st century.
-
9.3 The JiniTM Architecture: Dynamic Services in a Flexible
Network [p. 157]
- Ken Arnold
This paper gives an overview of the JiniTM
architecture, which provides a federated infrastructure
for dynamic services in a network. Services may be large or
small.
Keywords:
Jini, Java, networks, distribution, distributed computing
Chair: Rajesh Raina
Organizers: Raj Raina, Vivek Tiwari
-
10.1 Verifying Large-Scale Multiprocessors Using an Abstract Verification
Environment [p. 163]
- Dennis Abts, Mike Roberts
The complexity of large-scale multiprocessors has burdened the
design and verification process, making complexity-effective functional
verification an elusive goal. We propose a solution to the
verification of complex systems by introducing an abstracted verification
environment called Raven. We show how Raven uses standard
C/C++ to extend the capability of contemporary discrete-event
logic simulators. We introduce new data types and a diagnostic
programming interface (DPI) that provide the basis for Raven.
Finally, we show results from an interconnect router ASIC used in
a large-scale multiprocessor.
-
10.2 Functional Verification of the Equator MAP1000 Microprocessor [p. 169]
- Jian Shen, Jacob Abraham, Dave Baker, Tony Hurson, Martin Kinkade,
Gregorio Gervasio, Chen-Chau Chu, Guanghui Hu
The Advanced VLIW architecture of the Equator
MAP1000 processor has many features that present significant
verification challenges. We describe a functional verification
methodology to address this complexity. In particular,
we present an efficient method to generate directed
assembly tests and a novel technique using the processor itself
to control self-tests and check the results at speed using
native instructions only. We also describe the use of emulation
in both pre-silicon and post-silicon verification stages.
-
10.3 Micro Architecture Coverage Directed Generation of Test Programs [p. 175]
- Shmuel Ur, Yoav Yadin
In this paper, we demonstrate a method for generation of assembler
test programs that systematically probe the micro architecture
of a PowerPC superscalar processor. We show innovations such
as ways to make small models for large designs, predict, with
cycle accuracy the movement of instructions through the pipes
(taking into account stalls and dependencies) and generation of
test programs such that each reaches a new micro architectural
state. We compare our method to the established practice of massive
random generation and show that the quality of our tests, as
measured by transition coverage, is much higher. The main contribution
of this paper is not in theory, as the theory has been discussed
in previous papers, but in describing how to put this
theory into practice, a task that was far from
trivial.
-
10.4 Verification of a Microprocessor Using Real World Applications [p. 181]
- You-Sung Chang, Seungjong Lee, In-Cheol Park, Chong-Min Kyung
In this paper, we describe a fast and convenient verification methodology
for microprocessors using large, real-world application programs
as test vectors. The verification environment is based on
automatic consistency checking between the golden behavioral reference
model and the target HDL model, which are run in a handshaking
fashion. In conjunction with the automatic comparison
facility, a new HDL saver is proposed to accelerate the verification
process. The proposed saver allows a 'restart' from the nearest
checkpoint before the point where an inconsistency is detected, regardless
of whether the source code has been modified. This is in
contrast with conventional savers, which do not allow a restart
after a design change or a debugging edit. We have proved
the effectiveness of the environment through applying it to a real-world
example, i.e., Pentium-compatible processor design process.
It was shown that the HDL verification with the proposed saver can
be faster and more flexible than the hardware emulation approach.
In short, it was demonstrated that restartability with source code
modification capability is very important in obtaining a short debugging
turnaround time by eliminating a large number of redundant
simulations.
-
10.5 High-Level Test Generation for Design Verification of Pipelined
Microprocessors [p. 185]
- David Van Campenhout, Trevor Mudge, John P. Hayes
This paper addresses test generation for design verification of pipelined
microprocessors. To handle the complexity of these designs,
our algorithm integrates high-level treatment of the datapath with
low-level treatment of the controller, and employs a novel "pipe-frame"
organization that exploits high-level knowledge about the
operation of pipelines. We have implemented the proposed algorithm
and used it to generate verification tests for design errors in a
representative pipelined microprocessor.
Keywords: design verification, sequential test generation, high-level
test generation, pipelined microprocessors.
-
10.6 Developing an Architecture Validation Suite - Application to the PowerPC
Architecture [p. 189]
- Laurent Fournier, Anatoly Koyfman, Moshe Levinger
This paper describes the efforts made and the results of creating
an Architecture Validation Suite for the PowerPC architecture.
Although many functional test suites are available for multiple
architectures, little has been published on how these suites are
developed and how their quality should be measured. This work
provides some insights for approaching the difficult problem of
building a high quality functional test suite for a given architecture.
By defining a set of generic coverage models that combine
program-based, specification-based, and sequential bug-driven
models, it establishes the groundwork for the development of
architecture validation suites for any architecture.
Chair: Alan Mantooth
Organizers: Hidetoshi Onodera, Alan Mantooth
-
11.1 Passive Reduced-Order Models for Interconnect Simulation and their
Computation via Krylov-Subspace Algorithms [p. 195]
- Roland W. Freund
This paper studies a projection technique based on block Krylov
subspaces for the computation of reduced-order models of multiport
RLC circuits. We show that these models are always passive,
yet they still match at least half as many moments as the corresponding
reduced-order models based on matrix-Padé approximation.
For RC, RL, and LC circuits, the reduced-order models obtained
by projection and matrix-Padé approximation are identical. For
general RLC circuits, we show how the projection technique can
easily be incorporated into the SyMPVL algorithm to obtain passive
reduced-order models, in addition to the high-accuracy matrix-Padé
approximations that characterize SyMPVL, at essentially no extra
computational costs. Connections between SyMPVL and the recently
proposed reduced-order modeling algorithm PRIMA are also
discussed. Numerical results for interconnect simulation problems
are reported.
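For orientation, the sketch below shows the congruence-projection step in the spirit of PRIMA-style reduction: an orthonormal block Krylov basis is built and applied as V^T M V, which is the mechanism that preserves symmetry and semidefiniteness (and hence passivity when G and C come from RLC modified nodal analysis). Random symmetric positive definite matrices stand in for real MNA matrices, and a dense inverse replaces the sparse factorization a real implementation would use.

```python
# A compact sketch of PRIMA-style congruence projection: build an
# orthonormal block Krylov basis and reduce (G, C, B) with it.
import numpy as np

def block_krylov_basis(G, C, B, q):
    """Orthonormal basis for span{G^-1 B, (G^-1 C) G^-1 B, ...} with q columns."""
    Ginv = np.linalg.inv(G)                      # dense inverse, sketch only
    blocks = [Ginv @ B]
    while sum(b.shape[1] for b in blocks) < q:
        blocks.append(Ginv @ (C @ blocks[-1]))
    V, _ = np.linalg.qr(np.hstack(blocks))
    return V[:, :q]

def prima_like_reduce(G, C, B, q):
    V = block_krylov_basis(G, C, B, q)
    # Congruence (V^T M V) preserves symmetry and positive semidefiniteness,
    # which is the mechanism behind the passivity guarantee.
    return V.T @ G @ V, V.T @ C @ V, V.T @ B

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, ports, q = 200, 2, 10
    M = rng.standard_normal((n, n))
    G = M @ M.T + n * np.eye(n)                  # SPD stand-in for conductances
    C = np.diag(rng.uniform(0.1, 1.0, n))        # SPD stand-in for capacitances
    B = rng.standard_normal((n, ports))
    Gr, Cr, Br = prima_like_reduce(G, C, B, q)
    print(Gr.shape, Cr.shape, Br.shape)          # (10, 10) (10, 10) (10, 2)
```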
-
11.2 Model Order-Reduction of RC(L) Interconnect including Variational
Analysis [p. 201]
- Ying Liu, Lawrence T. Pileggi, Andrzej J. Strojwas
As interconnect feature sizes continue to scale to smaller dimensions,
long interconnect can dominate the IC timing performance,
but the interconnect parameter variations make it
difficult to predict these dominant delay extremes. This paper
presents a model order-reduction technique for RLC interconnect
circuits that includes variational analysis to capture manufacturing
variations. Matrix perturbation theory is combined
with dominant-pole-analysis and Krylov-subspace-analysis
methods to produce reduced-order models with direct inclusion
of statistically independent manufacturing variations. The accuracy
of the resulting variational reduced-order models is
demonstrated on several industrial examples.
-
11.3 Robust Rational Function Approximation Algorithm for Model Generation
[p. 207]
- Carlos P. Coelho, Joel R. Phillips, L. Miguel Silveira
The problem of computing rational function approximations to
tabulated frequency data is of paramount importance in the modeling
arena. In this paper we present a method for generating
a state space model from tabular data in the frequency domain
that solves some of the numerical difficulties associated with the
traditional fitting techniques used in linear least squares approximations.
An extension to the MIMO case is also derived.
Chair: Timothy Y. Kam
Organizers: Kazutoshi Wakabayashi, Timothy Kam
-
12.1 Behavioral Network Graph: Unifying the Domains of High-Level and Logic
Synthesis [p. 213]
- Reinaldo A. Bergamaschi
High-level synthesis operates on internal models known as
control/data flow graphs (CDFG) and produces a register-transfer-level
(RTL) model of the hardware implementation
for a given schedule. For high-level synthesis to be efficient
it has to estimate the effect that a given algorithmic decision
(e.g., scheduling, allocation) will have on the final
hardware implementation (after logic synthesis). Currently,
this effect cannot be measured accurately because the CDFGs
are very distinct from the RTL/gate-level models used
by logic synthesis, precluding interaction between high-level
and logic synthesis. This paper presents a solution to this
problem consisting of a novel internal model for synthesis
which spans the domains of high-level and logic synthesis.
This model is an RTL/gate-level network capable of representing
all possible schedules that a given behavior may
assume. This representation allows high-level synthesis algorithms
to be formulated as logic transformations and effectively
interleaved with logic synthesis.
-
12.2 Soft Scheduling in High Level Synthesis [p. 219]
- Jianwen Zhu, Daniel D. Gajski
In this paper, we establish a theoretical framework for a new
concept of scheduling called soft scheduling. In contrast
to traditional schedulers, referred to as hard schedulers, soft
schedulers make soft decisions, i.e., decisions that
can be adjusted later. Soft scheduling has the potential to alleviate
the phase coupling problem that has plagued traditional
high level synthesis (HLS), HLS for deep submicron
design and VLIW code generation. We then develop a specific
soft scheduling formulation, called threaded schedule,
under which a linear-time algorithm, optimal in the sense of
online optimality, is guaranteed.
-
12.3 Graph Coloring Algorithms for Fast Evaluation of Curtis Decompositions
[p. 225]
- Marek Perkowski, Rahul Malvi, Stan Grygiel, Mike Burns, Alan
Mishchenko
Finding the minimum column multiplicity for a bound set
of variables is an important problem in Curtis decomposition.
To investigate this problem, we compared two graph-coloring
programs: one exact, and another one based on
heuristics which can give, however, provably exact results
on some types of graphs. These programs were incorporated
into the multi-valued decomposer MVGUD. We proved that
the exact graph coloring is not necessary for high-quality
functional decomposers. Thus we improved the speed of
solving the column multiplicity problem by orders of magnitude,
with very little or no sacrifice of decomposition quality.
Comparison of our experimental results with competing decomposers
shows that for nearly all benchmarks our solutions
are the best, with runtimes that remain reasonable.
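To make the graph-coloring connection concrete, the sketch below colors a small column-incompatibility graph with a largest-degree-first greedy heuristic; the number of colors used is an upper bound on the column multiplicity. The toy graph is invented, and greedy coloring is only one of many heuristics in this space, not necessarily the strategy used in MVGUD.

```python
# Greedy coloring of a column-incompatibility graph: incompatible chart
# columns are joined by edges; colors used bound the column multiplicity.

def greedy_coloring(n, edges):
    """Color vertices 0..n-1 of an undirected incompatibility graph."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = [None] * n
    # Largest-degree-first ordering tends to use fewer colors.
    for v in sorted(range(n), key=lambda v: -len(adj[v])):
        used = {color[u] for u in adj[v] if color[u] is not None}
        color[v] = next(c for c in range(n) if c not in used)
    return color

if __name__ == "__main__":
    # Toy incompatibility graph over 6 chart columns (two triangles).
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 3)]
    col = greedy_coloring(6, edges)
    print("colors:", col, "column multiplicity <=", max(col) + 1)   # <= 3
```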
Chair: Narendra V Shenoy
Organizers: Timothy Kam, Luciano Lavagno
-
13.1 Maximizing Performance by Retiming and Clock Skew Scheduling [p. 231]
- Xun Liu, Marios C. Papaefthymiou, Eby G. Friedman
The application of retiming and clock skew scheduling for improving
the operating speed of synchronous circuits under setup and
hold constraints is investigated in this paper. It is shown that when
both long and short paths are considered, circuits optimized by
the simultaneous application of retiming and clock scheduling can
achieve shorter clock periods than optimized circuits generated by
applying either of the two techniques separately. A mixed-integer
linear programming formulation and an efficient heuristic are given
for the problem of simultaneous retiming and clock skew scheduling
under setup and hold constraints. Experiments with benchmark
circuits demonstrate the efficiency of this heuristic and the effectiveness
of the combined optimization. All of the test circuits show
improvement. For more than half of them, the maximum operating
speed increases by more than 21% over the optimized circuits obtained
by applying retiming or clock skew scheduling separately.
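For orientation, the standard linear setup and hold constraints behind clock skew scheduling are shown below for a register-to-register path from i to j, with maximum and minimum combinational delays D_ij and d_ij, skews s_i and s_j, and clock period T; the paper's mixed-integer formulation adds integer retiming variables on top of constraints of this general shape. The notation is generic, not copied from the paper.

```latex
\text{setup (long path):}\quad  s_i + D_{ij} + T_{\mathrm{setup}} \;\le\; s_j + T
\qquad\qquad
\text{hold (short path):}\quad  s_i + d_{ij} \;\ge\; s_j + T_{\mathrm{hold}}
```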
-
13.2 A Practical Approach to Multiple-Class Retiming [p. 237]
- Klaus Eckl, Jean Christophe Madre, Peter Zepter, Christian Legl
Retiming is an optimization technique for synchronous circuits introduced
by Leiserson and Saxe in 1983. Although powerful, retiming is not very
widely used because it does not handle in a satisfying way circuits
whose registers have load enable, synchronous and asynchronous set/clear
inputs. We propose an extension of retiming whose basis is the
characterization of registers into register classes. The new approach
called multiple-class retiming handles circuits with an arbitrary number
of register classes. We present results on a set of industrial FPGA
designs showing the effectiveness and efficiency of multiple-class retiming.
-
13.3 Performance-driven Integration of Retiming and Resynthesis [p. 243]
- Peichen Pan
We present a novel approach to performance optimization
by integrating retiming and resynthesis. The approach
is oblivious of register boundaries during resynthesis.
In addition, it guides resynthesis by a criterion that
is directly tied to the performance target. The proposed
approach obtains provable results. Experimental results further
demonstrate the effectiveness of our approach.
-
13.4 Kernel-Based Power Optimization of RTL Components: Exact and Approximate
Extraction Algorithms [p. 247]
- L. Benini, G. De Micheli, E. Macii, G. Odasso, M. Poncino
Sequential logic optimization based on the extraction of computational
kernels has proved to be very promising when the target
is power minimization. Efficient extraction of the kernels is at
the basis of the optimization paradigm; the extraction procedures
proposed so far exploit common logic synthesis transformations,
and thus assume the availability of a gate-level description of
the circuit being optimized. In this paper we present exact and
approximate algorithms for the automatic extraction of computational
kernels directly from the functional specification of a
RTL component. We show the effectiveness of such algorithms
by reporting the results of an extensive experimentation we have
carried out on a large set of standard benchmarks, as well as
on some designs with known functionality.
Chair: Patrick Scaglia
Organizers: Patrick Scaglia, Jim Rowson
-
14.1 Customized Instruction-Sets For Embedded Processors [p. 253]
- Joseph A. Fisher
It is generally believed that there will be little more variety in
CPU architectures, and thus the design of Instruction-set
Architectures (ISAs) will have no role in the future of embedded
CPU design. Nonetheless, it is argued in this paper that
architectural variety will soon again become an important topic,
with the major motivation being increased performance due to the
customization of CPUs to their intended use. Five major barriers
that could hinder customization are described, including the
problems of existing binaries, toolchain development and
maintenance costs, lost savings/higher chip cost due to the lower
volumes of customized processors, added hardware development
costs, and some factors related to the product development cycle
for embedded products. Each is discussed, along with potential,
sometimes surprising, solutions.
Keywords:
Embedded processors, custom processors, instruction-level
parallelism, VLIW, mass customization of toolchains
-
14.2 System-Level Hardware/Software Trade-offs [p. 258]
- Samuel P. Harbison
Operating systems and development tools can
impose overly general requirements that prevent
an embedded system from achieving its
hardware performance entitlement. It is time
for embedded processor designers to become
more involved with system software and tools.
Keywords: Digital signal processors, instruction set architecture, compiler,
real-time operating system, software configuration.
Chair: Jonah Mcleod
Organizers: Ghulam Nurie, Mike Murray
Panel Members: Nozar Azarakhsh, Glen Ewing, Paul Gingras, Scott Reedstrom,
Chris Rowen [p. 260]
-
Achieving timely and comprehensive functional design verification
is a ubiquitous problem in electronics. This panel offers perspectives
on verification from designers of cardiac pacemakers,
communications satellites, compute servers, networking equipment,
and IP.
The panelists will begin by dissecting the bottlenecks in their
verification processes. For example, are simulators too slow? Or do
test vector generation and coverage analysis consume the most
time? The panelists will present their ideas for new EDA products
which might accelerate verification. Finally, the panelists will discuss
what compromises they would accept in order to achieve this
acceleration. Would they learn a new HDL? Restrict their design
styles? Forsake legacy designs?
Chair: Louis Scheffer
Organizers: Lou Scheffer, Patrick Groeneveld
-
16.1 A Timing-Driven Soft-Macro Resynthesis Method in Interaction with Chip
Floorplanning [p. 262]
- Hsiao-Pin Su, Allen C.-H. Wu, Youn-Long Lin
In this paper, we present a complete chip design
method which incorporates a soft-macro resynthesis
method in interaction with chip floorplanning for area
and timing improvements. We develop a timing-driven
design flow to exploit the interaction between HDL synthesis
and physical design tasks. During each design
iteration, we resynthesize soft macros with either a relaxed
or a tightened timing constraint which is guided
by the post-layout timing information. The goal is to
produce area-efficient designs while satisfying the timing
constraints. Experiments on a number of industrial
designs have demonstrated that by effectively relaxing
the timing constraint of the non-critical modules and
tightening the timing constraint of the critical modules,
a design can achieve 13% to 30% timing improvements
with little to no increase in chip area.
-
16.2 An O-Tree Representation of Non-Slicing Floorplan and Its Applications
[p. 268]
- Pei-Ning Guo, Chung-Kuan Cheng, Takeshi Yoshimura
We present an ordered tree, O-tree, structure to represent
non-slicing floorplans. The O-tree uses only
n(2 + ⌈lg n⌉) bits for a floorplan of n rectangular blocks.
We define an admissible placement as a compacted
placement in both the x and y directions. For each admissible
placement, we can find an O-tree representation. We
show that the number of possible O-tree combinations is
O(n! 2^(2n-2) / n^1.5). This is very concise compared to a
sequence-pair representation, which has O((n!)^2) combinations.
The approximate ratio of sequence-pair to O-tree
combinations is O(n^2 (n/4e)^n). The complexity of the
O-tree is even smaller than that of a binary tree structure for
slicing floorplans, which has O(n! 2^(5n-3) / n^1.5)
combinations.
Given an O-tree, it takes only linear time to construct
the placement and its constraint graph. We have
developed a deterministic floorplanning algorithm utilizing
the structure of O-tree. Empirical results on MCNC
benchmarks show promising performance, with an average
16% improvement in wire length and 1% less dead
space than a previous CPU-intensive cluster refinement
method.
-
16.3 Module Placement for Analog Layout Using the Sequence-Pair Representation
[p. 274]
- Florin Balasa, Koen Lampaert
This paper addresses the problem of device-level placement
for analog layout. Different from most of the existing approaches,
which employ simulated annealing optimization algorithms
operating on flat Gellat-Jepsen spatial representations [2], we
use a more recent topological representation called
sequence-pair [7], which has the advantage of not being
restricted to slicing floorplan topologies. In
this paper, we explain how specific features essential
to analog placement, such as the ability to deal with symmetry
and device matching constraints, can be easily handled by
employing the sequence-pair representation. Several analog
examples substantiate the effectiveness of our placement
tool, which is already in use in an industrial environment.
Chair: Kazutoshi Wakabayashi
Organizers: Timothy Kam, Kazutoshi Wakabayashi
-
17.1 Genetic List Scheduling Algorithm for Scheduling and Allocation on a
Loosely Coupled Heterogeneous Multiprocessor System [p. 280]
- Martin Grajcar
Our problem consists of a partially ordered set of tasks communicating
over a shared bus which are to be mapped to a heterogeneous
multiprocessor system. The goal is to minimize the makespan,
while satisfying constraints implied by data dependencies and exclusive
resource usage.
We present a new efficient heuristic approach based on list
scheduling and genetic algorithms, which finds the optimum in few
seconds on average even for large examples (up to 96 tasks) taken
from [3]. The superiority of our algorithm compared to some other
algorithms is demonstrated.
Keywords: heterogeneous system design, heuristic, genetic algorithms,
list scheduling
-
17.2 Performance-Driven Scheduling with Bit-Level Chaining [p. 286]
- Sanghun Park, Kiyoung Choi
This paper presents a new scheduling algorithm that maximizes the
performance of a design under resource constraints in high-level
synthesis. The algorithm tries to achieve the maximal utilization of
resources and the minimal waste of clock slack time. Moreover, it
exploits the technique of bit-level chaining to target high-speed designs.
The algorithm tries non-integer multiple-cycling and chaining,
which allows multiple cycle execution of chained operations,
to further increase the performance at the cost of small increase in
the complexity of the control unit. Experimental results on several
datapath-intensive designs show significant improvement in execution
time, over the conventional scheduling algorithms.
-
17.3 A Model for Scheduling Protocol-Constrained Components and Environments
[p. 292]
- Steve Haynal, Forrest Brewer
This paper presents a technique for highly constrained
event sequence scheduling. System
resource protocols as well as an external interface
protocol are described by non-deterministic
finite automata (NFA). All valid schedules
which adhere to interfacing constraints and
resource bounds for flow graph described
behavior are determined exactly. A model and
scheduling results are presented for an extensive
design example.
Keywords:
Interface protocols, protocol-constrained scheduling, automata.
-
17.4 A Reordering Technique for Efficient Code Motion [p. 296]
- Luiz C. V. dos Santos, Jochen A. G. Jess
Emerging design problems are prompting the use of
code motion and speculative execution in high-level
synthesis to shorten schedules and meet tight time-constraints.
However, some code motions are not
worth doing from a worst-case execution perspective.
We propose a technique that selects the most
promising code motions, thereby increasing the density
of optimal solutions in the search space.
Chair: Andreas Kuehlmann
Organizers: Andreas Kuehlmann, Kunle Olukatun
-
18.1 Coverage Estimation for Symbolic Model Checking [p. 300]
- Yatin Hoskote, Timothy Kam, Pei-Hsin Ho, Xudong Zhao
Although model checking is an exhaustive formal verification
method, a bug can still escape detection if the erroneous behavior
does not violate any verified property. We propose a coverage metric
to estimate the "completeness" of a set of properties verified by
model checking. A symbolic algorithm is presented to compute this
metric for a subset of the CTL property specification language. It
has the same order of computational complexity as a model checking
algorithm. Our coverage estimator has been applied in the
course of some real-world model checking projects. We uncovered
several coverage holes including one that eventually led to the discovery
of a bug that escaped the initial model checking effort.
-
18.2 Improving Symbolic Traversals by Means of Activity Profiles [p. 306]
- Gianpiero Cabodi, Paolo Camurati, Stefano Quer
Symbolic techniques have undergone major improvements in the last few
years. Nevertheless they are still limited by the size of the involved BDDs,
and extending their applicability to larger and real circuits is a key issue.
Within this framework, we introduce "activity profiles" as a novel
technique to characterize transition relations. In our methodology a learning
phase is used to collect activity measures, related to time and space
cost, for each BDD node of the transition relation. We use inexpensive
reachability analysis as learning technique, and we operate within inner
steps of image computations involving the transition relation and state
sets.
The above information can be used for several purposes. In particular,
we present an application of activity profiles in the field of reachability
analysis itself. We propose transition relation subsetting and partial
traversals of the state transition graph. We show that a sequence of partial
traversals is able to complete a reachability analysis problem with
smaller memory requirement and improved time performance.
-
18.3 Improved Approximate Reachability Using Auxiliary State Variables [p. 312]
- Shankar G. Govindaraju, David L. Dill, Jules P. Bergmann
Approximate reachability techniques trade off accuracy
for the capacity to deal with bigger designs. Cho
et al [4] proposed partitioning the set of state bits
into mutually disjoint subsets and doing symbolic forward
reachability on the individual subsets to obtain
an over approximation of the reachable state set. Recently
[7] this was improved upon by dividing the set of
state bits into various subsets that could possibly overlap,
and doing symbolic reachability over the overlapping
subsets. In this paper, we further improve on this
scheme by augmenting the set of state variables with
auxiliary state variables. These auxiliary state variables
are added to capture some important internal
conditions in the combinational logic. Approximate
symbolic forward reachability on overlapping subsets
of this augmented set of state variables yields much
tighter approximations than earlier methods.
-
18.4 Symbolic Model Checking using SAT procedures instead of BDDs [p. 317]
- A. Biere, A. Cimatti, E. M. Clarke, M. Fujita, Y. Zhu
In this paper, we study the application of propositional decision
procedures in hardware verification. In particular, we apply
bounded model checking, as introduced in [1], to equivalence and
invariant checking. We present several optimizations that reduce
the size of generated propositional formulas. In many instances,
our SAT-based approach can significantly outperform BDD-based
approaches. We observe that SAT-based techniques are particularly
efficient in detecting errors in both combinational and sequential
designs.
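To make the bounded model checking idea above concrete, the following minimal sketch unrolls a toy transition relation (a 2-bit counter) for a few steps and asks a SAT solver whether the invariant "the counter never reaches 11" can be violated. The counter, the depth, and the use of the python-sat package are illustrative assumptions and are not taken from the paper.

    # Minimal bounded-model-checking sketch (illustrative; assumes the
    # python-sat package:  pip install python-sat).
    from pysat.solvers import Glucose3

    K = 4                                  # unrolling depth

    def var(step, bit):                    # one SAT variable per state bit per step
        return 2 * step + bit + 1

    solver = Glucose3()
    solver.add_clause([-var(0, 0)])        # initial state: both bits 0
    solver.add_clause([-var(0, 1)])

    # Unrolled transition relation of a 2-bit counter:
    #   b0' = NOT b0,   b1' = b1 XOR b0
    for t in range(K):
        b0, b1 = var(t, 0), var(t, 1)
        n0, n1 = var(t + 1, 0), var(t + 1, 1)
        solver.add_clause([b0, n0]);  solver.add_clause([-b0, -n0])        # n0 <-> !b0
        solver.add_clause([-b1, -b0, -n1]); solver.add_clause([b1, b0, -n1])
        solver.add_clause([b1, -b0, n1]);   solver.add_clause([-b1, b0, n1])

    # Ask, depth by depth, whether the "bad" state b1 = b0 = 1 is reachable.
    for t in range(K + 1):
        if solver.solve(assumptions=[var(t, 0), var(t, 1)]):
            print("counterexample at depth", t, solver.get_model())
            break
    else:
        print("invariant holds up to depth", K)

Equivalence checking fits the same mold: the miter of two circuits is unrolled and the solver is asked for an input assignment that distinguishes them.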
Chair: Jan M. Rabaey
Organizers: David Blaauw, Neil Weste
-
19.1 Power Efficient Mediaprocessors: Design Space Exploration [p. 321]
- Johnson Kin, Chunho Lee, William H. Mangione-Smith, Miodrag Potkonjak
We present a framework for rapidly exploring the design space of
low power application-specific programmable processors (ASPP),
in particular media processors. We focus on a category of processors
that are programmable yet optimized to reduce power consumption
for a specific set of applications.
The key components of the framework presented in this paper
are a retargetable instruction level parallelism (ILP) compiler, processor
simulators, a set of complete media applications written in a
high level language and an architectural component selection algorithm.
The fundamental idea behind the framework is that with the
aid of a retargetable ILP compiler and simulators it is possible to
arrange architectural parameters (e.g., the issue width, the size of
cache memory units, the number of execution units, etc.) to meet
low power design goals under area constraints.
-
19.2 Global Multimedia System Design Exploration using Accurate Memory
Organization Feedback [p. 327]
- Arnout Vandecappelle, Miguel Miranda, Erik Brockmeyer, Francky Catthoor,
Diederik Verkest
Successful exploration of system-level design decisions
is impossible without fast and accurate estimation
of the impact on the system cost. In most
multimedia applications, the dominant cost factor
is related to the organization of the memory architecture.
This paper presents a systematic approach
which allows effective system-level exploration of
memory organization design alternatives, based on
accurate feedback by using our earlier developed
tools. The effectiveness of this approach is illustrated
on an industrial application. Applying our
approach, a substantial part of the design search
space has been explored in a very short time, resulting
in a cost-efficient solution which meets all design
constraints.
-
19.3 Implementation of a Scalable MPEG-4 Wavelet-Based Visual Texture
Compression System [p. 333]
- L. Nachtergaele, B. Vanhoof, M. Peón, G. Lafruit,
J. Bormans, I. Bolsens
The realization of new MPEG-4 functionality, applicable to 3D graphics
texture compression and image database access over the Internet, is
demonstrated in a PC-based compression system. Applying our system-level
design methodologies effectively removes all implementation bottlenecks.
A first-of-a-kind ASIC, called Ozone, accelerates the Embedded Zero Tree based
encoding and is capable of compressing 30 color CIF images per second.
-
19.4 A 10Mbit/s Upstream Cable Modem with Automatic Equalization [p. 337]
- Patrick Schaumont, Radim Cmar, Serge Vernalde, Marc Engels
A fully digital QAM16 burst receiver ASIC is presented.
The BO4 receiver demodulates at 10 Mbit/s and uses an
advanced signal processing architecture that performs per
burst automatic equalization. It is a critical building block
in a broadband access system for HFC networks. The chip
was designed using a C++ based flow and is implemented
as an 80 Kgate 0.7u CMOS standard cell design.
Chair: Kurt Keutzer
Organizer: Emil Girczyc
Panel Members: Kurt Wolf, David Pietromonaco, Jay Maxey, Jeff Lewis,
Martin Lefebvre, Jeff Burns [p. 341]
-
Cell libraries determine the final density, performance, and power of most
IC designs much as the construction materials determine the quality of a
building. Nevertheless, the importance of libraries has often been a tertiary
consideration in design projects - falling behind both design skill and tool
quality. Choosing the right cell library for your project can have a
significant impact on the characteristics of the circuit you design, and thus,
the success of your product. Design teams need to consider a host of technical
and business factors when selecting a library. Technical considerations include
density, speed, power, design for reliability, and support for the designer's
tools and flow. Business considerations include price, risk, time to market,
and control of one's own destiny. This panel examines technical, as well as
current business issues, associated with cell libraries. On the technical front,
the advantages and disadvantages of static libraries versus "on the fly" or
dynamic libraries will be discussed and quantified. On the business front,
while designers have traditionally used the cell libraries provided by their
silicon source (internal division or semiconductor vendor), recent
changes in technology and business practices make several cell-library sources
available to design groups: silicon vendors, third party library vendors, and
internally created libraries. This panel will explore the business issues associated with
the library choice and debate when designers should use each available source
of cell libraries.
Chair: Chung-kuan Cheng
Organizers: Sharad Malik, Andrew Kahng
-
21.1 Multilevel k-way Hypergraph Partitioning [p. 343]
- George Karypis, Vipin Kumar
In this paper, we present a new multilevel k-way hypergraph partitioning
algorithm that substantially outperforms the existing state-of-the-art
K-PM/LR algorithm for multi-way partitioning, both for
optimizing local as well as global objectives. Experiments on the
ISPD98 benchmark suite show that the partitionings produced by
our scheme are on the average 15% to 23% better than those produced
by the K-PM/LR algorithm, both in terms of the hyperedge
cut as well as the (K-1) metric. Furthermore, our algorithm is significantly
faster, requiring 4 to 5 times less time than that required
by K-PM/LR.
-
21.2 Hypergraph Partitioning for VLSI CAD: Methodology for Heuristic
Development, Experimentation and Reporting [p. 349]
- Andrew E. Caldwell, Andrew B. Kahng, Andrew Kennings, Igor L. Markov
We illustrate how technical contributions in the VLSI CAD partitioning
literature can fail to provide one or more of: (i) reproducible
results and descriptions, (ii) an enabling account of the key understanding
or insight behind a given contribution, and (iii) experimental
evidence that is not only contrasted with the state-of-the-art, but
also meaningful in light of the driving application. Such failings
can lead to reporting of spurious and misguided conclusions. For
example, new ideas may appear promising in the context of a weak
experimental testbed, but in reality do not advance the state of the
art. The resulting inefficiencies can be detrimental to the entire research
community. We draw on several models (chiefly from the
metaheuristics community) [5] for experimental research and reporting
in the area of heuristics for hard problems, and suggest that
such practices can be adopted within the VLSI CAD community.
Our focus is on hypergraph partitioning.
-
21.3 Hypergraph Partitioning With Fixed Vertices [p. 355]
- Andrew E. Caldwell, Andrew B. Kahng, Igor L. Markov
We empirically assess the implications of fixed terminals for hypergraph
partitioning heuristics. Our experimental testbed incorporates
a leading-edge multilevel hypergraph partitioner [14] [3]
and IBM-internal circuits that have recently been released as part
of the ISPD-98 Benchmark Suite [2, 1]. We find that the presence
of fixed terminals can make a partitioning instance considerably
easier (possibly to the point of being "trivial"): much less
effort is needed to stably reach solution qualities that are near
best-achievable.
Toward development of partitioning heuristics specific
to the fixed-terminals regime, we study the pass statistics of flat
FM-based partitioning heuristics. Our data suggest that with more
fixed terminals, the improvements in a pass are more likely to occur
near the beginning of the pass. Restricting the length of passes
- which degrades solution quality in the classic (free-hypergraph)
context - is relatively safe for the fixed-terminals regime and considerably
reduces run time of our FM-based heuristic implementations.
We believe that the distinct nature of partitioning in the
fixed-terminals regime has deep implications (i) for the design and
use of partitioners in top-down placement, (ii) for the context in
which VLSI hypergraph partitioning research is pursued, and (iii)
for the development of new benchmark instances for the research
community.
-
21.4 Relaxation and Clustering in a Local Search Framework: Application to
Linear Placement [p. 360]
- Sung-Woo Hur, John Lillis
This paper presents two primary results relevant to physical
design problems in CAD/VLSI through a case study of the
linear placement problem. First a local search mechanism
which incorporates a neighborhood operator based on constraint
relaxation is proposed. The strategy exhibits many
of the desirable features of analytical placement while retaining
the flexibility and non-determinism of local search.
The second and orthogonal contribution is in netlist clustering.
We characterize local optima in the linear placement
problem through a simple visualization tool, the displacement
graph. This characterization reveals the relationship
between clusters and local optima and motivates a dynamic
clustering scheme designed specifically for escaping such local
optima. Promising experimental results are reported.
Chair: Massoud Pedram
Organizers: M. Marek-Sadowska, Jason Cong
-
22.1 An alpha-approximate Algorithm for Delay-Constraint Technology Mapping [p. 367]
- Sumit Roy, Krishna Belkhale, Prithviraj Banerjee
Variants of delay-cost functions have been used in a class
of technology mapping algorithms [1, 2, 3, 4]. We illustrate
that in an industrial environment the delay-cost
function can grow unboundedly and lead to very large run-times.
The key contribution of this work is a novel bounded
compression algorithm. We introduce the concept of an alpha delay-cost
curve (alpha-DC-curve) that requires up to exponentially
fewer delay-cost points to be stored compared to those stored
by the delay function. We prove that the solution obtained
by this exponential compaction of the delay-function is
bounded to alpha% of the optimal solution. We also suggest
a large set of CAD applications which may benefit from
using alpha-DC-curve. Finally, we demonstrate the effectiveness
of our compaction scheme on one such application,
namely technology mapping for low power. Experimental
results in an industrial environment show that we are more
than 17 times faster than [2] on certain MCNC circuits.
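The abstract does not spell out how the alpha-DC-curve is built, but the underlying idea, trading a bounded (alpha%) loss in solution quality for far fewer stored delay-cost points, can be sketched as follows. The Pareto pruning and the geometric bucketing rule below are only one plausible reading with made-up numbers, not the authors' algorithm.

    # Illustrative compression of a (delay, cost) curve: keep one point per
    # factor-of-(1+alpha) delay bin, so every dropped point has a kept
    # neighbour whose delay differs by at most about alpha.
    import math

    def compress_curve(points, alpha=0.05):
        # 1. Keep only Pareto-optimal points (growing delay must buy lower cost).
        pareto, best_cost = [], float("inf")
        for delay, cost in sorted(set(points)):
            if cost < best_cost:
                pareto.append((delay, cost))
                best_cost = cost
        # 2. Geometric bucketing of the delay axis.
        kept, seen = [], set()
        for delay, cost in pareto:
            b = math.floor(math.log(delay, 1 + alpha)) if delay > 0 else 0
            if b not in seen:
                kept.append((delay, cost))
                seen.add(b)
        return kept

    curve = [(d, 1000.0 / d) for d in range(1, 2000)]     # 1999 points
    print(len(compress_curve(curve, alpha=0.05)))         # roughly 150 representatives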
-
22.2 Technology Mapping for FPGAs with Nonuniform Pin Delays and Fast
Interconnections [p. 373]
- Jason Cong, Yean-Yow Hwang, Songjie Xu
In this paper we study the technology mapping problem for FPGAs with
nonuniform pin delays and fast interconnects. We develop the PinMap
algorithm to compute the delay optimal mapping solution for
FPGAs with nonuniform pin delays in polynomial time based on the
efficient cut enumeration. Compared with FlowMap [5] without considering
the nonuniform pin delays, PinMap is able to reduce the circuit delay
by 15% without any area penalty. For mapping with fast interconnects,
we present two algorithms, an iterative refinement based algorithm,
named ChainMap, and a Boolean matching based algorithm, named HeteroBM,
which combines the Boolean matching techniques proposed in [2] and [3] and
the heterogeneous technology mapping mechanism presented in [1]. It is
shown that both ChainMap and HeteroBM are able to significantly reduce
the circuit delay by making efficient use of the FPGA fast interconnect
resources.
-
22.3 Automated Phase Assignment for the Synthesis of Low Power Domino
Circuits [p. 379]
- Priyadarshan Patra, Unni Narayanan
High performance circuit techniques such as domino logic have migrated
from the microprocessor world into more mainstream ASIC
designs. The problem is that domino logic comes at a heavy cost
in terms of total power dissipation. For mobile and portable devices
such as laptops and cellular phones, a high power dissipation
is an unacceptable price to pay for high performance. Hence, we
study synthesis techniques that allow designers to take advantage
of the speed of domino circuits while at the same time to minimize
total power consumption. Specifically, in this paper we present
three results related to automated phase assignment for the synthesis
of low power domino circuits: (1) We demonstrate that the
choice of phase assignment at the primary outputs of a circuit can
significantly impact power dissipation in the domino block (2) We
propose a method for efficiently estimating power dissipation in a
domino circuit and (3) We apply the method to determine a phase
assignment that minimizes power consumption in the final circuit
implementation. Preliminary experimental results on a mixture of
public domain benchmarks and real industry circuits show potential
power savings as high as 34% over the minimum area realization
of the logic. Furthermore, the low power synthesized circuits still
meet timing constraints.
Chair: Kunle Olukotun
Organizers: Kunle Olukotun, Andreas Kuehlmann
-
23.1 Enhancing Simulation with BDDs and ATPG [p. 385]
- Malay K. Ganai, Adnan Aziz, Andreas Kuehlmann
We introduce SImulation Verification with Augmentation
(SIVA), a tool for checking safety properties on digital hardware
designs. SIVA integrates simulation with symbolic techniques
for vector generation. Specifically, the core algorithm
uses a combination of ATPG and BDDs to generate input
vectors which cover behavior not excited by simulation. Experimental
results demonstrate considerable improvement in
state space coverage compared with either simulation or formal
verification in isolation.
Keywords: Formal verification, ATPG, simulation, BDDs,
coverage.
-
23.2 Cycle-Based Symbolic Simulation of Gate-Level Synchronous Circuits [p. 391]
- Valeria Bertacco, Maurizio Damiani, Stefano Quer
Symbolic methods are often considered the state-of-the-art
technique for validating digital circuits. Due to their complexity
and unpredictable run-time behavior, however, their
potential is currently limited to small-to-medium circuits.
Logic simulation privileges capacity: it is nicely scalable,
flexible, and has a predictable run-time behavior. For this
reason, it is the common choice for validating large circuits.
Simulation, however, typically visits only a small fraction of
the state space: The discovery of bugs heavily relies on the
expertise of the designer of the test stimuli.
In this paper we consider a symbolic simulation approach
to the validation problem. Our objective is to trade off between
formal and numerical methods in order to simulate a
circuit with a "very large number" of input combinations and
sequences in parallel. We demonstrate larger capacity with
respect to symbolic techniques and better efficiency with respect
to cycle-based simulation. We show that it is possible
to symbolically simulate very large trace sets in parallel
(over 100 symbolic inputs) for the largest ISCAS benchmark
circuits, using 96 Mbytes of memory.
-
23.3 Exploiting Positive Equality and Partial Non-Consistency in the Formal
Verification of Pipelined Microprocessors [p. 397]
- Miroslav N. Velev, Randal E. Bryant
We study the applicability of the logic of Positive Equality
with Uninterpreted Functions (PEUF) [2][3] to the verification
of pipelined microprocessors with very large Instruction Set
Architectures (ISAs). Abstraction of memory arrays and functional
units is employed, while the control logic of the processors
is kept intact from the original gate-level designs. PEUF is an
extension of the logic of Equality with Uninterpreted Functions,
introduced by Burch and Dill [4], that allows us to use distinct
constants for the data operands and instruction addresses
needed in the symbolic expression for the correctness criterion.
We present several techniques that make PEUF scale very efficiently
for the verification of pipelined microprocessors with
large ISAs. These techniques are based on allowing a limited
form of non-consistency in the uninterpreted functions, representing
initial memory state and ALU behaviors. Our tool
required less than 30 seconds of CPU time and 5 MB of memory
to verify a 5-stage MIPS-like pipelined processor that implements
191 instructions of various classes. The verification was
done by correspondence checking - a formal method, where a
pipelined microprocessor is compared against a non-pipelined
specification.
-
23.4 Formal Verification Using Parametric Representations of Boolean
Constraints [p. 402]
- Mark D. Aagaard, Robert B. Jones, Carl-Johan H. Seger
We describe the use of parametric representations of
Boolean predicates to encode data-space constraints and significantly
extend the capacity of formal verification. The constraints
are used to decompose verifications by sets of case splits and to
restrict verifications by validity conditions. Our technique is applicable
to any symbolic simulator. We illustrate our technique on
state-of-the-art Intel (R) designs, without removing latches or modifying
the circuits in any way.
Chair: Ivo Bolsens
Organizers: Asawaree Kalavade, Kenji Yoshida
-
24.1 Vertical Benchmarks for CAD [p. 408]
- Christopher Inacio, Herman Schmit, David Nagle, Andrew Ryan, Donald E.
Thomas, Yingfai Tong, Ben Klass
Vertical benchmarks are complex system
designs represented at multiple levels of
abstraction. More effective than component-based
CAD benchmarks, vertical benchmarks
enable quantitative comparison of CAD techniques
within or across design flows. This work
describes the notion of vertical benchmarks
and presents our benchmark, which is based on
a commercial DSP, by comparing two alternative
design flows.
-
24.2 A Framework for User Assisted Design Space Exploration [p. 414]
- X. Hu, G. W. Greenwood, S. Ravichandran, G. Quan
Much effort in hardware/software co-design has been devoted
to developing "push-button" types of tools for automatic
hardware/software partitioning. However, given the
highly complex nature of embedded system design, user
guided design exploration can be more effective. In this paper,
we propose a framework for designer-assisted partitioning
that can be used in conjunction with any given search
strategy. A key component of this framework is the visualization
of the design space, without enumerating all possible
design configurations. Furthermore, this design space representation
provides a straightforward way for a designer to
identify promising partitions and hence guide the subsequent
exploration process. Experiments have shown the effectiveness
of this approach.
-
24.3 Fast Prototyping: a System Design Flow Applied to a Complex System-On-Chip
Multiprocessor Design [p. 420]
- Benoit Clement, Richard Hersemeule, Etienne Lantreibecq, Bernard
Ramanadin, Pierre Coulomb, Francois Pogodalla
This paper describes a new design flow that significantly
reduces time-to-market for highly
complex multiprocessor-based System-On-Chip
designs. This flow, called Fast Prototyping,
enables concurrent hardware and software
development, early verification and productive
re-use of intellectual property. We describe how
using this innovative system design flow, which
combines different technologies, such as C
modeling, emulation, hard Virtual Component
re-use and CoWare N2C(TM), we achieve better
productivity on a multi-processor SOC design.
1.1 Keywords:
System design, Hardware/Software (HW/SW) co-design,
Virtual Component (VC) re-use, Fast Prototyping, system
verification, system modeling.
-
24.4 Verification and Management of a multimillion-Gate Embedded Core
Design [p. 425]
- Johann Notbauer, Thomas Albrecht, Georg Niedrist, Stefan Rohringer
Verification is one of the most critical and time-consuming tasks
in today's design processes. This paper demonstrates the
verification process of an 8.8 million gate design using HW-simulation
and cycle simulation-based HW/SW-coverification.
The main focuses are overall methodology, testbench
management, the verification task itself and defect management.
The chosen verification process was a real success: the quality of
the designed hard- and software was increased and furthermore
the time needed for integration and test of the design in the
context of the overall system was greatly reduced.
Chair: Paul Franzon
Organizer: Mark Basel
Panel Members:
Mark Basel, Aki Fujimura, Sharad Mehrotra, Ron Preston,
Robin C. Sarma, Marty Walker [p. 429]
-
The effect of parasitic elements on chip performance is well known; however,
this effect is becoming ever more critical to a
chip's performance. To cope with this new design hazard there are a number
of parasitic extraction tools and methodology approaches available to the
circuit designer. Some are developed for the general market and some
for internal use. With each product having its own claims and approaches,
deciding on a tool or extraction strategy is a confusing exercise. The
purpose of this panel is to help the designer and CAD manager determine how
to properly compare extractors and how to put them to use. The panel will
address a number of questions, including: What is the best way to accurately
handle parasitic extraction while dealing with increasingly large and
complex (SOC, mixed signal) chips? How can the
required extraction accuracy for a particular design situation be determined and achieved? How
is extraction accuracy measured? How can each extractor be compared and
contrasted with some degree of confidence? Can circuit design techniques
and/or tool methodologies be used to reduce the extraction effort? How
should process variations or inductive effects be handled? What is the
best way to deal with the data volume problem?
Chair: Andrew B. Kahng
Organizers: Andrew Kahng, Hidetoshi Onodera
-
26.1 Mixed-Vth (MVT) CMOS Circuit Design Methodology for Low Power
Applications [p. 430]
- Liqiong Wei, Zhanping Chen, Kaushik Roy, Yibin Ye, Vivek De
The dual-threshold technique has been proposed to reduce leakage
power in low voltage and low power circuits by applying
a high threshold voltage to some transistors in non-critical
paths, while a low-threshold is used in critical path(s) to
maintain performance. The Mixed-Vth (MVT) static CMOS
design technique allows different thresholds within a logic
gate, thereby increasing the number of high threshold transistors
compared to the gate-level dual threshold technique.
In this paper, a methodology for MVT CMOS circuit design
is presented. Different MVT CMOS circuit schemes are considered
and three algorithms are proposed for the transistor-level
threshold assignment under performance constraints.
Results indicate that the MVT CMOS design technique can provide
about 20% more leakage reduction compared to the
corresponding gate-level dual threshold technique.
-
26.2 Stand-by Power Minimization through Simultaneous Threshold Voltage
Selection and Circuit Sizing [p. 436]
- Supamas Sirichotiyakul, Tim Edwards, Chanhee Oh, Jingyan Zuo,
Abhijit Dharchoudhury, Rajendran V. Panda, David Blaauw
We present a new approach for estimation and optimization
of the average stand-by power dissipation in
large MOS digital circuits. To overcome the complexity
of state dependence in average leakage estimation,
we introduce the concept of "dominant leakage states"
and use state probabilities. Our method achieves
speed-ups of 3 to 4 orders of magnitude over exhaustive
SPICE simulations while maintaining accuracies
within 9% of SPICE. This accurate estimation is used
in a new sensitivity-based leakage and performance
optimization approach for circuits using dual-Vt processes.
In tests on a variety of industrial circuits, this
approach was able to obtain 81-100% of the performance
achievable with all low-Vt transistors, but with
1/3 to 1/6 the stand-by current.
Keywords:
Low-power-design, Dual-Vt, Leakage
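The "dominant leakage states" idea lends itself to a compact illustration: instead of averaging leakage over all 2^n input states, only the most probable states are visited until enough probability mass is covered. The state probabilities, per-state leakage currents, and the truncation rule below are invented placeholders sketching the concept, not the authors' estimator.

    # Sketch: probability-weighted leakage over a few dominant states only.
    def average_leakage(state_prob, state_leakage, coverage=0.95):
        total, covered = 0.0, 0.0
        # Visit states in order of decreasing probability until the requested
        # probability mass is covered; these are the "dominant" states.
        for s in sorted(state_prob, key=state_prob.get, reverse=True):
            total += state_prob[s] * state_leakage[s]
            covered += state_prob[s]
            if covered >= coverage:
                break
        return total / covered          # renormalise the truncated sum

    probs   = {"00": 0.55, "01": 0.30, "10": 0.10, "11": 0.05}
    leakage = {"00": 2e-9, "01": 5e-9, "10": 4e-9, "11": 9e-9}   # amperes, made up
    print(average_leakage(probs, leakage))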
-
26.3 Leakage Control With Efficient Use of Transistor Stacks in Single
Threshold CMOS [p. 442]
- Mark C. Johnson, Dinesh Somasekhar, Kaushik Roy
The state dependence of leakage can be exploited to obtain
modest leakage savings in CMOS circuits. However, one can
modify circuits considering state dependence and achieve larger
savings. We identify a low leakage state and insert leakage
control transistors only where needed. Leakage levels are on the
order of 35% to 90% lower than those obtained by state
dependence alone.
-
26.4 A Practical Gate Resizing Technique Considering Glitch Reduction for Low
Power Design [p. 446]
- Masanori Hashimoto, Hidetoshi Onodera, Keikichi Tamaru
We propose a method for power optimization that considers glitch
reduction by gate sizing based on the statistical estimation of glitch
transitions. Our method reduces not only the amount of capacitive
and short-circuit power consumption but also the power dissipated
by glitches which has not been exploited previously. The effect of
our method is verified experimentally using 8 benchmark circuits
with a 0.6 um standard cell library. Our method reduces the power
dissipation from the minimum-sized circuits further by 9.8% on
average and 23.0% maximum. We also verify that our method is
effective under manufacturing variation.
-
26.5 Gradient-Based Optimization of Custom Circuits Using a Static-Timing
Formulation [p. 452]
- A. R. Conn, I. M. Elfadel, W. W. Molzen, Jr.,
P.R. O'Brien, P.N. Strenski, C. Visweswariah, C.B. Whan
This paper describes a method of optimally sizing digital circuits
on a static-timing basis. All paths through the logic
are considered simultaneously and no input patterns need
be specified by the user. The method is unique in that it is
based on gradient-based, nonlinear optimization and can accommodate
transistor-level schematics without the need for
pre-characterization. It employs efficient time-domain simulation
and gradient computation for each channel-connected
component. A large-scale, general-purpose, nonlinear optimization
package is used to solve the tuning problem. A
prototype tuner has been developed that accommodates combinational
circuits consisting of parameterized library cells.
Numerical results are presented.
Chair: Lukas Van Ginneken
Organizers: Jason Cong, M. Marek-Sadowska
-
27.1 Simultaneous Circuit Partitioning/Clustering with Retiming for Performance Optimization [p. 460]
- Jason Cong, Honching Li, Chang Wu
Partitioning and clustering are crucial steps in circuit layout
for handling large scale designs enabled by the deep submicron
technologies. Retiming is an important sequential logic
optimization technique for reducing the clock period by optimally
repositioning flip-flops [7]. In our exploration of a
logical and physical co-design flow, we developed a highly efficient
algorithm on combining retiming with circuit partitioning
or clustering for clock period minimization. Compared with the
recent result by Pan et al. [10] on quasi-optimal clustering with
retiming, our algorithm is able to reduce both runtime and memory
requirement by one order of magnitude without losing quality. Our
results show that our algorithm can be over 1000X faster for large designs.
-
27.2 Wave Steering in YADDs: A Novel Non-Iterative Synthesis and Layout
Technique [p. 466]
- Arindam Mukherjee, Ranganathan Sudhakar, Malgorzata Marek-Sadowska,
Stephen I. Long
In this paper we present a new synthesis and layout
approach that avoids the normal iterations between synthesis,
technology mapping and layout, and increases routing by abutment.
It produces shorter and more predictable delays, and
sometimes even layouts with reduced areas. This scheme
equalizes delays along different paths, which makes low granularity
pipelining a reality, and hence we can clock these circuits
at much higher frequencies, compared to what is possible in a
conventionally designed circuit. Since any circuit can be
clocked at a fixed rate, this method does not require timing-driven
synthesis. We propose the logic and layout synthesis
schemes and algorithms, discuss the physical layout part of the
process, and support our methodology with simulation results.
-
27.3 MERLIN: Semi-Order-Independent Hierarchical Buffered Routing Tree
Generation Using Local Neighborhood Search [p. 472]
- Amir H. Salek, Jinan Lou, Massoud Pedram
This paper presents a solution to the problem of
performance-driven buffered routing tree generation in electronic
circuits. Using a novel bottom-up construction algorithm
and a local neighborhood search strategy, this method finds the
best solution of the problem in an exponential size solution sub-space
in polynomial time. The output is a hierarchical buffered
rectilinear Steiner routing tree that connects the driver of a net
to its sink nodes. The two variants of the problem, i.e. maximizing
the driver required time subject to a total buffer area constraint
and minimizing the total buffer area subject to a
minimum driver required time constraint, are handled by propagating
three-dimensional solution curves during the construction
phase. Experimental results prove the effectiveness of this
technique compared to the other solutions for this problem.
-
27.4 Buffer Insertion With Accurate Gate and Interconnect Delay Computation
[p. 479]
- Charles J. Alpert, Anirudh Devgan, Stephen T. Quay
Buffer insertion has become a critical step in deep submicron
design, and several buffer insertion/sizing algorithms
have been proposed in the literature. However, most of these
methods use simplified interconnect and gate delay models.
These models may lead to inferior solutions since the optimized
objective is only an approximation for the actual
delay. We propose to integrate accurate wire and gate delay
models into Van Ginneken's buffer insertion algorithm [18]
via the propagation of moments and driving point admittances
up the routing tree. We have verified the effectiveness
of our approach on an industry design.
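For readers unfamiliar with the baseline, the following is a compact sketch of the classic Van Ginneken bottom-up candidate propagation that the abstract builds on, written with the simple lumped Elmore wire model the authors argue is too coarse (their contribution replaces it with moments and driving-point admittances propagated up the tree). The device and wire numbers are made up.

    # Van Ginneken-style buffer insertion sketch (illustrative values only).
    R_BUF, C_BUF, D_BUF = 180.0, 3e-15, 20e-12   # buffer out-resistance, in-cap, delay
    R_W, C_W = 0.08, 0.2e-15                     # wire resistance / capacitance per unit

    def merge(cands):
        """Prune dominated (load_cap, required_time) candidates."""
        cands.sort()                             # ascending load capacitance
        kept, best_q = [], -float("inf")
        for c, q in cands:
            if q > best_q:                       # more cap must buy more slack
                kept.append((c, q))
                best_q = q
        return kept

    def bottom_up(node):
        """node = ('sink', cap, req_time), ('wire', length, child),
           or ('branch', left, right).  Returns candidate (cap, req) list."""
        kind = node[0]
        if kind == 'sink':
            return [(node[1], node[2])]
        if kind == 'wire':
            length, child = node[1], node[2]
            r, c = R_W * length, C_W * length
            out = []
            for cl, q in bottom_up(child):
                qw = q - r * (c / 2 + cl)                       # Elmore delay of the segment
                out.append((cl + c, qw))                        # unbuffered option
                out.append((C_BUF, qw - D_BUF - R_BUF * (cl + c)))  # buffer at segment top
            return merge(out)
        if kind == 'branch':
            left, right = bottom_up(node[1]), bottom_up(node[2])
            out = [(cl + cr, min(ql, qr)) for cl, ql in left for cr, qr in right]
            return merge(out)

    tree = ('wire', 500, ('branch',
                          ('wire', 200, ('sink', 10e-15, 1.0e-9)),
                          ('wire', 300, ('sink', 12e-15, 0.8e-9))))
    # Best required time at the driver, modelling the driver with R_BUF.
    print(max(q - R_BUF * c for c, q in bottom_up(tree)))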
Chair: Mojy C. Chian
Organizers: Anantha Chandrakasan, Tadahiro Kuroda
-
28.1 Reducing Cross-Coupling Among Interconnect Wires in Deep-Submicron
Datapath Design [p. 485]
- Joon-Seo Yim, Chong-Min Kyung
As the CMOS technology enters the deep submicron design
era, the lateral inter-wire coupling capacitance becomes the
dominant part of load capacitance and makes RC delay on
the bus structures very data-dependent. Reducing the cross-coupling
capacitance is crucial for achieving high-speed as
well as lower power operation. In this paper, we propose two
interconnect layout design methodologies for minimizing the
coupling effect" in the design of full-custom datapath.
Firstly, we describe the control signal ordering scheme which
was shown to reduce the switching power consumption by
10% and wire delay by 15% for a given set of benchmark
examples. Secondly, a track assignment algorithm based on
evolutionary programming was used to minimize the cross-coupling
capacitance. Experimental results have shown that
a chip performance improvement of as much as 40% can be
obtained using the proposed interconnect schemes in various
stages of the datapath layout optimization.
-
28.2 A Novel VLSI Layout Fabric for Deep Sub-Micron Applications [p. 491]
- Sunil P. Khatri, Amit Mehrotra, Robert K. Brayton, Alberto
Sangiovanni-Vincentelli, Ralph H.J.M. Otten
We propose a new VLSI layout methodology which addresses the
main problems faced in Deep Sub-Micron (DSM) integrated circuit
design. Our layout "fabric" scheme eliminates the conventional notion
of power and ground routing on the integrated circuit die. Instead,
power and ground are essentially "pre-routed" all over the die.
By a clever arrangement of power/ground and signal pins, we almost
completely eliminate the capacitive effects between signal wires. Additionally,
we get a power and ground distribution network with a
very low resistance at any point on the die. Another advantage of
our scheme is that the arrangement of conductors ensures that on-chip
inductances are uniformly negligible. Finally, characterization
of the circuit delays, capacitances and resistances becomes extremely
simple in our scheme, and needs to be done only once for a design.
We show how the uniform parasitics of our fabric give rise to a
reliable and predictable design. We have implemented our scheme
using public domain layout software. Preliminary results show that
it holds much promise as the layout methodology of choice in DSM
integrated circuit design.
-
28.3 Improved Delay Prediction for On-Chip Buses [p. 497]
- Real G. Pomerleau, Paul D. Franzon, Griff L. Bilbro
In this paper, we introduce a simple procedure to predict wiring
delay in bi-directional buses and a way of properly sizing the
driver for each of its ports. In addition, we propose a simple
calibration procedure to improve its delay prediction over the
Elmore delay of the RC tree. The technique is fast, accurate, and
ideal for implementation in a floorplanner during behavioral
synthesis.
Keywords:
RC wiring delay, High-Level Synthesis, Floorplanning, Buffer
Optimization, Interconnect optimization.
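Since the Elmore delay of an RC tree is the baseline the proposed calibration improves on, a minimal reference computation is sketched below. The three-node tree and its element values are illustrative only; the paper's calibration step is not reproduced.

    # Elmore delay in an RC tree: for a sink n, sum R * (downstream C) over
    # every resistor on the root-to-n path.  Values are placeholders.
    tree = {                       # node -> (parent, R from parent [ohm], node C [F])
        "n1": ("root", 10.0, 5e-15),
        "n2": ("n1",   20.0, 8e-15),
        "n3": ("n1",   15.0, 6e-15),
    }

    def downstream_cap(n):
        """Capacitance of node n plus everything below it."""
        return tree[n][2] + sum(downstream_cap(k)
                                for k, (p, _, _) in tree.items() if p == n)

    def elmore(n):
        delay, node = 0.0, n
        while node != "root":
            parent, r, _ = tree[node]
            delay += r * downstream_cap(node)
            node = parent
        return delay

    print(elmore("n2"))            # 3.5e-13 s for this toy tree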
-
28.4 Noise-Aware Repeater Insertion and Wire Sizing for On-Chip Interconnect
Using Hierarchical Moment-Matching [p. 502]
- Chung-Ping Chen, Noel Menezes
Recently, several algorithms for interconnect optimization via
repeater insertion and wire sizing have appeared based on the
Elmore delay model. Using the Devgan noise metric [6] a noise-aware
repeater insertion technique has also been proposed
recently. Recognizing the conservatism of these delay and noise
models, we propose a moment-matching based technique for interconnect
optimization that allows for much higher accuracy while
preserving the hierarchical nature of Elmore-delay-based techniques.
We also present a novel approach to noise computation
that accurately captures the effect of several attackers in linear
time with respect to the number of attackers and wire segments.
Our practical experiments with industrial nets indicate that the
corresponding reduction in error afforded by these more accurate
models justifies this increase in runtime for aggressive designs,
which are our targeted domain. Our algorithm yields delay and
noise estimates within 5% of circuit simulation results.
-
28.5 Interconnect Estimation and Planning for Deep Submicron Designs [p. 507]
- Jason Cong, David Zhigang Pan
This paper reports two sets of important results in our exploration
of an interconnect-centric design flow for deep submicron
(DSM) designs: (i) We obtain efficient yet accurate wiring area
estimation models for optimal wire sizing (OWS). We also propose
a simple metric to guide area-efficient performance optimization;
(ii) Guided by our interconnect estimation models, we
study the interconnect architecture planning problem for wire-width
designs. We achieve a rather surprising result which suggests
that two pre-determined wire widths per metal layer are
sufficient to achieve near-optimal performance. This result will
greatly simplify the routing architecture and tools for DSM designs.
We believe that our interconnect estimation and planning
results will have a significant impact on DSM designs.
Chair: Asawaree Kalavade
Organizers: Kenji Yoshida, Ivo Bolsens
-
29.1 ECL: A Specification Environment for System-Level Design [p. 511]
- Luciano Lavagno, Ellen M. Sentovich
We propose a new specification environment for system-level design
called ECL. It combines the Esterel and C languages to provide
a more versatile means for specifying heterogeneous designs. It can
be viewed as the addition to C of explicit constructs from Esterel for
waiting, concurrency and pre-emption, and thus makes these operations
easier to specify and more apparent. An ECL specification
is compiled into a reactive part (an extended finite state machine
representing most of the ECL program), and a pure data looping
part, thus nicely supporting a mix of control and data. The reactive
part can be robustly estimated and synthesized to hardware or
software, while the data looping part is implemented in software as
specified.
-
29.2 Representation of Function Variants for Embedded System Optimization and
Synthesis [p. 517]
- K. Richter, D. Ziegenbein, R. Ernst, L. Thiele, J. Teich
Many embedded systems are implemented with a set of alternative
function variants to adapt the system to different applications or
environments. This paper proposes a novel approach for the coherent
representation and selection of function variants in the different
phases of the design process. In this context, the modeling of re-configuration
of system parts is supported in a natural way. Using
a real example from the video processing domain, the approach is
explained and validated.
-
29.3 Vex - A CAD Toolbox [p. 523]
- Jules P. Bergmann, Mark A. Horowitz
The increasing size and complexity of designs is making the use of
hardware description languages (HDLs), such as Verilog and
VHDL, more prevalent. They are able to describe both the initial
design and intermediate representations of the design as it is readied
for fabrication. For large designs, there inevitably are problems
with the tool flow that require custom tools to be created. These
tools must be able to access and modify the HDL for the design,
requirements that often dwarf the tools’ actual functionality, making
them difficult to create without a large effort or cutting corners.
During the FLASH project at Stanford we created Vex -- a toolbox
of components for dealing with Verilog, tied together with an interactive
scripting language -- that simplifies the creation of these
tools. It was used to create a number of tools that were critical to
our design's tape-out and has also been useful in creating design
exploration and research tools.
-
29.4 Constraint Management for Collaborative Electronic Design [p. 529]
- Juan Antonio Carballo, Stephen W. Director
Today's complex design processes feature large numbers of
varied, interdependent constraints, which often cross
interdisciplinary boundaries. Therefore, a computer-supported
constraint management methodology that
automatically detects violations early in the design process,
provides useful violation notification to guide redesign efforts,
and can be integrated with conventional CAD software can be
a great aid to the designer. We present such a methodology and
describe its implementation in the Minerva II design process
manager, along with an example design session.
Chair: Kris Pister
Organizers: Takahide Inoue, Kris Pister
Panel Members:
Albert P. Pisano, Nicholas Swart, Mike Horton, John Rychcik,
John R. Gilbert, Gerry K. Fedder [p. 535]
-
Existing MEMS products boast more than a million electrical and mechanical
components on a single chip. With several billion dollars in sales in 1997
and exponential growth, it is clear that MEMS fabrication technology has
leveraged decades of IC expertise to great advantage. As a result, the
fabrication capabilities far outstrip the design capabilities in both industry
and university environments. MEMS CAD tools are only now beginning to leverage
corresponding decades of IC CAD expertise to address the exciting and unique
electro-mechanical co-design problems from the physical through system level
design. Perspectives represented in this panel include the industry needs at
the system and device levels, tool developers at all levels, and the government
research vision.
Questions to be addressed include:
- How do the current tools help or impede the development of new products?
- What are the breakthrough markets for MEMS and how will they challenge the
existing CAD tools?
- What are the challenges for the next generation?
Chair: David D. Ling
Organizers: Joel Phillips, Louis Scheffer
-
31.1 A Multiscale Method for Fast Capacitance Extraction [p. 537]
- Johannes Tausch, Jacob White
The many levels of metal used in aggressive deep submicron process
technologies have made fast and accurate capacitance extraction
of complicated 3-D geometries of conductors essential, and
many novel approaches have been recently developed. In this paper
we present an accelerated boundary-element method, like the well-known
FASTCAP program, but instead of using an adaptive fast
multipole algorithm we use a numerically generated multiscale basis
for constructing a sparse representation of the dense boundary-element
matrix. Results are presented to demonstrate that the multiscale
method can be applied to complicated geometries, generates
a sparser boundary-element matrix than the adaptive fast multipole
method, and provides an inexpensive but effective preconditioner.
Examples are used to show that the better sparsification and the effective
preconditioner yield a method that can be 25 times faster
than FASTCAP while still maintaining accuracy in the smallest coupling
capacitances.
-
31.2 Efficient Capacitance Computation for Structures with Non-Uniform Adaptive Surface Meshes [p. 543]
- Vikram Jandhyala, Scott Savage, Eric Bracken, Zoltan Cendes
Circuit parasitic extraction problems are typically formulated
using discretized integral equations that use basis
functions defined over tessellated surface meshes. The
Fast Multipole Method (FMM) accelerates the solution process
by rapidly evaluating potentials and fields due to these
basis functions. Unfortunately, the FMM suffers from the
drawback that its efficiency degrades if the surface mesh
has disparately-sized elements in close proximity to each
other. Closely-spaced non-uniformly sized elements can appear
in realistic situations for a variety of reasons: owing to
mesh refinement, due to accurate modeling requirements for
fine structural features, and because of the presence of thin
doubly-walled structures. In this paper, modifications to
the standard multilevel FMM are presented that permit efficient
potential and field evaluation over specific non-uniform meshes.
The efficiency of the new technique is demonstrated
through examples involving large surface meshes with non-uniformly
sized elements in close proximity.
-
31.3 Substrate Modeling and Lumped Substrate Resistance Extraction for CMOS
ESD/Latchup Circuit Simulation [p. 549]
- Tong Li, Ching-Han Tsai, Elyse Rosenbaum, Sung-Mo Kang
Due to interactions through the common silicon substrate,
the layout and placement of devices and substrate contacts
can have significant impacts on a circuit's ESD (Electrostatic
Discharge) and latchup behavior in CMOS technologies.
Proper substrate modeling is thus required for circuit-level
simulation to predict the circuit's ESD performance and latchup
immunity. In this work we propose a new substrate resistance
network model, and develop a
novel substrate resistance extraction method that accurately
calculates the distribution of injection current into
the substrate during ESD or latchup events. With the proposed
substrate model and resistance extraction, we can
capture the three-dimensional layout parasitics in the circuit
as well as the vertical substrate doping profile, and
simulate these effects on circuit behavior at the circuit-level
accurately. The usefulness of this work for layout optimization
is demonstrated with an industrial circuit example.
Chair: Sunil D. Sherlekar
Organizers: Sunil Sherlekar, Farid Najm
-
32.1 Dynamic Power Management Based On Continuous-Time Markov Decision
Processes [p. 555]
- Qinru Qiu, Massoud Pedram
This paper introduces a continuous-time, controllable
Markov process model of a power-managed system. The system
model is composed of the corresponding stochastic models of the
service queue and the service provider. The system environment is
modeled by a stochastic service request process. The problem of
dynamic power management in such a system is formulated as a policy optimization
problem and solved using an efficient "policy
iteration" algorithm. Compared to previous work on dynamic
power management, our formulation allows better modeling of the
various system components, the power-managed system as a whole, and its
environment. In addition, it captures dependencies
between the service queue and service provider status. Finally, the
resulting power management policy is asynchronous, hence it is
more power-efficient and more useful in practice. Experimental
results demonstrate the effectiveness of our policy optimization
algorithm compared to a number of heuristic (time-out and N-policy)
algorithms.
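The policy-iteration loop at the heart of this formulation can be illustrated on a toy power manager. The sketch below uses a discrete-time, discounted model for brevity (the paper works with continuous-time Markov decision processes), and all transition probabilities and power costs are invented.

    # Toy policy iteration for a two-state power manager (illustrative only).
    import numpy as np

    states  = ["busy", "idle"]
    actions = ["run", "sleep"]
    gamma   = 0.95                                   # discount factor

    # P[a][s, s'] : transition probability under action a;  cost[a][s] : power cost.
    P = {"run":   np.array([[0.7, 0.3], [0.4, 0.6]]),
         "sleep": np.array([[0.9, 0.1], [0.1, 0.9]])}
    cost = {"run": np.array([1.0, 0.6]), "sleep": np.array([1.5, 0.1])}

    policy = {s: "run" for s in range(len(states))}
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = cost_pi.
        P_pi = np.array([P[policy[s]][s] for s in range(len(states))])
        c_pi = np.array([cost[policy[s]][s] for s in range(len(states))])
        V = np.linalg.solve(np.eye(len(states)) - gamma * P_pi, c_pi)
        # Policy improvement: pick the action with the lowest expected cost-to-go.
        new_policy = {s: min(actions, key=lambda a: cost[a][s] + gamma * P[a][s] @ V)
                      for s in range(len(states))}
        if new_policy == policy:
            break
        policy = new_policy

    print({states[s]: policy[s] for s in policy}, V)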
-
32.2 Parallel Mixed-Level Power Simulation Based on Spatio-Temporal Circuit
Partitioning [p. 562]
- Mauro Chinosi, Roberto Zafalon, Carlo Guardiani
In this work we propose a technique for spatial
and temporal partitioning of a logic circuit based on the node
activity computed by simulation at a higher level of
abstraction. Only those components that are activated by a
given input vector are added to the detailed simulation netlist.
The methodology is suitable for parallel implementation on a
multi-processor environment and allows arbitrary switching
between fast and detailed levels of abstraction during the simulation
run. The experimental results obtained on a significant
set of benchmarks show that it is possible to obtain a considerable
reduction in both CPU time and memory occupation
together with a considerable degree of accuracy. Furthermore
the proposed technique easily fits in the existing industrial
design flows.
-
32.3 Low-Power Behavioral Synthesis Optimization Using Multiple Precision
Arithmetic [p. 568]
- Milos Ercegovac, Darko Kirovski, Miodrag Potkonjak
Many modern multimedia applications such as image and video processing
are characterized by a unique combination of arithmetic and
computational features: fixed-point arithmetic, a variety of short data
types, high degree of instruction-level parallelism, strict
timing constraints, and high computational requirements. Computationally
intensive algorithms usually boost a device's power dissipation, which is
often key to the efficiency of many communications and multimedia
applications. Although recently virtually all general-purpose processors
have been equipped with multiprecision operations, the current generation
of behavioral synthesis tools for application-specific systems does not
utilize this power/performance optimization paradigm.
In this paper, we explore the potential of using multiple precision
arithmetic units to effectively support synthesis of low-power
application-specific integrated circuits. We propose a new architectural
scheme for collaborative addition of sets of variable precision data.
We have developed a novel resource allocation and computation assignment
methodology for a set of multiple precision arithmetic units. The
optimization algorithms explore the trade-off between allocating low-width
bus structures and executing multiple-cycle operations. Experimental
results indicate strong advantages of the proposed approach.
Chair: Kenji Yoshida
Organizers: Asawaree Kalavade, Ivo Bolsens
-
33.1 A Methodology for the Verification of a 'System on Chip' [p. 574]
- Daniel Geist, Giora Biran, Tamara Arons, Michael Slavkin, Yvgeny Nustov,
Monica Farkas, Karen Holtz, Andy Long, Dave King, Steve Barret
This paper summarizes the verification effort of a
complex ASIC designated to be an "all in one"
ISDN network router. This ASIC is unique because
it actually consists of many independent components,
called "cores" (including the processor). The
integration of these components onto one chip
results in an ISOC (Integrated System On a Chip).
The complexity of verifying an ISOC is virtually
impossible without a proper methodology. This
paper presents the methodology developed for verifying
the router. In particular, the verification
method as well as the tools that were built to execute
this method are presented. Finally, a summary of
the verification results is given.
1.1 Keywords:
Systems on chip, verification, test and debugging.
-
33.2 ICEBERG: An Embedded In-Circuit Emulator Synthesizer for Microcontrollers
[p. 580]
- Ing-Jer Huang, Tai-An Lu
This paper presents a synthesis tool, ICEBERG, for embedded in-circuit emulators
(ICEs), which are part of the development environment for microcontroller
(or microprocessor)-based systems (PIPER-II). The tool inserts and integrates
the necessary in-circuit emulation circuitry into a given RTL core of a
microcontroller, thus turning the core into an embedded ICE. The ICE,
based on the IEEE 1149.1 JTAG architecture, provides standard debugging
mechanisms, including boundary scan paths, partial scan paths, single stepping,
internal resource monitoring and modification, breakpoint detection, and mode
switching between debugging and free running modes. ICEBERG has been
successfully applied to synthesize the embedded ICE for an industrial
microcontroller HT48100 from its RTL core.
-
33.3 Microprocessor Based Testing for Core-Based System on Chip [p. 586]
- C. A. Papachristou, F. Martin, M. Nourani
The purpose of this paper is to develop a flexible design for
test methodology for testing a core-based system on chip
(SOC). The novel feature of the approach is the use of an
embedded microprocessor/memory pair to test the remaining
components of the SOC. Test data is downloaded using
DMA techniques directly into memory while the microprocessor
uses the test data to test the core. The test results
are transferred to a MISR for evaluation. The approach has
several important advantages over conventional ATPG such
as achieving at-speed testing, not limiting the chip speed
to the tester speed during test and achieving great flexibility
since most of the testing process is based on software.
Experimental results on an example system are discussed.
Chair: Mahadevamurty Nemani
Organizers: Randy Harr, Telle Whitney
-
34.1 Using Partitioning to Help Convergence in the Standard-Cell Design
Automation Methodology [p. 592]
- Hema Kapadia, Mark A. Horowitz
This paper explores a standard-cell design methodology based on netlist
partitioning as a solution for the problem of lack of convergence in the
conventional methodology in deep submicron technologies. A synthesized
design block is partitioned along unpredictable nets that are identified
from the netlist structure. The size of each partition is restricted
so that the longest possible local net in a partition can be sufficiently
driven by an average library gate, hence allowing statistical wire-load modeling
for the local nets. The block is resynthesized using a hybrid wire-load model
that takes into account accurate wire-load information on the unpredictable
nets derived after floorplanning the partitions, and uses custom statistical
wire-load models within each partition. Final placement is restricted to
respect the initial floorplan. The methodology was implemented using existing
commercial tools for synthesis and layout. Experimental results show high
correlation between synthesis estimates and post-placement measurements of
wire-loads and gate delays with the new methodology. The trade-offs of
partitioning, current limitations of the methodology and future work to
overcome these limitations are also discussed.
-
34.2 Comparing RTL and Behavioral Design Methodologies in the Case of a
2M-Transistors ATM Shaper [p. 598]
- Imed Moussa, Zoltan Sugar, Rodolph Suescun, Mario Diaz-Nava, Marco
Pavesi, Salvatore Crudo, Luca Gazzi, Ahmed Amine Jerraya
This paper describes the experience and the lessons learned during
the design of an ATM traffic shaper circuit using behavioral synthesis.
The experiment is based on the comparison of the results
of two parallel design flows starting from the same specification.
The first used a classical design method based on RTL synthesis.
The second design flow is based on behavioral synthesis. The experiment
has shown that behavioral synthesis is able to produce
efficient designs in terms of gate count and timing while bringing a
threefold reduction in design effort when compared to RTL design
methodology.
-
34.3 Engineering Change: Methodology and Applications to Behavioral and System
Synthesis [p. 604]
- Darko Kirovski, Miodrag Potkonjak
Due to the unavoidable need for system debugging, performance tuning,
and adaptation to new standards, the engineering change (EC) methodology
has emerged as one of the crucial components in synthesis of systems-on-chip.
We introduce a novel design methodology which facilitates design-for-EC
and post-processing to enable EC with minimal perturbation. Initially,
as a synthesis pre-processing step, the original design specification is
augmented with additional design constraints which ensure flexibility
for future correction. Upon alteration of the initial design, a novel
post-processing technique achieves the desired functionality with a
near-minimal perturbation of the initially optimized design. The
key contribution we introduce is a constraint manipulation technique
which enables reduction of an arbitrary EC problem into its corresponding
classical synthesis problem. As a result, in both pre- and post-processing
for EC, classical synthesis algorithms can be used to enable flexibility
and perform the correction process. We demonstrate the developed EC
methodology on a set of behavioral and system synthesis tasks.
Chair: Telle Whitney
Organizers: Vivek Tiwari, Randy Harr
-
35.1 Reconfigurable Computing: What, Why, and Implications for Design
Automation [p. 610]
- André DeHon, John Wawrzynek
Reconfigurable Computing is emerging as an important new organizational
structure for implementing computations. It combines
the post-fabrication programmability of processors with the spatial
computational style most commonly employed in hardware designs.
The result changes traditional "hardware" and "software"
boundaries, providing an opportunity for greater computational capacity
and density within a programmable medium. Reconfigurable
Computing must leverage traditional CAD technology for building
spatial designs. Beyond that, however, reprogrammability introduces
new challenges and opportunities for automation, including
binding-time and specialization optimizations, regularity extraction
and exploitation, and temporal partitioning and scheduling.
-
35.2 An Automated Temporal Partitioning and Loop Fission approach for FPGA
based Reconfigurable Synthesis of DSP Applications [p. 616]
- Meenakshi Kaul, Ranga Vemuri, Sriram Govindarajan, Iyad E. Ouaiss
We present an automated temporal partitioning and loop
transformation approach for developing dynamically reconfigurable
designs starting from behavior level specifications.
An Integer Linear Programming (ILP) model is formulated
to achieve near-optimal latency designs. We also present a
loop restructuring method to achieve maximum throughput
for a class of DSP applications. This restructuring transformation
is performed on the temporally partitioned behavior
and results in near-optimization of throughput. We discuss
efficient memory mapping and address generation techniques
for the synthesis of reconfigurable designs. A case study on
the Joint Photographic Experts Group (JPEG) image compression
algorithm demonstrates the effectiveness of our approach.
-
35.3 Dynamically Reconfigurable Architecture for Image Processor Applications
[p. 623]
- Alexandro M. S. Adário, Eduardo L. Roehe, Sergio Bampi
This work presents an overview of the principles that underlie the
speed-up achievable by dynamic hardware reconfiguration,
proposes a more precise taxonomy for the execution models for
reconfigurable platforms, and demonstrates the advantage of
dynamic reconfiguration in the new implementation of a
neighborhood image processor, called DRIP. It achieves real-time
performance, which is 3 times faster than its pipelined non-reconfigurable
version.
Keywords:
Reconfigurable architecture, image processing, FPGA
Chair: L. Miguel Silveira
Organizers: Alan Mantooth, Hidetoshi Onodera
-
36.1 Multi-Time Simulation of Voltage-Controlled Oscillators [p. 629]
- Onuttom Narayan, Jaijeet Roychowdhury
We present a novel formulation, called the WaMPDE, for solving systems with
forced autonomous components. An important feature of the WaMPDE is its
ability to capture frequency modulation (FM) in a natural and compact manner.
This is made possible by a key new concept: that of warped time, related
to normal time through separate time scales. Using warped time, we obtain a
completely general formulation that captures complex dynamics in autonomous
nonlinear systems of arbitrary size or complexity. We present computationally
efficient numerical methods for solving large practical problems using the
WaMPDE. Our approach explicitly calculates a time-varying local frequency
that matches intuitive expectations. Applied to VCOs, WaMPDE-based simulation
results in speedups of two orders of magnitude over transient simulation.
-
36.2 Efficient Computation of Quasi-Periodic Circuit Operating Conditions via
a Mixed Frequency/Time Approach [p. 635]
- Dan Feng, Joel Phillips, Keith Nabors, Ken Kundert, Jacob White
Design of communications circuits often requires computing
steady-state responses to multiple periodic inputs of differing
frequencies. Mixed frequency-time (MFT) approaches are orders
of magnitude more efficient than transient circuit simulation,
and perform better on highly nonlinear problems than traditional
algorithms such as harmonic balance. We present algorithms for
solving the huge nonlinear equation systems the MFT approach
generates from practical circuits.
-
36.3 Time-Mapped Harmonic Balance [p. 641]
- Ognen J. Nastov, Jacob K. White
Matrix-implicit Krylov-subspace methods have made it possible to
efficiently compute the periodic steady-state of large circuits using
either the time-domain shooting-Newton method or the frequency-domain
harmonic balance method. However, the harmonic balance
methods are not so efficient at computing steady-state solutions
with rapid transitions, and the low-order integration methods
typically used with shooting-Newton methods are not so efficient
when high accuracy is required. In this paper we describe a
Time-Mapped Harmonic Balance method (TMHB), a fast Krylov-subspace
spectral method that overcomes the inefficiency of standard
harmonic balance in the case of rapid transitions. TMHB features
a non-uniform grid to resolve the sharp features in the signals.
Results on several examples demonstrate that the TMHB method
achieves several orders of magnitude improvement in accuracy
compared to the standard harmonic balance method. The TMHB
method is also several times faster than the standard harmonic balance
method in reaching identical solution accuracy.
Chair: Janusz Rajski
Organizers: Yervant Zorian
-
37.1 Test Generation for Gigahertz Processors Using an Automatic Functional
Constraint Extractor [p. 647]
- Raghuram S. Tupuri, Arun Krishnamachary, Jacob A. Abraham
As the sizes of general and special purpose processors
increase rapidly, generating high quality manufacturing
tests which can be run at native speeds is becoming a
serious problem. One solution is a novel method for
functional test generation in which a transformed module
is built manually, and which embodies functional
constraints described using virtual logic. Test generation
is then performed on the transformed module using
commercial tools and the transformed module patterns
are translated back to the processor level. However, the
technique is useful only if the virtual logic can be generated
automatically. This paper describes an automatic
functional constraint extraction algorithm and a procedure
to build the transformed module. We describe
the tool, FALCON, used to extract the functional constraints
of a given embedded module from a Verilog RTL
model. The constraint extraction for embedded modules
of benchmark processors using FALCON takes only a
few seconds. We show that this method can generate
functional patterns in a time several orders of magnitude
less than one using a conventional, flat view of the circuit.
-
37.2 PROPTEST: A Property Based Test Pattern Generator for Sequential Circuits
Using Test Compaction [p. 653]
- Ruifeng Guo, Sudhakar M. Reddy, Irith Pomeranz
We describe a property based test generation
procedure that uses static compaction to generate
test sequences that achieve high fault coverages at
a low computational complexity. A class of test
compaction procedures is proposed and used in
the property based test generator. Experimental
results indicate that these compaction procedures
can be used to implement the proposed test generator
to achieve high fault coverage with relatively
short run times.
-
37.3 Multiple Error Diagnosis Based on Xlists [p. 660]
- Vamsi Boppana, Rajarshi Mukherjee, Jawahar Jain, Masahiro Fujita,
Pradeep Bollineni
In this paper, we present multiple error diagnosis algorithms
to overcome two significant problems associated with current
error diagnosis techniques targeting large circuits: their
use of limited error models and a lack of solutions that scale
well for multiple errors. Our solution is a non-enumerative
analysis technique, based on logic simulation
(3-valued and symbolic), for simultaneously analyzing all
possible errors at sets of nodes in the circuit. Error models
are introduced in order to address the "locality" aspect of
error location and to identify sets of nodes that are "local"
with respect to each other. Theoretical results are provided
to guarantee the diagnosis of modeled errors and robust diagnosis
approaches are shown to address the cases when errors
do not correspond to the modeled types. Experimental
results on benchmark circuits demonstrate accurate and extremely
rapid location of errors of large multiplicity.
Chair: Luciano Lavagno
Organizers: Neil Weste, Teresa Meng
-
38.1 Simulation Vector Generation from HDL Descriptions for
Observability-Enhanced-Statement Coverage [p. 666]
- Farzan Fallah, Pranav Ashar, Srinivas Devadas
Validation of RTL circuits remains the primary bottleneck in improving
design turnaround time, and simulation remains the primary
methodology for validation. Simulation-based validation has suffered
from a disconnect between the metrics used to measure the error
coverage of a set of simulation vectors, and the vector generation
process. This disconnect has resulted in the simulation of virtually
endless streams of vectors which achieve enhanced error coverage
only infrequently. Another drawback has been that most error coverage
metrics proposed have either been too simplistic or too inefficient
to compute. Recently, an effective observability-based statement
coverage metric was proposed along with a fast companion
procedure for evaluating it.
The contribution of our work is the development of a vector generation
procedure targeting the observability-based statement coverage
metric. Our method uses repeated coverage computation to
minimize the number of vectors generated. For vector generation,
we propose a novel technique to set up constraints based on the chosen
coverage metric. Once the system of interacting arithmetic and
Boolean constraints has been set up, it can be solved using hybrid
linear programming and Boolean satisfiability methods. We present
heuristics to control the size of the constraint system that needs to be
solved. We present experimental results which show the viability of
automatically generating vectors using our approach for industrial
RTL circuits. We envision our system being used during the design
process, as well as during post-design debugging.
-
38.2 A Two-State Methodology for RTL Logic Simulation [p. 672]
- Lionel Bening
This paper describes a two-state methodology for register
transfer level (RTL) logic simulation in which the use of the X-state
is completely eliminated inside ASIC designs. Examples
are presented to show the gross pessimism and optimism that
occurs with the X in RTL simulation. Random two-state
initialization is offered as a way to detect and diagnose startup
problems in RTL simulation. Random two-state initialization (a)
is more productive than the X-state in gate-level simulation, and
(b) provides better coverage of startup problems than X-state in
RTL simulation. Consistent random initialization is applied (a)
as a way to duplicate a startup state using a slower diagnosis-oriented
simulator after a faster detection-oriented simulator
reports the problem, and (b) to verify that the problem is
corrected for that startup state after the design change intended to
fix the problem. In addition to combining the earlier ideas of
two-state simulation, and random initialization with consistent
values across simulations, an original technique for treatment of
tri-state Z's arriving into a two-state model is introduced.
Keywords:
RTL, simulation, 2-state, X-state, pessimism, optimism, random,
initialization.
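A minimal sketch of the consistent-random-initialization idea (illustrative only; this is
not the author's tool, and the register names and seed below are hypothetical): each state
element receives a pseudo-random two-state value derived from a per-run seed, so a slower
diagnosis-oriented simulator can reproduce exactly the startup state that the faster
detection-oriented simulator flagged, simply by reusing the same seed.

    import hashlib

    def initial_value(register_name: str, run_seed: int) -> int:
        """Reproducible 0/1 initial value for one register (no X-state)."""
        digest = hashlib.sha256(f"{run_seed}:{register_name}".encode()).digest()
        return digest[0] & 1

    if __name__ == "__main__":
        registers = ["cpu.fetch.pc0", "cpu.decode.ir3", "fifo.wr_ptr", "fifo.rd_ptr"]
        seed = 20230607  # hypothetical per-run seed, logged so the run can be replayed
        for reg in registers:
            print(reg, initial_value(reg, seed))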
-
38.3 An Approach for Extracting RT Timing Information to Annotate
Algorithmic VHDL Specifications [p. 678]
- Cordula Hansen, Francisco Nascimento, Wolfgang Rosenstiel
This paper presents a new approach for
extracting timing information defined in a simulation vector
set at the register transfer level (RTL) and reusing it in the
behavioral specification. Using a VHDL RTL simulation
vector set and a VHDL behavioral specification as input, the
timing information is extracted and, together with the specification,
transformed into a Partial Order based Model (POM).
The POM expressing the timing information is then mapped
onto the specification POM. The result contains the behavioral
specification and the RTL timing, and is transformed back into a
corresponding VHDL specification.
information contained in the specification can be checked
using the RTL simulation vectors.
Chair: Rajesh K. Gupta
Organizers: Telle Whitney, Vivek Tiwari
-
39.1 A Massively-Parallel Easily-Scalable Satisfiability Solver Using
Reconfigurable Hardware [p. 684]
- Miron Abramovici, Jose T. de Sousa, Daniel Saab
Satisfiability (SAT) is a computationally
expensive algorithm central to many CAD and test applications.
In this paper, we present the architecture of a
new SAT solver using reconfigurable logic. Our main
contributions include new forms of massive fine-grain
parallelism and structured design techniques based on
iterative logic arrays that reduce compilation times from
hours to a few minutes. Our architecture is easily scalable.
Our results show several orders of magnitude speed-up
compared with a state-of-the-art software implementation,
and with a prior SAT solver using reconfigurable hardware.
-
39.2 Dynamic Fault Diagnosis on Reconfigurable Hardware [p. 691]
- Fatih Kocan, Daniel G. Saab
In this paper, we introduce a new approach for locating
and diagnosing faults in combinational circuits. The
approach is based on automatically designing a circuit
which implements a closest-match fault location algorithm specialized
for the combinational circuit under diagnosis (CUD). This
approach eliminates the need for large storage required by a
software based fault diagnosis. In this paper, we show the approach's
feasibility in terms of hardware resources, speed,
and how it compares with software based techniques.
-
39.3 Hardware Compilation for FPGA-Based Configurable Computing Machines
[p. 697]
- Xiaohan Zhu, Bill Lin
Configurable computing machines are an emerging class of hybrid architectures
where a field programmable gate array (FPGA) component is tightly coupled
to a general-purpose microprocessor core. In these architectures, the
FPGA component complements the general-purpose microprocessor by enabling
a developer to construct application-specific gate-level
structures on-demand while retaining the flexibility and rapid
reconfigurability of a fully programmable solution. High computational
performance can be achieved on the FPGA component by creating custom
data paths, operators, and interconnection pathways that are
dedicated to a given problem, thus enabling similar structural optimization
benefits as ASICs. In this paper, we present a new programming environment
for the development of applications on this new class of configurable
computing machines. This environment enables developers to develop
hybrid hardware/software applications in a common integrated development
framework. In particular, the focus of this paper is on the hardware
compilation part of the problem starting from a software-like algorithmic
process-based specification.
Chair: Bryan D. Ackland
Organizers: Bryan D. Ackland
-
40.1 0.18um CMOS and Beyond [p. 703]
- D. J. Eaglesham
As we move to the 0.18µm node and beyond, the dominant
trend in device and process technology is a simple
continuation of several decades of scaling. However, some
serious challenges to straightforward scaling are on the
horizon. This paper will review the present status of
process technology and examine the likely departures from
scaling in the various areas. The 0.18µm node is seeing the
first major new materials introduced into the Si process for
many years in the interconnect, and major departures from
the traditional process are being actively considered for the
transistor. However, it is probable that continued scaling
will continue to dominate advanced processes for several
generations to come.
-
40.2 SOI Digital CMOS VLSI - A Design Perspective [p. 709]
- C. T. Chuang, R. Puri
This paper reviews the recent advances of SOI for
digital CMOS VLSI applications with particular emphasis on
the design issues and advantages resulting from the unique
SOI device structure. The technology/device requirements
and design issues/challenges for high-performance, general-purpose
microprocessor applications are differentiated with
respect to low-power portable applications. Particular emphases
are placed on the impact of floating-body in partially-depleted devices
on the circuit operation, stability, and functionality. Unique
SOI design aspects such as parasitic bipolar
effect and hysteretic VT variation are addressed. Circuit techniques
to improve the noise immunity and global design issues
are discussed.
Chair: Thomas G. Szymanski
Organizers: Kunle Olukotun, Sharad Malik
-
41.1 Equivalent Elmore Delay for RLC Trees [p. 715]
- Yehea I. Ismail, Eby G. Friedman, Jose L. Neves
Closed form solutions for the 50% delay, rise time,
overshoots, and settling time of signals in an RLC tree are
presented. These solutions have the same accuracy characteristics
as the Elmore delay model for RC trees and preserve the
simplicity and recursive characteristics of the Elmore delay. The
solutions introduced here consider all damping conditions of an
RLC circuit including the underdamped response, which is not
considered by the classical Elmore delay model due to the non-monotone
nature of the response. Also, the solutions have
significantly improved accuracy as compared to the Elmore delay
for an overdamped response. The solutions introduced here for
RLC trees can be practically used for the same purposes that the
Elmore delay is used for RC trees.
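For readers who want the RC baseline being generalized here, the sketch below computes the
classical Elmore delay on a small RC tree (the textbook RC formula, not the paper's RLC
expressions; the tree topology and element values are hypothetical):

    # Elmore delay of node i in an RC tree: sum, over the branches on the
    # root-to-i path, of (branch resistance) * (total capacitance downstream
    # of that branch).
    tree = {
        # node: (parent, R_to_parent_ohms, C_node_farads)
        "n1": ("root", 10.0, 1e-13),
        "n2": ("n1",   20.0, 2e-13),
        "n3": ("n1",   15.0, 1e-13),
    }

    def downstream_cap(node):
        """Capacitance at `node` plus everything below it."""
        cap = tree[node][2]
        for child, (parent, _, _) in tree.items():
            if parent == node:
                cap += downstream_cap(child)
        return cap

    def elmore_delay(node):
        """Accumulate R * downstream C along the path from the root to `node`."""
        delay = 0.0
        while node != "root":
            parent, r, _ = tree[node]
            delay += r * downstream_cap(node)
            node = parent
        return delay

    for n in tree:
        print(n, elmore_delay(n))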
-
41.2 Effects of Inductance on the Propagation Delay and Repeater Insertion in
VLSI Circuits [p. 721]
- Yehea I. Ismail, Eby G. Friedman
A closed form expression for the propagation delay of a
CMOS gate driving a distributed RLC line is introduced that is
within 5% of dynamic circuit simulations for a wide range of
RLC loads. It is shown that the traditional quadratic dependence
of the propagation delay on the length of an RC line approaches a
linear dependence as inductance effects increase. The closed form
delay model is applied to the problem of repeater insertion in
RLC interconnect. Closed form solutions are presented for
inserting repeaters into RLC lines that are highly accurate with
respect to numerical solutions. An RC model as compared to an
RLC model creates errors of up to 30% in the total propagation
delay of a repeater system. Considering inductance in repeater
insertion is also shown to significantly save repeater area and
power consumption. The error between the RC and RLC models
increases as the gate parasitic impedances decrease which is
consistent with technology scaling trends. Thus, the importance
of inductance in high performance VLSI design methodologies
will increase as technologies scale.
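As standard background for the quadratic-to-linear trend described above (approximate
textbook relations, not expressions taken from the paper): for a line of length $\ell$ with
per-unit-length resistance $r$, inductance $l$, and capacitance $c$, the RC-dominated 50%
delay grows roughly as $t_{RC}\approx 0.4\,rc\ell^{2}$, whereas in the inductance-dominated
regime the delay approaches the time of flight $t_{LC}\approx \ell\sqrt{lc}$, which is
linear in $\ell$; practical lines fall between the two limits as resistive loss grows
relative to the characteristic impedance $\sqrt{l/c}$.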
-
41.3 Retiming for DSM with Area-Delay Trade-offs and Delay Constraints [p. 725]
- Abdallah Tabbara, Robert K. Brayton, A. Richard Newton
The concept of improving the timing behavior of a circuit by
relocating registers is called retiming and was first presented
by Leiserson and Saxe. They showed that the problem of
determining an equivalent minimum area (total number of
registers) circuit is polynomial-time solvable. In this work
we show how this approach can be reapplied in the DSM
domain when area-delay trade-offs and delay constraints are
considered. The main result is that the concavity of the trade-off
function allows for a casting of this DSM problem into a
classical minimum area retiming problem, which is
solvable in polynomial time.
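For reference, the classical Leiserson-Saxe formulation the abstract builds on (standard
background, not this paper's new contribution): a retiming assigns an integer lag $r(v)$ to
every gate $v$; the register count on an edge $u\to v$ becomes
$w_r(u\to v)=w(u\to v)+r(v)-r(u)\ge 0$, and minimum-area retiming minimizes
$\sum_{e}w_r(e)$ subject to these constraints plus, for a target clock period $c$, the
constraints $r(u)-r(v)\le W(u,v)-1$ for every gate pair whose purely combinational path
delay satisfies $D(u,v)>c$. This linear program is solvable in polynomial time; the result
above is that the DSM area-delay trade-off variant can be recast into this same classical
form.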
-
41.4 Functional Timing Analysis for IP Characterization [p. 731]
- Hakan Yalcin, Mohammad Mortazavi, Robert Palermo, Cyrus Bamji,
Karem Sakallah
A method that characterizes the timing of Intellectual
Property (IP) blocks while taking into account IP
functionality is presented. IP blocks are assumed to
have multiple modes of operation specified by the
user. For each mode, our method calculates IO path
delays and timing constraints to generate a timing
model. The method thus captures the mode-dependent
variation in IP delays which, according to our
experiments, can be as high as 90%. The special manner
in which delay calculation is performed guarantees
that IP delays are never underestimated. The
resulting timing models are also compacted through a
process whose accuracy is controlled by the user.
Keywords:
Timing analysis, false path, functional (mode) dependency,
IP characterization.
-
41.5 Detecting False Timing Paths: Experiments on PowerPC(TM)
Microprocessors [p. 737]
- Richard Raimi, Jacob Abraham
We present a new algorithm for detecting both combinationally and
sequentially false timing paths, one in which the constraints on a
timing path are captured by justifying symbolic functions across
latch boundaries.
We have implemented the algorithm and we present, here, the
results of using it to detect false timing paths on a recent PowerPC
microprocessor design. We believe these are the first published results
showing the extent of the false path problem in industry. Our
results suggest that the reporting of false paths may be compromising
the effectiveness of static timing analysis.
Chair: Yervant Zorian
Organizers: Janusz Rajski
-
42.1 On ILP Formulations for Built-In Self-Testable Data Path Synthesis
[p. 742]
- Han Bin Kim, Dong Sam Ha, Takeshi Takahashi
In this paper, we present a new method for built-in self-testable
data path synthesis based on integer linear programming
(ILP). Our method performs system register assignment, built-in
self-test (BIST) register assignment, and interconnection
assignment concurrently to yield optimal designs. Our
experimental results show that our method successfully
synthesizes BIST circuits for all six circuits in our experiments. All
the BIST circuits are better in area overhead than those
generated by existing high-level BIST synthesis methods.
Keywords:
high-level BIST synthesis, built-in self-test, BIST, ILP.
-
42.2 Improving The Test Quality for Scan-Based BIST Using A General Test
Application Scheme [p. 748]
- Huan-Chih Tsai, Kwang-Ting Cheng, Sudipta Bhawmik
In this paper, we propose a general test application
scheme for existing scan-based BIST architectures. The objective
is to further improve the test quality without inserting
additional logic to the Circuit Under Test (CUT). The proposed
test scheme divides the entire test process into multiple
test sessions. A different number of capture cycles is applied
after scanning in a test pattern in each test session to maximize
the fault detection for a distinct subset of faults. We
present a procedure to find the optimal number of capture cycles
following each scan sequence for every fault. Based on
this information, the number of test sessions and the number
of capture cycles after each scan sequence are determined
to maximize the random testability of the CUT. We conduct
experiments on ISCAS89 benchmark circuits to demonstrate
the effectiveness of our approach.
-
42.3 Built-In Test Sequence Generation for Synchronous Sequential Circuits
Based on Loading and Expansion of Test Subsequences [p. 754]
- Irith Pomeranz, Sudhakar M. Reddy
We describe an on-chip test generation scheme for synchronous
sequential circuits that allows at-speed testing of such circuits.
The proposed scheme is based on loading of (short) input
sequences into an on-chip memory, and expansion of these
sequences on-chip into test sequences. Complete coverage of
modeled faults is achieved by basing the selection of the loaded
sequences on a deterministic test sequence T0, and ensuring that
every fault detected by T0 is detected by the expanded version of
at least one loaded sequence. Experimental results presented for
benchmark circuits show that the length of the sequence that
needs to be stored at any time is on the average 10% of the
length of T0, and that the total length of all the loaded sequences
is on the average 46% of the length of T0.
Chair: Abhijit Dharchoudhury
Organizers: Teresa Meng, David Blaauw
-
43.1 Analysis of Performance Impact Caused by Power Supply Noise in Deep
Submicron Devices [p. 760]
- Yi-Min Jiang, Kwang-Ting Cheng
The paper addresses the problem of analyzing the
performance degradation caused by noise in power supply
lines for deep submicron CMOS devices. We first
propose a statistical modeling technique for the power
supply noise including inductive ΔI noise and power net
IR voltage drop. The model is then integrated with a statistical
timing analysis framework to estimate the performance
degradation caused by the power supply noise.
Experimental results of our analysis framework, validated
by HSPICE, for benchmark circuits implemented
on both 0.25 µm, 2.5 V and 0.55 µm, 3.3 V technologies are
presented and discussed. The results show that on average,
with the consideration of this noise effect, the circuit
critical path delays increase by 33% and 18%, respectively
for circuits implemented on these two technologies.
-
43.2 A Floorplan-Based Planning Methodology for Power and Clock Distribution
in ASICs [p. 766]
- Joon-Seo Yim, Seong-Ok Bae, Chong-Min Kyung
In deep submicron technology, IR-drop and clock skew issues
become more crucial to the functionality of the chip. This
paper presents a floorplan-based power and clock distribution
methodology for ASIC design. From the floorplan and
the estimated power consumption, the power network size is
determined at an early design stage. Next, without detailed
gate-level netlist, clock interconnect sizing, the number and
strength of clock buffers are planned for balanced clock distribution.
This early planning methodology at the full-chip
level enables us to fix the global interconnect issues before
the detailed layout composition is started.
-
43.3 Digital Detection of Analog Parametric Faults in SC Filters [p. 772]
- Ramesh Harjani, Bapiraju Vinnakota
Many design for test techniques for analog circuits are ineffective
at detecting multiple parametric faults because either their accuracy
is poor, or the circuit is not tested in the configuration it is used in.
We present a DFT scheme that offers the accuracy needed to test
high-quality circuits. The DFT scheme is based on a circuit that
digitally measures the ratio of a pair of capacitors. The circuit is
used to completely characterize the transfer function of a switched
capacitor circuit, which is usually determined by capacitor ratios.
In our DFT scheme, capacitor ratios can be measured to within
0.01% accuracy, and filter parameters can be shown to be satisfied
to within 0.1% accuracy. A filter can be shown to satisfy all its
functional specifications through this characterization process. We
believe the accuracy of our scheme is at least an order of magnitude
greater than that offered by any other scheme reported in the
literature.
Chair: Han Yang
Organizers: Kenji Yoshida, David Ku
-
44.1 Application of High Level Interface-based Design to Telecommunications
System Hardware [p. 778]
- Dyson Wilkes, M.M. Kamal Hashmi
The assumption in moving system modelling to higher levels is
that this improves the design process by allowing exploration of
the architecture, providing an unambiguous specification and
catching system errors early. We used the interface-based high-level
abstractions of VHDL+ in a real design, in parallel with
the actual project, to investigate the validity of these claims.
-
44.2 Hardware Reuse at the Behavioral Level [p. 784]
- Patrick Schaumont, Radim Cmar, Serge Vernalde, Marc Engels,
Ivo Bolsens
Standard interfaces for hardware reuse are currently defined
at the structural level. In contrast to this, our contribution
defines the reuse interface at the behavioral register transfer
(RT) level. This promotes direct reuse of functionality
and avoids the integration problems of structural reuse.
We present an object oriented reuse interface in C++ and
show its use within two real-life designs.
-
44.3 Description and Simulation of Hardware/Software Systems with Java [p. 790]
- Tommy Kuhn, Wolfgang Rosenstiel, Udo Kebschull
In this paper a newly developed object model is
presented which allows the description of hardware/
software systems in all its parts. An adaption
of the component model JavaBeans allows
to combine different kinds of reuse in one unitary
language. A model based design flow and
some tools are presented and applied to a JPEG
example.
Keywords:
Object oriented hardware modeling, simulation, codesign.
-
44.4 Java Driven Codesign and Prototyping of Networked Embedded Systems [p. 794]
- Josef Fleischmann, Klaus Buchenrieder, Rainer Kress
While the number of embedded systems in consumer
electronics is growing dramatically, several trends can
be observed which challenge traditional codesign practice:
An increasing share of functionality of such systems
is implemented in software; flexibility or reconfigurability
is added to the list of non-functional requirements.
Moreover, networked embedded systems are equipped
with communication capabilities and can be controlled
over networks. In this paper, we present a suitable methodology
and a set of tools targeting these novel requirements.
JACOP is a codesign environment based on Java
and supports specification, co-synthesis and prototyping
of networked embedded systems.
Chair: Andrew B. Kahng
Organizers: Atul Sharan, Susan Lippincott
Panel Members: Y. C. Pati, Warren Grobman, Robert Pack, Lance Glasser,
Kenneth V. Rousseau
-
In the sub 0.25 micron regime, IC feature sizes become smaller than the
wavelength of light used for silicon exposure. Resultant light
distortions create patterns on silicon that are substantially different
from a GDSII layout. Although light distortions have traditionally not
affected the design flow, the techniques used to control these distortions
have a potential impact on the design flow that is as formidable as
the recently addressed Deep Sub-Micron transition. This session will
discuss the design implications arising from techniques used to
control sub-wavelength lithography. It will begin with an embedded tutorial
on subwavelength mask design techniques and their resultant
effect on the IC design process. The panel will then debate the extent
of the resulting impact on IC performance, design flow, and CAD tools.
-
45.2 Subwavelength Lithography and its Potential Impact on Design and
EDA [p. 799]
- Andrew B. Kahng, Y. C. Pati
This tutorial paper surveys the potential implications of subwavelength
optical lithography for new tools and flows in the interface
between layout design and manufacturability. We review control
of optical process effects by optical proximity correction (OPC)
and phase-shifting masks (PSM), then focus on the implications
of OPC and PSM for layout synthesis and verification methodologies.
Our discussion addresses the necessary changes in the design-to-manufacturing
flow, including infrastructure development in the
mask and process communities, evolution of design methodology,
and opportunities for research and development in the physical layout
and verification areas of EDA.
Chair: Mahesh Mehendale
Organizers: Sunil Sherlekar, Luciano Lavagno
-
46.1 Synthesis of Embedded Software Using Free-Choice Petri Nets [p. 805]
- Marco Sgroi, Luciano Lavagno, Yosinori Watanabe, Alberto
Sangiovanni-Vincentelli
Software synthesis from a concurrent functional specification is a
key problem in the design of embedded systems. A concurrent
specification is well-suited for medium-grained partitioning. However,
in order to be implemented in software, concurrent tasks need
to be scheduled on a shared resource (the processor). The choice
of the scheduling policy mainly depends on the specification of the
system. For pure dataflow specifications, it is possible to apply a
fully static scheduling technique, while for algorithms containing
data-dependent control structures, like the if-then-else or while-do
constructs, the dynamic behaviour of the system cannot be completely
predicted at compile time and some scheduling decisions
are to be made at run-time. For such applications we propose a
Quasi-static scheduling (QSS) algorithm that generates a schedule
in which run-time decisions are made only for data-dependent
control structures. We use Free Choice Petri Nets (FCPNs), as underlying
model, and define quasi-static schedulability for FCPNs.
The proposed algorithm is complete, in that it can solve QSS for any
FCPN that is quasi-statically schedulable. Finally, we show how
to synthesize from a quasi-static schedule a C code implementation
that consists of a set of concurrent tasks.
-
46.2 Exact Memory Size Estimation for Array Computations without Loop Unrolling [p. 811]
- Ying Zhao, Sharad Malik
This paper presents a new algorithm for exact estimation
of the minimum memory size required by programs dealing
with array computations. Memory size is an important
factor affecting area and power cost of memory units. For
programs dealing mostly with array computations, memory
cost is a dominant factor in the overall system cost. Thus,
exact estimation of memory size required by a program is
necessary to provide quantitative information for making
high-level design decisions.
Based on a formulation of live variable analysis, our algorithm
transforms minimum memory size estimation into
an equivalent problem: integer point counting for the intersection/union
of mappings of parameterized polytopes. A
heuristic is then proposed to solve the counting problem. Experimental
results show that the algorithm achieves the exactness
traditionally associated with totally unrolling loops
while reducing computational complexity by preserving the
original loop structure.
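A minimal sketch of the reference point the authors match, namely the memory bound one
obtains by effectively unrolling the loop and tracking element live ranges (the stencil,
schedule, and liveness assumptions below are hypothetical and far simpler than the
polytope machinery in the paper):

    # Loop: for i in 0..N-1: b[i] = a[i] + a[i+1]
    # Assume each a[i] becomes live just before its first read and dies after
    # its last read; the peak number of simultaneously live elements is then
    # the minimum buffer size for a[].
    N = 8
    intervals = []                        # (first_use, last_use) per a[i]
    for i in range(N + 1):
        first_use = max(0, i - 1)         # a[i] is read by iterations i-1 and i
        last_use = min(i, N - 1)
        intervals.append((first_use, last_use))

    peak = 0
    for j in range(N):                    # sweep the loop iterations
        live = sum(1 for (f, l) in intervals if f <= j <= l)
        peak = max(peak, live)

    print("minimum a[] buffer (elements):", peak)   # 2 for this stencil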
-
46.3 Constraint Driven Code Selection for Fixed-Point DSPs [p. 817]
- Steven Bashford, Rainer Leupers
Fixed-point DSPs are a class of embedded processors
with highly irregular architectures. This irregularity
makes it difficult to generate high-quality machine
code from programming languages such as C. In this paper
we present a novel constraint driven approach to code
selection for irregular processor architectures, which provides
a twofold improvement of earlier work. First, it
handles complete data flow graphs instead of trees and
thereby generates better code in the presence of common
subexpressions. Second, the presented technique is not
restricted to computation of a single solution, but it generates
alternative solutions. This feature enables the tight
coupling of different code generation phases, resulting in
better exploitation of instruction-level parallelism. Experimental
results indicate that our technique is capable
of generating machine code that competes well with handwritten
assembly code.
-
46.4 Rapid Development of Optimized DSP Code From a High Level Description
Through Software Estimations [p. 823]
- Alain Pegatoquet, Emmanuel Gresset, Michel Auguin, Luc Bianco
Generation of optimized DSP code from a high level language
such as C is very time consuming since current DSP compilers are
generally unable to produce efficient code. We present a software
estimation methodology from a C description that supports
rapid development of DSP applications. Our tool VESTIM
provides both a performance evaluation of the assembly code
generated by the compiler and an estimation of an optimized
assembly code. Blocks from the G.721 and G.728 applications have been
evaluated using VESTIM. Results show that estimations are very
accurate and allow software development time to be significantly
reduced.
Keywords:
DSP, Code generation, Performance Estimation.
-
46.5 Software Environment for a Multiprocessor DSP [p. 827]
- Asawaree Kalavade, Joe Othmer, Bryan Ackland, K. J. Singh
In this paper, we describe the software environment
for Daytona, a single-chip, bus-based,
shared-memory, multiprocessor DSP. The software
environment is designed around a layered
architecture. Tools at the lower layer are
designed to deliver maximum performance and
include a compiler, debugger, simulator, and
profiler. Tools at the higher layer focus on
improving the programmability of the system
and include a run-time kernel and parallelizing
tools. The run-time kernel includes a low-overhead,
preemptive, dynamic scheduler with multiprocessor
support that guarantees real-time
performance to admitted tasks.
Keywords:
Multiprocessor DSP, media processor, software environment, run-time
kernel, RTOS
Chair: Ken Hodor
Organizers: Mike Murray, Neil Weste
-
47.1 Robust FPGA Intellectual Property Protection Through Multiple Small
Watermarks [p. 831]
- John Lach, William H. Mangione-Smith, Miodrag Potkonjak
A number of researchers have proposed using digital marks to
provide ownership identification for intellectual property. Many
of these techniques share three specific weaknesses: complexity of
copy detection, vulnerability to mark removal after revelation for
ownership verification, and mark integrity issues due to partial
mark removal. This paper presents a method for watermarking
field programmable gate array (FPGA) intellectual property (IP)
that achieves robustness by responding to these three weaknesses.
The key technique involves using secure hash functions to
generate and embed multiple small marks that are more
detectable, verifiable, and secure than existing IP protection
techniques.
Keywords:
Field programmable gate array (FPGA), intellectual property
protection, watermarking
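A minimal sketch of the multiple-small-marks idea (illustrative only: the hash choice, key,
and the notion of embedding each mark in spare FPGA resources are assumptions, not the
authors' actual encoding). Because each short mark is derived independently from a keyed
hash, revealing one of them during ownership verification need not compromise the others.

    import hashlib

    def small_marks(owner_key: str, design_id: str, n_marks: int, bits_per_mark: int):
        """Derive n_marks short bit-strings from SHA-256 of (key, design, index)."""
        marks = []
        for i in range(n_marks):
            digest = hashlib.sha256(f"{owner_key}|{design_id}|{i}".encode()).digest()
            value = int.from_bytes(digest, "big") & ((1 << bits_per_mark) - 1)
            marks.append(value)
        return marks

    if __name__ == "__main__":
        # Each small mark could be forced into otherwise-unused FPGA resources;
        # verification then re-derives and checks only a subset of them.
        for m in small_marks("acme-secret-key", "fir_filter_v2", n_marks=4, bits_per_mark=16):
            print(f"{m:04x}")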
-
47.2 Robust Techniques For Watermarking Sequential Circuit Designs [p. 837]
- Arlindo L. Oliveira
We present a methodology for the watermarking of synchronous sequential
circuits that makes it possible to identify the authorship of
designs by imposing a digital watermark on the state transition graph
of the circuit. The methodology is applicable to sequential designs
that are made available as firm Intellectual Property (IP), the designation
commonly used to characterize designs specified as structural
descriptions or circuit netlists.
The watermarking is obtained by manipulating the state transition
graph of the design in such a way as to make it exhibit a chosen
property that is extremely rare in non-watermarked circuits, while, at
the same time, not changing the functionality of the circuit. This manipulation
is performed without ever actually computing this graph
in either implicit or explicit form. We present both theoretical and
experimental results that show that the watermarking can be created
and verified efficiently.
-
47.3 Effective Iterative Techniques for Fingerprinting Design IP [p. 843]
- Andrew E. Caldwell, Hyun-Jin Choi, Andrew B. Kahng, Stefanus Mantik,
Miodrag Potkonjak, Gang Qu, Jennifer L. Wong
While previous watermarking-based approaches to intellectual
property protection (IPP) have asymmetrically emphasized the IP
provider's rights, the true goal of IPP is to ensure the rights of both
the IP provider and the IP buyer. Symmetric fingerprinting schemes
have been widely and effectively used to achieve this goal; however,
their application domain has been restricted only to static artifacts,
such as image and audio. In this paper, we propose the
first generic symmetric fingerprinting technique which can be applied
to an arbitrary optimization/synthesis problem and, therefore,
to hardware and software intellectual property. The key idea is to
apply iterative optimization in an incremental fashion to solve a fingerprinted
instance; this leverages the optimization effort already
spent in obtaining a previous solution, yet generates a uniquely fingerprinted
new solution. We use this approach as the basis for developing
specific fingerprinting techniques for four important problems
in VLSI CAD: partitioning, graph coloring, satisfiability, and
standard-cell placement. We demonstrate the effectiveness of our
fingerprinting techniques on a number of standard benchmarks for
these tasks. Our approach provides an effective tradeoff between
runtime and resilience against collusion.
-
47.4 Behavioral Synthesis Techniques for Intellectual Property Protection
[p. 849]
- Inki Hong, Miodrag Potkonjak
The economic viability of the reusable core-based design paradigm
depends on the development of techniques for intellectual property
protection. We introduce the first dynamic watermarking technique
for protecting the value of intellectual property of CAD and compilation
tools and reusable core components. The essence of the
new approach is the addition of a set of design and timing constraints
which encodes the author's signature. The constraints are
selected in such a way that they result in minimal hardware overhead
while embedding the signature which is unique and difficult to
detect, remove and forge. We establish the first set of relevant metrics
which forms the basis for the quantitative analysis, evaluation,
and comparison of watermarking techniques. We develop a generic
approach for signature data hiding in designs, which is applicable
in conjunction with an arbitrary behavioral synthesis task, such as
scheduling, assignment, allocation, and transformations. Error correcting
codes are used to augment the protection of the signature
data from tampering attempts. On a large set of design examples,
studies indicate the effectiveness of the new approach in a sense
that the signature data, which are highly resilient, difficult to detect
and remove, and yet easy to verify, can be embedded in designs
with very low hardware overhead.
Chair: Vivek Tiwari
Organizers: Tadahiro Kuroda, Mojy Chian
-
48.1 Design and Implementation of a Scalable Encryption Processor with Embedded
Variable DC/DC Converter [p. 855]
- James Goodman, Anantha Chandrakasan, Abram P. Dancy
This work describes the design and implementation of
an energy-efficient, scalable encryption processor that
utilizes variable voltage supply techniques and a high-efficiency
embedded variable output DC/DC converter.
The resulting implementation dissipates 134 nJ/bit at
VDD = 2.5 V, when encrypting at its maximum rate of
1Mb/s using a maximum datapath width of 512 bits. The
embedded converter achieves an efficiency of 96% at
this peak load. The processor is 2-3 orders of magnitude
more energy efficient than optimized assembly code running
on a low-power processor such as the StrongARM.
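As a back-of-the-envelope reading of the quoted figures (not a number reported by the
authors), the peak-rate power draw is roughly
$134\,\text{nJ/bit}\times 1\,\text{Mb/s} \approx 134\,\text{mW}$, i.e. about 54 mA from
the 2.5 V supply.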
-
48.2 Design Considerations for Battery-Powered Electronics [p. 861]
- Massoud Pedram, Qing Wu
In this paper, we consider the problem of maximizing
the battery life (or duration of service) in battery-powered CMOS
circuits. We first show that the battery efficiency (or utilization
factor) decreases as the average discharge current from the
battery increases. The implication is that the battery life is a
super-linear function of the average discharge current. Next we
show that even when the average discharge current remains the
same, different discharge current profiles (distributions) may
result in very different battery lifetimes. In particular, the
maximum battery life is achieved when the variance of the
discharge current distribution is minimized. Analytical derivations
and experimental results underline the importance of correct
modeling of the battery-hardware system as a whole and provide a
more accurate basis (i.e., the battery discharge-time times delay
product) for comparing various low power optimization
methodologies and techniques targeted toward battery-powered
electronics. Finally, we calculate the optimal value of Vdd for a
battery-powered VLSI circuit so as to minimize the product of the
battery discharge time and the circuit delay.
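The superlinear dependence described above is often captured empirically by Peukert-style
relations, cited here only as general background rather than as the paper's own battery
model: the service time behaves roughly as $t \approx C_p / I^{k}$ with $k>1$, so doubling
the average discharge current $I$ more than halves $t$. The variance observation is
consistent with this convexity: since the effective capacity drain grows faster than
linearly in $I$, a flat current profile at a given mean outlasts a bursty profile with the
same mean.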
-
48.3 Cycle-Accurate Simulation of Energy Consumption in Embedded Systems
[p. 867]
- Tajana Simunic, Luca Benini, Giovanni De Micheli
This paper presents a methodology for cycle-accurate simulation
of energy dissipation in embedded systems. The
ARM Ltd. [1] instruction-level cycle-accurate simulator is
extended with energy models for the processor, the L2 cache,
the memory, the interconnect and the DC-DC converter. A
SmartBadge, which can be seen as an embedded system consisting
of a StrongARM-1100 processor, memory and a DC-DC converter,
is used to evaluate the methodology with the
Dhrystone benchmark. We compared performance and energy
computed by our simulator with measurements in hardware
and found them in agreement within a 5% tolerance.
The simulation methodology was applied to design exploration
for enhancing a SmartBadge with real-time MPEG feature.
-
48.4 Lowering Power Consumption in Clock by Using Globally Asynchronous
Locally Synchronous Design Style [p. 873]
- A. Hemani, T. Meincke, S. Kumar, A. Postula, T. Olsson, P. Nilsson,
J. Oberg, P. Ellervee, D. Lundqvist
Power consumption in clock of large high performance
VLSIs can be reduced by adopting Globally Asynchronous,
Locally Synchronous design style (GALS). GALS
has small overheads for the global asynchronous communication
and local clock generation. We propose
methods to a) evaluate the benefits of GALS and account
for its overheads, which can be used as the basis
for partitioning the system into optimal number/size of
synchronous blocks, and b) automate the synthesis of
the global asynchronous communication. Three realistic
ASICs, ranging in complexity from 1 to 3 million
gates, were used to evaluate GALS benefits and overheads.
The results show an average power saving of
about 70% in clock with negligible overheads.
Chair: Randolph E. Harr
Organizers: Anantha Chandrakasan, Mojy Chian
-
49.1 A CAD Tool for Optical MEMS [p. 879]
- Timothy P. Kurzweg, Steven P. Levitan, Philippe J. Marchand,
Jose A. Martinez, Kurt R. Prough, Donald M. Chiarulli
Chatoyant models free-space opto-electronic components
and systems and performs simulations and
analyses that allow designers to make informed system
level trade-offs. Recently, the use of MEM
bulk and surface micro-machining technology has
enabled the fabrication of micro-optical-mechanical
systems. This paper presents our models for
diffractive optics and new analysis techniques
which extend Chatoyant to support optical MEMS
design. We show these features in the simulation
of two optical MEM systems.
Keywords:
Optical MEMS, MEMS-CAD, MOEMS, micro-optics
-
49.2 On Thermal Effects in Deep Sub-Micron VLSI Interconnects [p. 885]
- Kaustav Banerjee, Amit Mehrotra, Alberto Sangiovanni-Vincentelli,
Chenming Hu
This paper presents a comprehensive analysis of the thermal effects in
advanced high performance interconnect systems arising due to self-heating
under various circuit conditions, including electrostatic
discharge. Technology (Cu, low-k, etc.) and scaling effects on the
thermal characteristics of the interconnects, and on their
electromigration reliability, have been analyzed simultaneously, which
will have important implications for providing robust and aggressive
deep submicron interconnect design guidelines. Furthermore, the
impact of these thermal effects on the design (driver sizing) and
optimization of the interconnect length between repeaters at the upper-level
signal lines are investigated.
-
49.3 Converting a 64b PowerPC Processor from CMOS Bulk to SOI Technology
[p. 892]
- D. Allen, D. Behrends, B. Stanisic
A 550MHz 64b PowerPC processor was developed
for fabrication in Silicon-On-Insulator (SOI)
technology from a processor previously designed and
fabricated in bulk CMOS [1]. Both the design and the
associated CAD methodology (point tools, flow, and
models) were modified to handle demands specific to
SOI technology. The challenge was to improve the
cycle time by adapting the circuit design, timing, and
chip integration methodologies to accommodate
effects unique to SOI.
-
49.4 A Framework for Collaborative and Distributed Web-Based Design [p. 898]
- Gangadhar Konduri, Anantha Chandrakasan
The increasing complexity and geographical separation
of design data, tools and teams has created
a need for a collaborative and distributed
design environment. In this paper we present
a framework that enables collaborative and distributed
Web-based CAD, in which the designers
can collaborate on a design and efficiently utilize
existing design tools on the Internet. The framework
includes a Java-based hierarchical collaborative
schematic/block editor with interfaces to
distributed Web tools and cell libraries, infrastructure
to store and manipulate design objects,
and protocols for tool communication, message
passing and collaboration.
Chair: David Blaauw
Organizers: Anantha Chandrakasan, David Blaauw
-
50.1 Dealing With Inductance In High-Speed Chip Design [p. 904]
- Phillip Restle, Albert Ruehli, Steven G. Walker
Inductance effects in on-chip interconnects have become significant for
specific cases such as clock distributions and other highly optimized
networks [1,2]. Designers and CAD tool developers are searching for
ways to deal with these effects. Unfortunately, accurate on-chip inductance
extraction and simulation in the general case are much more difficult
than capacitance extraction. In addition, even if ideal extraction tools
existed, most chip designers have little experience designing with lossy
transmission lines. This tutorial will attempt to demystify on-chip
inductance through the discussion of several illustrative examples
analyzed using full-wave extraction and simulation methods. A specialized
PEEC (Partial Element Equivalent Circuit) method tailored for chip applications
was used for most cases. Effects such as overshoot, reflections, frequency
dependent effective resistance and inductance will be illustrated using
animated visualizations of the full-wave simulations. Simple examples
of design techniques to avoid, mitigate, and even take advantage of
on-chip inductance effects will be described.
-
50.2 Interconnect Analysis: From 3-D Structures to Circuit Models [p. 910]
- M. Kamon, N. Marques, Y. Massoud, L. Silveira, J. White
In this survey paper we describe the combination of:
discretized integral formulations, sparsification
techniques, and Krylov-subspace based model-order
reduction that has led to robust
tools for automatic generation of macromodels
that represent the distributed RLC effects in 3-D
interconnect. A few computational results are
presented, mostly to point out the problems yet
to be addressed.
-
50.3 IC Analyses Including Extracted Inductance Models [p. 915]
- Michael W. Beattie, Lawrence T. Pileggi
IC inductance extraction generally produces either port
inductances based on simplified current path assumptions
or a complete partial inductance matrix. Combining
either of these results with the IC interconnect
resistance and capacitance models significantly complicates
most IC design and verification methodologies. In
this tutorial paper we will review some of the analysis
and verification problems associated with on-chip inductance,
and present a subset of recent results for partially
addressing the challenges which lie ahead.
Keywords:
Interconnect; Inductance; Model Order Reduction.
-
50.4 On-chip Inductance Issues in Multiconductor Systems [p. 921]
- Shannon V. Morton
As the family of Alpha microprocessors continues to scale into
more advanced technologies with very high frequency edge rates
and multiple layers of interconnect, the issue of characterizing
inductive effects and providing a chip-wide design methodology
becomes an increasingly complex problem. To address this issue,
a test chip has been fabricated to evaluate various conductor
configurations and verify the correctness of the simulation
approach. The implementation of and results from this test chip
are presented in this paper. Furthermore the analysis has been
extended to the upcoming EV7 microprocessor, and important
aspects of the derivation of its design methodology, as pertains to
these inductive effects, are discussed.
Keywords:
Alpha microprocessor, semiconductor, interconnect, buses,
inductance, resistance, capacitance, RLC, noise, cross-talk,
transmission line.
Chair: Sharad Malik
Organizers: Rajesh Gupta, David Ku
-
51.1 A Methodology for Accurate Performance Evaluation in Architecture
Exploration [p. 927]
- George Hadjiyiannis, Pietro Russo, Srinivas Devadas
We present a system that automatically generates a cycle-accurate
and bit-true Instruction Level Simulator (ILS) and a hardware implementation
model given a description of a target processor. An
ILS can be used to obtain a cycle count for a given program running
on the target architecture, while the cycle length, die size, and
power consumption can be obtained from the hardware implementation
model. These figures allow us to accurately and rapidly evaluate
target architectures within an architecture exploration methodology
for system-level synthesis.
In an architecture exploration scheme, both the ILS and the hardware
model must be generated automatically, else a substantial programming
and hardware design effort has to be expended in each
design iteration. Our system uses the ISDL machine description language
to support the automatic generation of the ILS and the hardware
synthesis model, as well as other related tools.
-
51.2 LISA - Machine Description Language for Cycle-Accurate Models of
Programmable DSP Architectures [p. 933]
- Stefan Pees, Andreas Hoffmann, Vojin Zivojnovic, Heinrich Meyr
This paper presents the machine description language LISA for the
generation of bit- and cycle-accurate models of DSP processors. Based
on a behavioral operation description, the architectural details
and pipeline operations of modern DSP processors can be covered.
Beyond the behavioral model, LISA descriptions include other
architecture-related information like the instruction set. The
information provided by LISA models enables automatic generation of
simulators and assemblers which are essential elements of DSP software
development environments. In order to prove the applicability of our
approach, a model of the Texas Instruments TMS320C6201 DSP
is presented and derived LISA code examples are given.
-
51.3 Exploiting Intellectual Properties in ASIP Designs for Embedded DSP
Software [p. 939]
- Hoon Choi, Ju Hwan Yi, Jong-Yeol Lee, In-Cheol Park, Chong-Min Kyung
The growing requirements on the correct design of a high-performance
system in a short time force us to use IP's in many
designs. In this paper, we propose a new approach to select the
optimal set of IP's and interfaces to make the application program
meet the performance constraints in ASIP designs. The
proposed approach selects IP's while considering interfaces and
supports concurrent execution of some parts of a task in the kernel as software
code with other parts in IP's, whereas previous state-of-the-art
approaches do not consider IP's and interfaces simultaneously
and cannot support such concurrent execution. The experimental
results on real applications show that the proposed approach is
effective in making application programs meet the performance
constraints using IP's.
Chair: Hidetoshi Onodera
Organizers: Alan Mantooth, Hidetoshi Onodera
-
52.1 MAELSTROM: Efficient Simulation-Based Synthesis for Custom Analog Cells
[p. 945]
- Michael Krasnicki, Rodney Phelps, Rob A. Rutenbar, L. Richard Carley
Analog synthesis tools have failed to migrate into mainstream use
primarily because of difficulties in reconciling the simplified models
required for synthesis with the industrial-strength simulation
environments required for validation. MAELSTROM is a new approach
that synthesizes a circuit using the same simulation environment
created to validate the circuit. We introduce a novel genetic/
annealing optimizer, and leverage network parallelism to achieve
efficient simulator-in-the-loop analog synthesis.
-
52.2 Behavioral Synthesis of Analog Systems using Two-Layered Design Space
Exploration [p. 951]
- Alex Doboli, Adrian Nunez-Aldana, Nagu Dhanwada, Sree Ganesan,
Ranga Vemuri
This paper presents a novel approach for synthesis of analog systems
from behavioral VHDL-AMS specifications. We implemented
this approach in the VASE behavioral-synthesis tool. The synthesis
process produces a netlist of electronic components that are
selected from a component library and sized such that the overall
area is minimized and the rest of the performance constraints such
as power, slew-rate, bandwidth, etc. are met. The gap between
system level specifications and implementations is bridged using a
hierarchically-organized, design-space exploration methodology.
Our methodology performs a two-layered synthesis, the first being
architecture generation, and the other component synthesis and
constraint transformation. For architecture generation we suggest
a branch-and-bound algorithm, while component synthesis and
constraint transformation use a Genetic Algorithm based heuristic
method. Crucial to the success of our exploration methodology
is a fast and accurate performance estimation engine that embeds
technology process parameters, SPICE models for basic circuits
and performance composition equations. We present a telecommunication
application as an example to illustrate our synthesis
methodology, and show that constraint-satisfying designs can be
synthesized in a short time and with a reduced designer effort.
-
52.3 Circuit Complexity Reduction for Symbolic Analysis of Analog Integrated
Circuits [p. 958]
- Walter Daems, Georges Gielen, Willy Sansen
This paper presents a method to reduce the complexity of a linear
or linearized (small-signal) analog circuit. The reduction technique,
based on quality-error ranking, can be used as a standard reduction
engine that ensures the validity of the resulting network model in a
specific (set of) design point(s) within a given frequency range and
a given magnitude and phase error. It can also be used as an analysis
engine to extract symbolic expressions for poles and zeroes. The
reduction technique is driven by analysis of the signal flow graph
associated with the network model. Experimental results show the
effectiveness of the approach.
Chair: Miodrag Potkonjak
Organizers: Mike Murray, Miodrag Potkonjak
-
53.1 Cycle and Phase Accurate DSP Modeling and Integration for HW/SW
Co-Verification [p. 964]
- Lisa Guerra, Joachim Fitzner, Dipankar Talukdar, Chris Schläger,
Bassam Tabbara, Vojin Zivojnovic
We present our practical experience in the modeling and
integration of cycle/phase-accurate instruction set architecture
(ISA) models of digital signal processors (DSPs) with other
hardware and software components. A common approach to the
modeling of processors for HW/SW co-verification relies on
instruction-accurate ISA models combined (i.e. wrapped) with the
bus interface models (BIM) that generate the clock/phase-accurate
timing at the component's interface pins. However, for DSPs and
new microprocessors with complex architectural features this
approach is from our perspective not acceptable. The additional
extensive modeling of the pipeline and other architectural details
in the BIM would force us to develop two detailed processor
models with a complex BIM API between them. We therefore
propose an alternative approach in which the processor ISAs
themselves are modeled in a full cycle/phase-accurate fashion.
The bus interface model is then reduced to just modeling the
connection to the pins. Our models have been integrated into a
number of cycle-based and event-driven system simulation
environments. We present one such experience in incorporating
these models into a VHDL environment. The accuracy has been
verified cycle-by-cycle against the gate/RTL level models. Multi-processor
debugging and observability into the precise cycle-accurate
processor state is provided. The use of co-verification
models in place of the RTL resulted in system speedups up to 10
times, with the cycle-accurate ISA models themselves reaching
performances of up to 123K cycles/sec.
-
53.2 A Study in Coverage-Driven Test Generation [p. 970]
- Mike Benjamin, Daniel Geist, Alan Hartman, Yaron Wolfsthal, Gerard Mas,
Ralph Smeets
One possible solution to the verification crisis is
to bridge the gap between formal verification
and simulation by using hybrid techniques.
This paper presents a study of such a functional
verification methodology that uses coverage of
formal models to specify tests. This was applied
to a modern superscalar microprocessor and
the resulting tests were compared to tests generated
using existing methods. The results
showed roughly a 50% improvement in transition
coverage with fewer than one third the number of
test instructions, demonstrating that hybrid
techniques can significantly improve functional
verification.
Keywords:
Functional verification, test generation, formal models,
transition coverage
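As a generic illustration of the coverage metric used here (not the authors' tool flow), the Python sketch below measures transition coverage of a toy FSM-style formal model from a simulated trace; the machine, the trace, and the numbers are invented.

    # Transition coverage = covered state-to-state transitions / all legal transitions.
    transitions = {                     # toy FSM: state -> set of successor states
        "IDLE":    {"FETCH"},
        "FETCH":   {"DECODE"},
        "DECODE":  {"EXECUTE", "STALL"},
        "EXECUTE": {"IDLE", "FETCH"},
        "STALL":   {"DECODE"},
    }
    all_arcs = {(s, t) for s, succs in transitions.items() for t in succs}

    def transition_coverage(trace):
        """trace: sequence of states visited while simulating the generated tests."""
        covered = {(a, b) for a, b in zip(trace, trace[1:]) if (a, b) in all_arcs}
        return len(covered) / len(all_arcs)

    trace = ["IDLE", "FETCH", "DECODE", "EXECUTE", "FETCH", "DECODE", "STALL", "DECODE"]
    print(f"transition coverage = {transition_coverage(trace):.0%}")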
-
53.3 IC Test Using the Energy Consumption Ratio [p. 976]
- Wanli Jiang, Bapiraju Vinnakota
Dynamic-current-based test techniques can potentially address the
drawbacks of traditional and Iddq test methodologies. The quality
of dynamic-current-based test is degraded by process variations
in IC manufacture. The energy consumption ratio (ECR) is a new
metric that improves the effectiveness of dynamic current test by
reducing the impact of process variations by an order of magnitude.
We address several issues of significant practical importance
to an ECR-based test methodology. We use the ECR to test a
low-voltage submicron IC with a microprocessor core. The ECR
more than doubles the effectiveness of the dynamic current test already
used to test the IC. The fault coverage of the ECR is greater
than that offered by any other test, including Iddq. We develop a
logic-level fault simulation tool for the ECR and techniques to set
the threshold for an ECR-based test process. Our results demonstrate
that the ECR offers the potential to be a high-quality low-cost
test methodology. To the best of our knowledge, this is the first
dynamic-current based test technique to be validated with manufactured
ICs.
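The abstract does not spell out how the ECR is formed. One common way to build such ratio metrics in dynamic-current testing is to divide the energies the same die consumes under two different excitations, so that die-level process variation, which scales both energies similarly, largely cancels. The Python sketch below illustrates only that ratio idea; the supply voltage, waveforms, defect signature, and numbers are illustrative assumptions, not the authors' measurement procedure.

    import numpy as np

    def dynamic_energy(i_dd, vdd=1.8, dt=1e-9):
        """Energy drawn from the supply over one test window (trapezoidal integration)."""
        return vdd * np.trapz(i_dd, dx=dt)

    def energy_consumption_ratio(i_dd_window_a, i_dd_window_b):
        # Both windows come from the SAME die, so a globally fast or slow die
        # scales both energies together and the ratio is largely insensitive
        # to that process variation.
        return dynamic_energy(i_dd_window_a) / dynamic_energy(i_dd_window_b)

    # Illustration: an assumed defect adds extra switching current in window A only.
    t = np.linspace(0, 1e-6, 1000)
    good_a   = 1e-3 * (1 + 0.5 * np.sin(2e7 * t) ** 2)
    good_b   = 1e-3 * (1 + 0.3 * np.sin(2e7 * t) ** 2)
    faulty_a = good_a + 2e-4
    print("good ECR  :", round(energy_consumption_ratio(good_a, good_b), 3))
    print("faulty ECR:", round(energy_consumption_ratio(faulty_a, good_b), 3))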
Chair: Teresa Meng
Organizers: Teresa Meng, David Blaauw
-
54.1 Design Strategy of On-Chip Inductors for Highly Integrated RF Systems
[p. 982]
- C. Patrick Yue, S. Simon Wong
This paper describes a physical model for spiral
inductors on silicon which is suitable for circuit
simulation and layout optimization. Key issues
related to inductor modeling such as skin effect
and silicon substrate loss are discussed. An
effective ground shield is devised to reduce substrate
loss and noise coupling. A practical
design methodology based on the trade-off
between the series resistance and oxide capacitance
of an inductor is presented. This method
is applied to optimize inductors in state-of-the-art
processes with multilevel interconnects. The
impact of interconnect scaling, copper metallization
and low-K dielectric on the achievable
inductor quality factor is studied.
Keywords:
Spiral inductor, quality factor, skin effect, substrate loss,
substrate coupling, patterned ground shield, interconnects
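To first order (textbook relations, not the specific model developed in the paper), the quality factor and self-resonant frequency that this resistance/capacitance trade-off balances are

    Q \approx \frac{\omega L_s}{R_s}, \qquad f_{sr} \approx \frac{1}{2\pi \sqrt{L_s C_p}},

where L_s and R_s are the series inductance and resistance of the spiral and C_p is its parasitic capacitance to the substrate, dominated by the oxide capacitance. Widening the metal trace lowers R_s (raising Q at a given frequency) but increases the oxide capacitance, which lowers f_sr and degrades Q near it; the methodology described above picks the geometry where these effects balance.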
-
54.2 The Simulation and Design of Integrated Inductors [p. 988]
- N. R. Belk, M. R. Frei, M. Tsai, A. J. Becker, K. L. Tokuda
At present there are two common types of integrated circuit
inductor simulation tools. The first type is based on the
Greenhouse methods [1], and obtains a solution in a fraction of a
second; however, because it does not solve for the inductor's
charge and current distributions, it has limited accuracy. The
second type, method of moments (MoM) solvers, determines the
charge and current variations by decomposing the inductor into
thousands of sub-elements and solving a matrix. However, this
process takes between minutes and hours to obtain a reasonably
accurate solution. In this paper, we present a series of algorithms
for solving inductors whose radius is small compared to the wavelength
of the electrical signal; these algorithms equal or exceed the accuracy
of MoM solvers but obtain their solutions in roughly one second.
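For context on the first (Greenhouse-style) class of tools, the Python sketch below shows the structure of such a calculation: the spiral is treated as straight segments, and the total inductance is the sum of segment self-inductances plus signed mutual terms between parallel segments (positive when the currents flow in the same direction, negative otherwise). The segment formulas are the standard Grover/Greenhouse approximations with the center-to-center spacing standing in for the geometric mean distance; the pairing of segments and the example geometry are deliberately simplified and purely illustrative.

    import math

    def self_inductance_nH(l, w, t):
        """Straight rectangular segment; l, w, t in cm (classic approximation)."""
        return 2.0 * l * (math.log(2.0 * l / (w + t)) + 0.50049 + (w + t) / (3.0 * l))

    def mutual_inductance_nH(l, d):
        """Two equal parallel segments of length l (cm) at spacing d (cm)."""
        q = l / d
        return 2.0 * l * (math.log(q + math.sqrt(1 + q * q))
                          - math.sqrt(1 + 1 / (q * q)) + 1 / q)

    def greenhouse_total_nH(segments):
        """segments: (length, width, thickness, spacing_to_partner, same_direction).
        A real tool pairs every parallel segment with every other; this loop
        assumes each segment has at most one listed partner."""
        total = 0.0
        for (l, w, t, d, same_dir) in segments:
            total += self_inductance_nH(l, w, t)
            if d is not None:
                total += (1 if same_dir else -1) * mutual_inductance_nH(l, d)
        return total

    # Toy 1.5-turn square spiral, dimensions in cm (illustrative numbers only).
    segs = [(0.030, 0.001, 0.0002, None,  True),
            (0.030, 0.001, 0.0002, 0.010, False),
            (0.028, 0.001, 0.0002, 0.010, False)]
    print(f"total inductance ~ {greenhouse_total_nH(segs):.2f} nH")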
-
54.3 Optimization of Inductor Circuits via Geometric Programming [p. 994]
- Maria del Mar Hershenson, Sunderarajan S. Mohan, Stephen P. Boyd,
Thomas H. Lee
We present an efficient method for optimal design and synthesis
of CMOS inductors for use in RF circuits. This method uses
the physical dimensions of the inductor as the design parameters
and handles a variety of specifications, including a fixed
value of inductance, minimum self-resonant frequency, minimum quality
factor, etc. Geometric constraints that can be handled
include maximum and minimum values for every design
parameter and a limit on total area.
Our method is based on formulating the design problem
as a special type of optimization problem called geometric programming,
for which powerful efficient interior-point methods
have recently been developed. This allows us to solve the inductor
synthesis problem globally and extremely efficiently. Also, we
can rapidly compute globally optimal trade-off curves between
competing objectives such as quality factor and total inductor
area.
We have fabricated a number of inductors designed by the
method, and found good agreement between the experimental
data and the specifications predicted by our method.
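For readers unfamiliar with the optimization class, a geometric program in standard form is (this is the generic GP form, not the paper's specific inductor model)

    \begin{aligned}
    \text{minimize}   \quad & f_0(x) \\
    \text{subject to} \quad & f_i(x) \le 1, \quad i = 1, \dots, m, \\
                            & g_j(x) = 1,   \quad j = 1, \dots, p,
    \end{aligned}

where the design variables x > 0 are, in this setting, the inductor's physical dimensions, each f_i is a posynomial, and each g_j is a monomial. A specification such as a minimum quality factor Q >= Q_min becomes the posynomial constraint Q_min / Q(x) <= 1, provided Q(x) is expressed (or accurately approximated) as a monomial or posynomial in the dimensions; a logarithmic change of variables then turns the problem into a convex one, which is why interior-point methods can solve it globally and efficiently.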
Chair: Richard Goering
Organizer: Stanley J. Krolikoski
Panel Members: Pierre Bricaud, James G. Dougherty, Steve Glaser,
Michael Keating, Robert Payne, Davoud Samani [p. 999]
-
Over the past year two distinct answers have emerged regarding SoC design
methodologies. On the one hand, it is posited in the Reuse
Methodology Manual that a logic synthesis-based design methodology can be
used effectively to develop system chips. An alternative
methodology focuses on integration (or "reference") platforms and the
customization of the basic application-specific platform through the
addition of selected SW and/or HW IP blocks. This panel session will debate
the merits of these seemingly incompatible proposed SoC methodologies.
-
Automated Layout and Migration in Ultra-Deep Submicron VLSI
- Andrew B. Kahng, Shmuel Wimer, Cyrus S. Bamji, Chris Strolenberg
-
Analog and Mixed-Signal Modeling Using the VHDL-AMS Language
- Ernst Christen, Kenneth Bakalar, Allen M. Dewey, Eduard Moser
-
Information Visualization: Creating Advanced Visual Interfaces for
Analyzing Designs
- Kevin Mullet, Erric Solomon, David Overhauser
-
Design Reuse Tools and Methodologies
- Michael Keating, Warren Savage, Ulf Schlichtmann, Pierre Bricaud
-
Embedded Memories in System Design - From Technology to Systems
Architecture
- Francky Catthoor, Soren Hein, Vijay Nagasamy, Bernhard Rohfleisch,
Christoforos E. Kozyrakis, Nikil D. Dutt
-
Built-In Self-Test for Systems on a Chip
- Janusz Rajski, Jerzy Tyszer, Nilanjan Mukherjee