|
ABSTRACTS ASP-DAC'95
Sessions
[1A]
[1B]
[1C]
[2A]
[2B]
[2C]
[3A]
[3B]
[3C]
[4A]
[4B]
[4C]
[5A]
[5B]
[5C]
[6A]
[6B]
[6C]
[7A]
[7B]
[7C]
[8A]
[8B]
[8C]
[9A]
[9B]
[9C]
Chair: Massoud Pedram (Univ. of South California, USA)
Co-Chair: Hidetoshi Onodera (Kyoto Univ., Japan)
A-1A.1 Transistor Reordering Rules for Power Reduction in CMOS Gates
Wen-Zen Shen, Jiing-Yuan Lin (National Chiao Tung Univ., Taiwan),
Fong-Wen Wang (Telecommunication Laboratories, MTC, Taiwan)
The goal of transistor reordering for a logic gate is to reduce the propagation
delay as well as the charging and discharging of internal capacitances to
achieve low power consumption. In this paper, based on the input signal
probabilities and transition densities, we propose a set of simple transistor
reordering rules for both basic and complex CMOS gates to minimize the
transition counts at the internal nodes. The most attractive feature of this
approach is that not only the power consumption is reduced efficiently, but also
the other performances are not degraded. Experimental results show that this
technique typically reduces the power by about 10% in average, but in some cases
the improvement is even 35%.
A-1A.2 Power Reduction by Gate Sizing with Path-Oriented Slack Calculation
How-Rern Lin and TingTing Hwang(Tsing Hua Univ., Taiwan)
This paper describes methods for reducing power consumption. We propose using
gate sizing technique to reduce power for circuits that have already satisfied
the timing constraint. Replacement of gates on noncritical paths with smaller
templates is used in reducing the dissipated power of a circuit. We find that
not only gates on noncritical paths can be down-sized, but also gates on
critical paths can be down-sized.A power reduction algorithm by means of single
gate resizing as well as multiple gates resizing will be proposed. In addition,
to identify gates to be resized, a path-oriented method in calculating slack
time with false path taken into consideration will be also proposed. During the
slack time computation, in order to prevent long false path from becoming
sensitizable and thus increasing the circuit delay, slack constraint will be
set for gates. Results on a set of circuits from MCNC benchmark set demonstrate
that our power reduction algorithm can reduce about 10% more power, on the
average, than a previously proposed gate sizing algorithm.
Keywords : low power, gate sizing, time slack, sensitizable path.
A-1A.3 Current and Charge Estimation in CMOS Circuits
Sanjay Dhar, Dave J. Gurney (Mentor Graphics Corp., USA)
CMOS circuits have significant amounts of dynamic short-circuit(or through)
current. This can be as large as 20% of the total in well-designed circuits,
and up to 80% of the total in circuits that have not been designed carefully.
This current depends strongly on the relative sizes of the pull-up to
pull-down paths. We introduce the dynamic short-circuit ratio to model this
parameter. This allows accurate estimate of currents including the
dynamic short-circuit current, and also results in improved delay estimation.
Accuracy is typically within 10% of circuit-level simulation while operating at
the switch-level abstraction.
Chair: Winfried Hahn (Univ. of Passau, Germany)
Co-Chair: Yoshio Takamine (Hitachi Ltd., Japan)
A-1B.1 Auriga2: A 4.7 Million-Transistor CISC Microprocessor
J. P. Tual, M. Thill, C. Bernard, H. N. Nguyen, F. Mottini, M. Moreau,
P. Vallet (BULL S. A., France)
With the introduction of the high range version of the DPS7000 mainframe family,
Bull is providing a processor which integrates the DPS7000 CPU and first level
of cache on one VLSI chip containing 4.7M transistors and using a 0.5 µm,
3Mlayers CMOS technology. This enhanced CPU has been designed to provide a
high integration, high performance and low cost systems. Up to 24 such processors
can be integrated in a single system, enabling performance levels in the range
of 850 TPC-A (Oracle) with about 12 000 simultaneously active connections. The
design methodology involved massive use of formal verification and symbolic
layout techniques, enabling to reach first pass right silicon on several
foundries. An architectural overview of the CPU with emphasis on several
original aspects of the design aspects (synthesis, verification, symbolic
layout) will be discussed in this paper.
A-1B.2 Automatic Design for Bit-Serial MSPA Architecture
Hiroaki Kunieda, Yusong Liao, Dongju Li, Kazuhito Ito, (Tokyo Inst. of Technology, Japan, Saitama University, Saitama)
Memory Sharing Processor Array (MSPA) architecture is effective in
both data storage and processor cell utilization efficiency. In this paper,
the design methodology for MSPA is extended to synthesize bit-serial datapath.
As a synthesis example, we propose a new bit-serial multiplier with smaller
number of logic gates than conventional bit-serial multipliers.
A-1B.3 Stoht --- An SDL-to-Hardware Translator
Ivanil S. Bonatti, Renato J. O. Figueiredo (Univ. of Campinas, Brazil)
This article presents a language translator that allows the use of SDL as a
front-end, high-level graphical description tool for hardware design. A subset
of this language is proposed for hardware design, including a synthesisable
model for SDL's signal-based communication. An algorithm to translate this
subset to fully synthesisable VHDL is proposed and implemented in a
public-domain software package.
A-1B.4 Enhancing a VHDL Based Design Methodology with Application
Specific Data Abstraction
Lars Lindqvist (NKT Electronik A/S and Danmarks Tekniske Univ., Denmark)
VHDL has successfully been introduced into the design methodology for VLSI
ASICs. This paper describes a high-level data abstraction and supporting tool
that enhance this methodology in the telecommunication application domain. A
significant performance gain was obtained by introducing the data abstraction
outside the VHDL simulator. The enhanced methodology has been used in current
ASIC designs with good results.
Session [A-1C]
PANEL: Design Automation 2000---Challenges for Gigabit-Era
Moderator: Richard K. Wallace, Editor-in-Chief, EE Times
Panelists: Joseph Costello (CEO, Cadence),
Jeffrey Edson (VP, Intergraph),
Aart deGeus (CEO, Synopsys),
Alan J. Hanover (CEO, Viewlogic),
Jinya Katsube (VP, Zuken),
Walden C. Rhines (CEO, Mentor)
Chair: Daniel D. Gajski (Univ. of California at Irvine, USA)
Co-Chair: Tadatoshi Ishii (Toshiba Corp., Japan)
A-2A.1 A Scheduling Algorithm for Synthesis of Bus-Partitioned Architectures
Vasily G. Moshnyaga, Fumiaki Ohbayashi, Keikichi Tamaru (Kyoto Univ., Japan)
Due to efficient interconnect structure and internal parallelism bus-partitioned
architectures are very beneficial for sub-micron chip design. This paper
presents a new approach for integrated scheduling and interconnect binding of
bus-segmented data-paths. Experiments show that the approach provides better
results than existing methods and is quite flexible.
A-2A.2 Reclocking for High-Level Synthesis
Pradip Jha, Nikil Dutt (Univ. of California, Irvine, USA), Sri Parameswaran
(Univ. of Queensland, Australia)
In this paper we describe, a powerful post-synthesis approach called reclocking,
for performance improvement by minimizing the total execution time. By back
annotating the wire delays of designs created by a high level synthesis system,
and then finding an optimal clockwidth, we resynthesize the controller to
improve performance without altering the datapath. Reclocking is versatile and
can be applied not only for wire delay consideration, but also for bit-width
migration, library migration and for feature size migration supporting the
philosophy of design reuse. Experimental results show that with re-clocking,
the performance of the input designs can be improved by as much as 34%.
A-2A.3 Synthesis of False Loop Free Circuits
Shih Hsu Huang, Ta-Yung Liu, Yu-Chin Hsu (Univ. of California, Riverside, USA),
Yen-Jen Oyang (National Taiwan Univ., Taiwan)
In behavior synthesis, an improper resource sharing may result in a circuit
containing false loops which is non-simulatable or non-timing-analyzable.
Previous approaches solve this problem during the datapath allocation phase.
To build a false loop free circuit, they may have to allocate additional
functional units other than those defined in the resource constraints. In this
paper, we present an approach to solve the problem during the scheduling phase.
Our scheduling algorithm finds a schedule which guarantees to have a false loop
free circuit mapping under the given resource constraints. Experiments show the
proposed approach finds false loop free schedule for most of the examples
without introducing extra control steps.
A-2A.4 High-Level Synthesis Scheduling and Allocation Using Genetic Algorithms
M. J. M. Heijligers, L. J. M. Cluitmans, J. A. G. Jess
(Eindhoven Univ. of Technology, The Netherlands)
In this article a scheduling method is presented which is capable of allocating
supplementary resources during scheduling. This makes it very suitable in
synthesis strategies based on lower bound estimations techniques. The method is
based on genetic algorithms. Special coding techniques and analysis methods are
used to improve the runtime and quality of the results. The scheduler can
easily be extended to cover other architectural issues and (for example) provides
ways to make trade-offs between functional unit allocation and register
allocation. Experiments and comparisons show high quality results and fast run
times that outperform results produced by other heuristic scheduling methods.
Chair: Graham R. Hellestrand (Univ. of New South Wales, Australia)
Co-Chair: Masatoshi Sekine (Toshiba Corp., Japan)
A-2B.1 A Framework for the Analysis and Design of Algorithms for a
Class of VLSI-CAD Optimization Problems
C. -J. Shi (Univ. of Iowa, USA), J. A. Brzozowski (Univ. of Waterloo, Canada)
A simple mathematical framework, called cluster-cover, is established for
several VLSI optimization problems including logic minimization, constrained
encoding, multi-layer topological planar routing, application timing assignment
for delay-fault testing, and minimization of monitoring logic for BIST
enhancement. Two paradigms, prime covering and greedy peeling, are presented
for developing both exact and heuristic algorithms. The paradigms capture
generally applicable ingredients from previously developed algorithms for
individual applications. This makes it possible to re-use established techniques
in new problems, and provide new insights into existing problems. The paradigms
are simple enough to be amenable to theoretical analysis. Bounds on the
performance of greedy peeling are derived; these bounds are applicable to many
published heuristics which previously could be evaluated only by benchmarks.
A-2B.2 Generic Fuzzy Logic CAD Development Tool
Eric Q. Kang, Eugene Shragowitz (Univ. of Minnesota, USA)
This paper describes a generic fuzzy logic CAD development tool and
reports on application of it to some important CAD problems. This menu-based
tool allows to introduce linguistic variables in a textual and graphic form by
clicking a menu. It permits users to define membership functions in an
analytical, table or graphical forms. It connects linguistic variables by
fuzzy logic operators to create fuzzy logic and generates their graphical
representations.
A-2B.3 A Hardware/Software Codesign Method for Pipelined Instruction Set
Processor Using Adaptive Database
Nguyen Ngoc Binh, Masaharu Imai, Akichika Shiomi,
(Toyohashi Univ. of Technology, Japan),
Nobuyuki Hikichi, (Software Research Associates, Inc., Japan)
This paper proposes a new method to design an optimal pipelined instruction set
processor using a formal HW/SW codesign methodology. First, a HW/SW
partitioning algorithm for selecting an optimal pipelined architecture is
introduced briefly. Then, an adaptive database approach is presented that
enables to enhance the optimality of the design through very accurate
estimation of the performance of a pipelined ASIP in HW/SW partitioning. The
experimental results show that the proposed methods are effective and efficient.
A-2B.4 EMPAR: An Interactive Synthesis Environment for Hardware Emulations
Tsing-Gen Lee, Wen-Jong Fang, Allen C.-H. Wu (Tsing Hua Univ., Taiwan)
In this paper, we present EMPAR, an interactive synthesis environment for
hardware emulations. EMPAR provides an open-ended design environment for the
development of hardware emulators, which is capable of supporting: (1) a variety
of EM architectures (2) a variety of EM synthesis algorithms, (3) interactive
control by the user, and (4) design quality analysis. An X-window based
graphical user interface has been developed to support a variety of interactive
design tasks. The key features in EMPAR are : (1) it is open-ended so that
arbitrary algorithms and tools can be built on top of it and (2) it is fully
interactive and leaves the control to the designer. EMPAR can be used as a
design environment for existing hardware emulators as well as a test bed for the
evaluation and development of new hardware-emulator architectures and synthesis
algorithms.
Chair: Chong-Min Kyung, KAIST, Korea
A-2C.1 A Scheduling Algorithm for Multiport Memory Minimization in Datapath
Synthesis
Hae-Dong Lee, Sun-Young Hwang (Sogang Univ., Korea)
In this paper, we present a new scheduling algorithms that generates
area-efficient register transfer level datapaths with multiport memories. The
proposed scheduling algorithm assigns an operation to a specific control step
such that maximal sharing of functional units can be achieved with minimal
number of memory ports, while satisfying given constraints. We propose a
measure of multiport memory cost, MAV (Multiple Access Variable) which is
defined as a variable accessed at several control steps, and overall memory
cost is reduced by equally distributing the MAVs throughout all the control
steps. When compared with previous approaches for several benchmarks available
from the literature, the proposed algorithm generates the datapaths with less
memory modules and interconnection structures by reflecting the memory cost in
the scheduling process.
A-2C.2 An Integrated Hardware-Software Cosimulation Environment
for Heterogeneous Systems Prototyping
Yongjoo Kim, Kyuseok Kim, Youngsoo Shin, Taekyoon Ahn,
Wonyong Sung, Kiyoung Choi, Soonhoi Ha (Seoul National Univ., Korea)
In this paper, we present a hardware-software cosimulation environment for
heterogeneous systems. To be an efficient system verification environment
for the rapid prototyping of heterogeneous systems, the environment provides
interface transparency, simulation acceleration, smooth transition to
cosynthesis, and integrated user interface and internal representation. As an
experimental example, a heterogeneous system is cosimulated and prototyped
successfully, which shows that our environment can be a useful heterogeneous
system specification/verification environment for rapid prototyping.
A-2C.3 A CSIC Implementation with POCSAG Decoder and Microcontroller
for Paging Applications
J.Y. Lim , G. Kim (PANTECH Co., Ltd., Korea), I.-S. O (Univ. of Inchon, Korea),
J.H. Cho, Y.J. Kim, H.Y. Kim (PANTECH Co., Ltd., Korea)
This paper presents a CSIC (Customer Specification Integrated Circuit)
implementation, which includes a 512/1200/2400 bps POCSAG decoder, PDI2400 and
MC68HC-05 changed by PANTECH. It can receive all the data with the rate of
512/1200/2400 bps of a single clock of 76.8 KHz. It is designed to have
maximum 2 own frames for service enhancement. To improve receiver quality, a
preamble detection considering frequency tolerance and a SCW (Synchronization
Code Word) detection at every 4 bit is suggested. Also we consider an error
correction of address and message up to 2 bits. Furthermore, it is possible with
proposed PF (Preamble Frequency) error to achieve a battery life increase due to
the turn-off of RF circuits when the preamble signal is detected with noises.
The chip is designed using VHDL code from PDI2400 micro-architecture level. It
is verified with VHDL simulation software of PowerView TM . Its logic diagrams
are synthesized with VHDL synthesis software of PowerView TM . Proposed decoder
and MC68HC05 CPU of MOTOROLA are integrated with about 88000 transistors by
using 1.0um HCMOS process and named MC68HC05PD6. It is proved that the wrong
detection numbers of preamble of noises are significantly reduced in the pager
system that uses our chip through the real field test. The system receiving
performance is improved by 20% of average, compared with other existing systems.
A-2C.4 Performance-Driven Circuit Partitioning for Prototyping by
Using Multiple FPGA Chips
Chunghee Kim, Hyunchul Shin (Hanyang Univ., Korea),
Young-Uk Yu (Seodu Logic Inc., Korea)
A new performance-driven partitioning algorithm has been developed to implement
a large circuit by using multiple FPGA chips. Partitioning for multiple FPGAs
has several constraints to satisfy so that each partitioned subcircuit can be
implemented in a FPGA chip. To obtain satisfactory results under the constraints
, the partitioning is performed in two phases which are the initial
partitioning for global optimization and the iterative partitioning improvements
for constraint satisfaction. Experimental results using the MCNC benchmark
examples show that our partition method produces better results than those of
other recent approaches on the average and that performance-driven partitioning
is effective in reducing critical time delays.
Chair: Chung-Kuan Cheng (Univ. of California, San Diego, USA)
Co-Chair: Kazuhiro Ueda (Shibaura Inst. of Technology, Japan)
A-3A.1 A New System Partitioning Method under Performance and Physical
Constraints for Multi-Chip Modules
Yoshinori Katsura, Tetsushi Koide, Shin'ichi Wakabayashi (Hiroshima Univ., Japan),
Noriyoshi Yoshida (Hiroshima City Univ., Japan)
In this paper, we propose a new and the first MCM system partitioning method
considering chip-to-chip delays, chip areas and I/O pins constraints. The
proposed method consists of two steps, a clustering step considering the three
constraints and an iterative improvement step with mathematical programming.
In the first step, we apply two clustering algorithms considering the three
constraints and reduce the size of the large MCM system partitioning problem
so as to get a solution within a practical computation time. Next, it generates
an initial partitioning with 0-1 integer linear programming(ILP) so as to
minimize the total wire length. Since there may exist constraint violations
in the initial solution, in the second step, we formulate the partitioning
problem as a LP problem selecting a maximal independent set and improve it
until the total number of cuts is not decreased and the three constraints are
satisfied. We also showed that the number of the timing constraints can be
reduced by deleting redundant timing constraints. Experimental results showed
that the proposed method is able to produce partitions satisfying the three
constraints and improves the number of cuts by 27% on an average and a 30% in
maximum over the conventional method[5] considering only two constrains.
A-3A.2 A Robust Min-Cut Improvement Algorithm based on Dynamic Look-Ahead
Weighting
Katsunori Tani (NEC, Japan)
This paper proposes an approach to enhance Fiduccia-Mattheyses' min-cut
algorithm. The approach includes two new ideas: Look-ahead Weighting and
Dynamic Weighting . The former is based on the concept of VLSI placement method
using quadratic programming. The latter is a technique to carry the better
behavior of move-and-lock improvement strategy. Experiments on practical
circuits with 5K~140K cells show that the proposed approach achieves
promising results.
A-3A.3 Timing Influenced General-Cell Genetic Floorplanner
Sadiq M. Sait, Habib Youssef, Shahid K. Tanvir, and Muhammad S. T. Benten
(King Fahd Univ. of Petroleum and Minerals, Saudi Arabia)
In this paper we present a timing-influenced floorplanner for general cell IC
design. The floorplanner works in two phases. In the first phase we restrict the
modules to be rigid and the foorplan to be slicing. The second phase of
foorplanner allows modification to the aspect ratios of individual modules to
further reduce the area of the overall bounding box. The first phase is
implemented using genetic algorithm while in the second phase we adopt a
constraint graph based approach. Experimental results are also presented.
Chair: Akihiko Yamada (Tokyo Metropolitan Univ., Japan)
Co-Chair: Jun Sato (Tsuruoka National College of Technology, Japan)
A-3B.1 Power Analysis of a 32-bit Embedded Microcontroller
Vivek Tiwari (Princeton Univ., USA),
Mike Tien-Chien Lee (Fujitsu Labs. of America, USA)
A new approach for power analysis of microprocessors has recently been proposed
[1]. The idea is to look at the power consumption in a microprocessor from the
point of view of the actual software executing on the processor. The basic
component of this approach is a measurement based, instruction-level power
analysis technique. The technique allows for the development of an
instruction-level power model for the given processor, which can be used to
evaluate software in terms of the power consumption, and for exploring the
optimization of software for lower power. This paper describes the application
of this technique for a comprehensive instruction-level power analysis of a
commercial 32-bit RISC-based embedded microcontroller. The salient results of
the analysis and the basic instruction-level power model are described.
Interesting observations and insights based on the results are also presented.
Such an instruction-level power analysis can provide cues as to what
optimizations in the micro-architecture design of the processor would lead to
the most effective power savings in actual software applications. Wherever the
results indicate such optimizations, they have been discussed. Furthermore,
ideas for low power software design, as suggested by the results, are described
in this paper as well.
A-3B.2 Assessing the Feasibility of Interface Designs before their Implementation
Marco A. Escalante, Nikitas J. Dimopoulos (Univ. of Victoria, Canada)
During the design of microprocessor-based systems, once the system architecture
has been decided and the major components (processors, memories, IO devices)
have been selected from a component library, it is necessary to design
interface logic to integrate the system. Such an interface design can be
carried out based on the protocols used by the components. This paper addresses
the problem of determining the feasibility of a design prior to synthesis. A
design is called feasible if it achieves the desired functionality and
satisfies the given environmental constraints. Because timing is an important
aspect of a correct design, protocols are described using timed signal
transition graphs, an interpreted Petri net. It is shown here that the
feasibility of designs whose corresponding behavior is periodic can be studied
using a technique called timing analysis for synthesis.
A-3B.3 A Hardware-Software Co-simulator for Embedded System Design and Debugging
A. Ghosh, M. Bershteyn, R. Casley, C. Chien, A. Jain, M. Lipsie,
D. Tarrodaychik,
O. Yamamoto (Mitsubishi Electric Research Laboratories, Inc., USA)
One of the interesting problems in hardware-software co-design is that of
debugging embedded software in conjunction with hardware. Currently, most
software designers wait until a working hardware prototype is available before
debugging software. Bugs discovered in hardware during the software debugging
phase require re-design and re-fabrication, thereby not only delaying the
project but also increasing cost. It also puts software debugging on hold until
a new hardware prototype is available. In this paper we describe a
hardware-software co-simulator that can be used in the design, debugging and
verification of embedded systems. This tool contains simulators for different
parts of the system and a backplane which is used to integrate the simulators.
This enables us to simulate hardware, software and their interaction
efficiently. We also address the problem of simulation speed. Currently, the
more accurate (in terms of timing) the models used, the longer it takes to
simulate a system. Our main contribution is a set of techniques to speed up
simulation of processors and peripherals without significant loss in timing
accuracy. Finally, we describe applications used to test the co-simulator and
our experience in using it.
Chair: Chong-Min Kyung, KAIST, Korea
A-3C.1 Integrated Interconnect Circuit Modeling for VLSI Design
Won-Young Jung, Ghun-Up Cha, Young-Bae Kim, Jun-Ho Baek, Choon-Kyung Kim
(LG Semiconductor Inc., Korea)
An integrated interconnect modeling system, SIMS, is developed with the
parametrized modeling of interconnect and the interface with a schematic
capture and editor. SIMS automatically drives numerical interconnect simualtion
as directed by technology engineers, creates polynomial model library for
interconnect parasitics, generates netlist including SPICE model for the
interconnect structure specified by circuit designers, automatically drives
circuit simulations and display the simulation results through advanced GUI.
VLSI design with SIMS makes it possible to consider parasitic effects fast and
accurately, which becomes more important in deep submicron circuit design area.
With this capability, circuit design with optimized interconnect layout can be
easily achieved. Ultimately the integrated system helps to reduce the cost of
technology development and the time-to-market by building up the concept of
design-for-manufacturability.
A-3C.2 Architectural Simulation for a Programmable DSP Chip Set
Jong Tae Lee, Jaemin Kim, and Jae Cheol Son
(Samsung Electronics Co., Ltd., Korea)
This paper presents an architectural simulator called VIDEOFLOW software tools
for developing programmable DSP chip set for various video codec standards.
This DSP chip set consists of the Image Compression Coprocessor (ICC) and
Motion Estimation Coprocessor (MEC), which provide an easy solution for
implementing the major digital video codec algorithms. The ICC/MEC simulation
components are 100 percent bit accurate and closely approximate the timing of
the actual chips. In addition, the simulation tool provides users with the
ICC/MEC system simulation, debugging, and various performance monitors. This
tool can also be used to define and modify the architectural specification for
future product line of the ICC and MEC.
A-3C.3 System-Level Verification of CDMA Modem ASIC
GyeongLyong Park, KyungHi Chang, JaeSeok Kim, and KyungSoo Kim
(ETRI, Korea)
In this paper, we present a system-level verification methodology which is used
to verify the design of CDMA (Code Division Multiple Access) modem ASIC. To
make the system-level verification feasible, the models for modulator of base
station, fading channel and AGC loop were developed under the C environment.
Behavioral modeling of a microcontroller was also carried out using VHDL to
provide the ASIC with realistic input data, and the netlist of CDMA modem
ASIC is loaded on to a hardware accelerator, which is interfaced with VHDL
simulator. Finally, simulation was performed by executing an actual CDMA call
processing software. THis method was proved to be effective in both discovering
in advance malfunctions of ASIC when embedded in the system and reducing simulation
time by a factor of as much as 20 in the case of gate-level simulation. The
designed ASIC which consists of 90,000 gates and 29K SRAMs is now successfully
working in the real mobile-station on its first fab-out.
A-3C.4 A Digital Audio Signal Processor for Cellular Phone Application
Jeongsik Yang, Chanhong Park, Beomsup Kim (KAIST, Korea)
A salient digital audio signal processor for mobile communication receiver is
described in this paper. With this IC, the complete audio signal processing
system of an AMPS or a TACS cellular phone is easily implemented. The processor
can be also applied to cellular radio, high performance cordless telephone, etc.
In this paper, as an example of application, the implementation of AMPS audio
signal processing system is presented. To save the power consumption, some
special consideration of the low power architectures has been made. The DSP
core uses 4 stage pipelining without a routine memory access stage to reduce
the power consumption and execution speed. The parallel calculations of data
in the DSP and separated filter blocks also contribute to reduce the clock
frequency and to save power. It also provides the power-down mode that turns
off the IC except the internal bus interface in the standby mode. It uses 3.3V
power supply, 10MHz 4-phase clock. The data sampling rate is 10K-samples per
second.
Chair: Jason Cong (Univ. of California, Los Angeles, USA)
Co-Chair: Takashi Mitsuhashi (Toshiba Corp., Japan)
A-4A.1 Region Definition and Ordering Assignment with the Minimization of
the Number of Switchboxes
Jin-Tai Yan (National Chiao Tung Univ., Taiwan)
In this paper, a region definition and ordering assignment (RDAOA) algorithm on
minimizing the number of switchboxes is proposed. The time complexity of the
algorithm is proved to be in O(n) time, where n is the number of line segments
in a given floorplan graph. Finally, several examples have been tested on the
proposed algorithm and other published algorithms, and the experimental results
show that our algorithm defines fewer switchboxes than other algorithms.
A-4A.2 A Three-Layer Over-the-Cell Multi-Channel Routing Method for a New
Cell Model
Masahiro Tsuchiya, Tetsushi Koide, Shin'ichi Wakabayashi (Hiroshima Univ., Japan),
Noriyoshi Yoshida (Hiroshima City Univ., Japan)
In this paper, we present a new cell model for over-the-cell routing and a
new over-the-cell multi-channel routing method. In the proposed new cell model,
terminals can be placed arbitrarily on the second layer of a cell so that each
cell doesn'’t require the extra routing region on the first layer of a cell to
align terminals. Unlike conventional cell models, some parts of the second layer
are also utilized for the intra-cell routing in order to reduce the chip area.
Therefore the size of the proposed cell model can be smaller than that of
conventional cell model. The proposed method consists of three phases. In order
to utilize the proposed cell model, in phase 1, we simultaneously handle all
channels to determine the most effective routing patterns from the set of
possible routing patterns to minimize the chip area. In phase 2, for the
effective routing patterns of nets selected in phase 1, over-the-cell routing
nets are selected by a new greedy algorithm considering obstacles on
over-the-cells. Finally, the conventional channel routing algorithm is applied
for nets unrouted on over-the-cell. From the experimental results with MCNC
benchmarks, the proposed cell model and algorithm produce smaller height of
layouts as compared to those produced by conventional cell models and
algorithms, and the effectiveness of the proposed method and cell model was
shown.
A-4A.3 Pin Assignment and Routing on a Single-Layer Pin Grid Array
Man-Fai Yu, Wayne Wei-Ming Dai (Univ. of California, Santa Cruz, USA)
PGA routing has the freedom of routing any pin to any pad. We proposed an
algorithm (EVENPGA) that generates a monotonic topological routing. The routing
has no detours and is uniformly distributed optimally. The wire length is also
the shortest possible under the taxicab wiring metric. If the topological
routing is routable, the maximum density of critical cuts along a ring is the
minimum possible. Once the topological routing is done, physical layout can
easily be obtained using Surf, a rubberband-based routing system.
Chair: Yervant Zorian (AT&T, USA)
Co-Chair: Kiyoshi Furuya (Chuo Univ., Japan)
A-4B.1 Design for Testability Using Register-Transfer Level Partial Scan
Selection
Akira Motohara, Sadami Takeoka, Toshinori Hosokawa, Mitsuyasu Ohta,
Yuji Takai, Michihiro Matsumoto,
Michiaki Muraoka (Matsushita Electric Industrial Co., Ltd., Japan)
An approach to top down design for testability using register-transfer
level(RTL) partial scan selection is described. We propose a scan selection
technique based on testability analysis for RTL design including data path
circuits and control circuits such as state machines. Registers and state
machines which make gate level ATPG difficult are identified by the scan
selection technique based on RTL testability analysis effectively. Experimental
results for actual circuits are also presented.
A-4B.2 A Built-In Self Test Scheme for VLSI
T. Raju Damarla (GEO Centers, Inc, USA),
Wei Su, Gerald T. Michael (US Army Research Labs, USA),
Moon J. Chung (Michigan State Univ., USA),
Charles E. Stroud (Univ. of Kentucky, USA)
In this paper we present a novel approach for Built-In Self Test (BIST) for
VLSI. Many conventional BIST schemes use signatures generated by a linear
feedback shift register (LFSR) or a multiple input signature register (MISR)
for determining whether the device under test is faulty or fault free. In the
approach presented in this paper, fault detection is made based on the number
of different states the LFSR visits. This number is called the cycle length.
It is also shown that such an approach results in the probability of aliasing
of 2 ^-(2^(m-1) + m), where m denotes the number of registers in the LFSR,
compared to 2 ^-m achieved by conventional signature analyzers. We also present
the complexity of the additional hardware required to implement the scheme.
A-4B.3 BIST with Negligible Aliasing through Random Cover Circuits
T. Bogue, H. Jürgensen (Univ. of Western Ontario, Canada),
M. Gössel (Univ. of Potsdam, Germany)
In this paper, a new modified BIST structure is investigated. The output of the
MISA is monitored during the test by an error detection circuit which is
composed of two simple cover circuits. To simplify the cover construction, the
cover circuits are randomly chosen to be active for some of the outputs of the
MISA. Thus, a time-consuming fault simulation can be completely avoided. The
overhead for the cover circuits is determined for several of the ISCAS'85 and
Berkeley benchmark circuits. These simulation experiments show that a
significant reduction of the aliasing probability can be achieved, confirming
and far surpassing theoretically predicted improvements. Moreover, this
improvement can be achieved at a nearly negligible cost in additional hardware.
Chair: Sunil D. Sherlekar (Indian Inst. of Technology, India)
Co-Chair: Ryuichi Takahashi (Hiroshima City Univ., Japan)
A-4C.1 Implicit Prime Compatible Generation for Minimizing Incompletely
Specified Finite State Machines
Hiroyuki Higuchi, Yusuke Matsunaga (Fujitsu Laboratories Ltd., Japan)
This paper proposes a new implicit algorithm for excluding dominated
compatibles. The algorithm utilizes a novel notion of signatures of compatibles
to exclude dominated compatibles efficiently. Though this dominance check is
weaker than the conventional one, experimental results show that in many cases
the number of excluded compatibles is the same as that by class sets. Proposed
method computes prime compatibles more efficiently than conventional methods
for many tested large ISFSM's.
A-4C.2 Logic Optimization by an Improved Sequential Redundancy Addition and
Removal Technique
Uwe Gläser (Schloss Birlinghoven, Germany),
Kwang-Ting Cheng (Univ. of California, Santa Barbara, USA)
Logic optimization methods using Automatic Test Pattern Generation (ATPG)
techniques such as redundancy addition and removal have recently been proposed.
In this paper we generalize this approach for synchronous sequential circuits.
We proposed several new sequential transformations which can be efficiently
identified and used for optimizing large designs. One of the new transformations
involves adding redundancies across time frames in a sequential circuit. We also
suggest a new transformation which involves adding redundancies to block
initialization of other wires. We use efficient sequential ATPG techniques to
identify more sequential redundancies for either addition or removal. We have
implemented a sequential logic optimization system based upon this approach.
We show experimental results to demonstrate that this approach is both
CPU-time efficient and memory efficient and can optimize large sequential
designs significantly.
A-4C.3 On Hazard-Free Implementation of Speed-Independent Circuits
Alex Kondratyev, Michael Kishinevsky (Univ. of Aizu, Japan),
Alex Yakovlev (Univ. of Newcastle upon Tyne, UK)
We investigate the problem of hazard-free gate-level implementation of
speed-independent circuits specified by event-based models, such as Signal
Transition Graph (for processes with AND causality and input choice) or its
extension, called Change Diagram (which allows OR-causality). The main result
of the paper is twofold: (1) the proof that any speed-independent behavior can
be implemented at the gate-level without hazards, and (2) an efficient method
for such an implementation. This method is based on transformations of the
specification to the form satisfying the Monotonous Cover requirement. Since
this method is based on standard gate cells it can be used both in the
full-custom and semi-custom VLSI design. Experimental results demonstrate area
and performance efficiency of our method.
Chair: Hon-Wai Leong (National Univ. of Singapore)
Co-Chair: Shin'ichi Wakabayashi (Hiroshima Univ.)
A-5A.1 Extending Pitchmatching Algorithms to Layouts with Multiple Grid
Constraints
Hiroshi Miyashita (NTT, Japan)
Pitchmatching algorithms are widely used in layout environments where no grid
constraints are imposed. However, realistic layouts include multiple grid
constraints which facilitate the applications of automatic routing. Hence,
pitchmatching algorithms should be extended to those realistic layouts. This
paper formulates a pitchmatching problem with multiple grid constraints. An
algorithm for solving this problem is constructed by extending conventional
pitch-matching algorithms. The computational complexity is also discussed in
comparison with a conventional naive algorithm. Finally, examples and
application results to realistic layouts are presented.
A-5A.2 A New Layout Synthesis for Leaf Cell Design
Masahiro Fukui, Noriko Shinomiya, Toshiro Akino (Matsushita Electric Industrial Co., Ltd., Japan)
We propose a new layout synthesis with 2-dimensional transistor arrangement and
a spontaneous process of 2-dimensional compaction and local re-routing. The
compaction enables jumping over objects, minimizing the number of contacts for
wiring. We applied the layout synthesis to actual cell design and obtained
comparable results to hand-crafted design.
A-5A.3 A Layout Approach to Monolithic Microwave IC
Akira Nagao, Chiyoshi Yoshioka, Takashi Kambe (Sharp Corp., Japan),
Isao Shirakawa (Osaka Univ., Japan)
A layout approach is attempted dedicatedly for MMICs (Monolithic Microwave
Integrated Circuits), on which predominant layout elements are transistors,
resistors, capacitors, inductors, coplanar-waveguides, T-junctions, etc.,
formed by the GaAs fabrication process. The layout issue typical of such MMICs
consists essentially in how to realize a single layer placement of different
shapes of layout elements under a variety of spacing, orientating, and shaping
constraints.
In this paper, each layout element is modeled to simplify
placement tasks subject to different placement constraints, and then a set of
the interconnection requirements among elements is represented by a graph, to
which a planarization algorithm is effectively applied. As the result of this
planarization, a placement procedure is constructed mainly by repeated
application of a merging scheme. A number of experimental results are also
shown to demonstrate the practicability of the described layout approach.
A-5A.4 Performance Driven Multiple-Source Bus Synthesis Using Buffer Insertion
Chia-Chun Tsai, De-Yu Kao, Chung-Kuan Cheng, Ting-Ting Lin (Univ. of California, San Diego, USA)
A heuristic algorithm for a given topology of a multiple-source and
multiple-sink bus to reduce the signal delay time is proposed. The algorithm
minimizes the delay by inserting buffers into the candidate locations and sizing
the buffers. Experiments show up to 7.2%, 20.7%, and 29.6% improvement in delay
for 2.0, 0.5, and 0.3 micron technologies, respectively.
Chair: Dr. Bernd Becker (Johann Wolfgang Goethe-Univ., Germany)
Co-Chair: Tetsuya Fujimoto (Sharp Corp., Japan)
A-5B.1 Communication Based FPGA Synthesis for Multi-Output Boolean Functions
Christoph Scholl (Univ. des Saarlandes, Germany), Paul Molitor
(Martin-Luther Univ. Halle, Germany)
One of the crucial problems multi-level logic synthesis techniques for
multi-output boolean functions f = {f1,...,fm} : {0,1}^n -> {0,1}^m have to deal
with is finding sublogic which can be shared by different outputs, i.e., finding
boolean functions a = {a1,...,ah} : {0,1}^p -> {0,1}^h which
can be used as common sublogic of good realizations of f1,...,fm.
In this paper we present an efficient ROBDD based implementation of this Common Decomposition
Functions Problem (CDF).
Formally, CDF is defined as follows: Given m boolean functions f1,...,fm :
{0,1}^n -> {0,1}, and two natural numbers p and h, find h boolean functions
a1,...,ah : {0,1}^p -> {0,1} such that forall 1 <= k <= m there is
a decomposition of fk of the form
fk{x1,...,xn} = g^(k){a1{x1,...xp},...,ah{x1,...,xp},
a(h+1)^(k){x1,...,xp},...,a(rk) ^(k){x1,...xp},x{(p+1),...,xn}
using a minimal number rk of single-output boolean decomposition functions.
Experimental results applying the method to FPGA synthesis are promising.
A-5B.2 Optimum PLA Folding through Boolean Satisfiability
Jose M. Quintana, Maria J. Avedillo, Maria P. Parra, Jose L. Huertas
(Univ. of Sevilla, Spain)
This paper proposes an algorithm for optimum PLA folding based on its
formulation as a problem of boolean satisfiability. A logical expression is
derived such that the assignment of variables that satisfies it defines a
folding with a minimum number of columns. The proposed algorithm uses BDDs to
represent boolean functions and incorporates novel reduction techniques,
obtaining satisfactory results.
A-5B.3 Technology Mapping for FPGAs with Complex Block Architectures by Fuzzy Logic Technique
Jun-Yong Lee, Eugene Shragowitz (Univ. of Minnesota, USA)
This paper describes a technology mapper for FPGAs with the complex structure
of logic blocks. Most technology mappers developed so far are not effective for
such complex logic block architectures as XILINX XC4000 series. The proposed
mapper applies a constructive mapping algorithm and fuzzy logic rules to
balance such criteria as area, timing, routability and others. Performance of
the mapper is demonstrated on the set of MCNC benchmarks.
A-5B.4 Logic Rectification and Synthesis for Engineering Change
Chih-Chang Lin, David Ihsin Cheng, Malgorzata Marek-Sadowska
(Univ. of California, Santa Barbara, USA),
Kuang-Chien Chen (Fujitsu Labs. of America, Inc., USA)
In the process of VLSI design, specifications are often changed. It is
desirable that such changes will not lead to a very different design, so that
a large part of engineering effort can be preserved. We treat this problem as
a combination of multiple-error diagnosis and logic minimization problems.
Given a new specification and an existing synthesized logic network, our
algorithms modify the existing network minimally such that the new specification
can be realized. In this paper, a new algorithm is developed to identify
multiple candidate signals simultaneously from the existing network, such that
appropriate modifications of these signals can rectify the specification change.
Session [A-5C]
PANEL: Future Direction of Synthesizability and Interoperability of HDL's: Part-1
Chair: Masaharu Imai (Toyohashi Univ. of Technology, Japan)
Co-Chair: Eugenio Villar (Univ. of Cantabria, Spain)
Panelists: Raul Camposano (Synopsys, USA),
Andrew Guyler (Mentor Graphics, USA),
Victor Berman (Cadence Design Systems, USA),
Jeffrey Fox (Viewlogic, USA),
Sunil D. Sherlekar (IIT Bombay, India),
Shigeaki Hakusui (Harmonix, USA)
Chair: Gabriele Saucier (INPG, France)
Co-Chair: Akihiro Tsutsui (NTT, Japan)
A-6A.1 A New K-Way Partitioning Approach for Multiple Types of FPGAs
Bernhard M. Riess, Heiko A. Giselbrecht, Bernd Wurth
(Technical Univ. of Munich, Germany)
This paper considers the problem of partitioning a large, technology mapped
circuit onto multiple FPGA devices of a specified device library. We propose
an iterative three-step approach applying an analytical embedding technique,
initial partitioning, and a k-way ratio cut improvement procedure. We
successfully partitioned the ACM/SIGDA XILINX FPGA Benchmark circuits obtaining
feasible design solutions with lower total dollar costs than previous methods.
Moreover, our approach simultaneously assigns the FPGAs to physical locations
on the FPGA board.
A-6A.2 Maple-opt: A Simultaneous Technology Mapping, Placement, and Global
Routing Algorithm for FPGAs with Performance Optimization
Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki (Waseda Univ., Japan)
In this paper, we propose a new FPGA design algorithm, Maple-opt, in which
technology mapping, placement, and global routing are executed so that the
delay of each critical signal path in an input circuit is within a specified
upper bound imposed on it. The basic algorithm of Maple-opt is top-down
hierarchical bi-partitioning of regions. Technology mapping onto
logic-blocks of FPGAs, their placement, and global routing are determined
simultaneously in each hierarchical process. This simultaneity leads to less
congested layout for routing. In addition to that, Maple-opt computes a lower
bound of delay for each path with a constraint value and determines critical
paths based on the difference between the lower bound and the constraint value
dynamically in each hierarchical process. Two delay reduction processes are
executed for the critical paths; one is routing delay reduction and the other
is logic-block delay reduction. Routing delay reduction is realized such that,
when bi-partitioning a region, each constrained path is assigned to one
subregion. Logic-block delay reduction is realized such that each constrained
path is mapped onto fewer logic-blocks. Experimental results for some benchmark
circuits show its efficiency and effectiveness.
A-6A.3 Routing on Regular Segmented 2-D FPGAs
Yu-Liang Wu (Cadence Design Systems, Inc., USA),
Malgorzata Marek-Sadowska (Univ. of California, Santa Barbara, USA)
In this paper we analyze the properties of the Xilinx-like regular segmentation
schemes for 2-D Field Programmable Gate Arrays (FPGAs). We introduce a new
notion of architectural level routing decaying effect caused by wiring
segmentation. We discuss its routing properties and propose a relative
prime number based segmentation scheme for 2-D FPGA architectures. A new FPGA
design concept of applying architectural coupling to achieve better routability
is also introduced and experimentally justified.
Chair: Tsutomu Sasao (Kyshu Inst. Technology, Japan)
Co-Chair: Shin-ichi Minato (NTT, Japan)
A-6B.1 Flexible Optimization of Fixed Polarity Reed-Muller Expansions for
Multiple Output Completely and Incompletely Specified Boolean Functions
Chip-Hong Chang, Bogdan J. Falkowski (Nanyang Technological Univ., Singapore)
A new algorithm that generates optimal fixed polarity Reed-Muller expansions
based on user specified optimization criteria is shown. The algorithm accepts
reduced representation of Boolean functions in form of an array of cubes
and operates on an Algebraic Ternary Decision Tree together with lookup
tables of flexible sizes. Allocation of don't care minterms is performed
in an non exhaustive way by a heuristic approach based on the properties
of Reed-Muller expansions.
A-6B.2 GRMIN: A Heuristic Simplification Algorithm for Generalized Reed-Muller Expressions
Debatosh Debnath, Tsutomu Sasao (Kyushu Inst. of Technology, Japan)
A generalized Reed-Muller expression (GRM) is a type of AND-EXOR expressions.
In a GRM, each variable may appear both complemented and uncomplemented.
Networks realized using GRMs are easily tested. This paper presents GRMIN, a
heuristic simplification algorithm for GRMs of multiple-output functions.
GRMIN uses eight rules. As the primary objective, it reduces the number of
products, and as the secondary objective, it reduces the number of literals.
Experimental results show that, in most cases, GRMs require fewer products than
conventional sum-of-products expressions (SOPs). GRMIN outperforms existing
algorithms.
A-6B.3S Learning Heuristics by Genetic Algorithms
Rolf Drechsler, Bernd Becker (Johann Wolfgang Goethe-Univ., Germany)
In many applications of Computer Aided Design (CAD) of Integrated Circuits (ICs)
the problems that have to be solved are NP-hard. Thus, exact algorithms are
only applicable to small problem instances and many authors have presented
heuristics to obtain solutions (non-optimal in general) for larger instances
of these hard problems. In this paper we present a model for Genetic Algorithms
(GA) to learn heuristics starting from a given set of basic operations. The
difference to other previous applications of GAs in CAD of ICs is that the GA
does not solve the problem directly. Rather, it develops strategies for
solving the problem. To demonstrate the efficiency of our approach experimental
results for a specific problem are presented.
A-6B.4S Optimization Methods for Lookup-Table-Based FPGAs Using Transduction Method
Shigeru Yamashita, Yahiko Kambayashi (Kyoto Univ., Japan),
Saburo Muroga (Univ. of Illinois, USA)
In recent years Field Programmable Gate Arrays(FPGAs) have emerged as an
attractive means to implement low volume applications and prototypes due to
their low cost, reprogrammability and rapid turnaround times. Therefore, the
need for design methods of FPGAs are getting larger and larger. In this paper,
two methods to optimize networks which have been mapped for lookup-table-based
FPGAs are discussed. These methods utilize the notion of compatible sets of
permissible functions(CSPFs) of Transduction Method. Experimental results show
the effectiveness of our methods.
Session [A-6C]
PANEL: Future Direction of Synthesizability and Interoperability of HDL's: Part-2
Chair: Eugenio Villar (Univ. of Cantabria, Spain)
Co-Chair: Masaharu Imai (Toyohashi Univ. of Technology, Japan)
Panelists: Raul Camposano (Synopsys, USA),
Andrew Guyler (Mentor Graphics, USA),
Victor Berman (Cadence Design Systems, USA),
Jeffrey Fox (Viewlogic, USA),
Sunil D. Sherlekar (IIT Bombay, India),
Shigeaki Hakusui (Harmonix, USA)
Chair: David Skellern (Macquarie Univ., Australia)
Co-Chair: Tetsuro Kage (Fujitsu Laboratories, Japan)
A-7A.1 A New and Accurate Interconnection Delay Time Evaluation in a General
Tree-Type Network
D. Deschacht, C. Dabrin (Univ. Montpellier II, France)
In all recent technologies the delay caused by interconnection wires is
essential in the evaluation of the switching speed of integrated structures.
Completely wrong results, would result if this were neglected. By considering
a distributed RC network to model the interconnection lines, we proposed a new
analytical delay time expression for a general tree type network, with full
incorporation of technology design parameters. A computationally simple
technique is presented and comparisons with HSPICE simulation results show the
accuracy of the developed model in timing verification.
A-7A.2 An Efficient Logic/Circuit Mixed-Mode Simulator for Analysis of Power
Supply Voltage Fluctuation
Mikako Miyama, Goichi Yokomizo, Masato Iwabuchi, Masami Kinoshita (Hitachi, Ltd., Japan)
A mixed-mode simulator is described that can simulate voltage fluctuations in
the power supply network. Current flow due to logic events is taken into account
in order to predict the voltage fluctuations. The difference between the maximum
voltage fluctuations calculated by the proposed mixed-mode simulation and these
calculated by conventional circuit simulation are within 20%, and we demonstrate
the feasibility of the proposed simulation by simulating an entire MOS memory
chip (36,000 transistors) in 75 minutes on an HP9000/735.
A-7A.3 A Model-Adaptable MOSFET Parameter Extraction System
Masaki Kondo, Hidetoshi Onodera, Keikichi Tamaru (Kyoto Univ., Japan)
A model-adaptable parameter extraction system is developed to catch up with
rapid development of new advanced MOSFET models. The model-adaptability relies
on two techniques; a model-adaptable initial value estimation method and a
design environment that stores and reuses extraction procedures. The system
makes it easy to develop an extraction procedure for a new MOSFET model through
the reuse of an existing procedure for a previous model. We have presently
verified that the system can accommodate major SPICE models including Level2-3
and BSIM1-3.
Chair: Tomoyuki Fujita (NEC, Japan)
Co-Chair: Yusuke Matsunaga (Fujitsu Laboratories, Japan)
A-7B.1 Improved Computational Methods and Lazy Evaluation of the Ordered
Ternary Decision Diagram
Per Lindgren (Luleä Inst. of Technology, Sweden)
We investigate the properties of the Ordered Ternary Decision Diagram (OTDD) in
order to develop an efficient general OTDD package. The OTDD is a three-branched
three-terminal diagram based on Kleenean strong ternary logic. The OTDD can
represent functions having nontrivial don't-care sets in a single diagram and
is capable of provably correct evaluation in the presence of unknown input values.
We propose a number of improvements to both OTDD computational methods and
data structures. Furthermore we introduce the purged form OTDD which unifies the
abbreviated and full form OTDD into a single diagram. A package exploiting these
OTDD specific properties is presented and we show the computational advantages
of this improved package for LGSynth93 standard benchmarks.
A-7B.2 Some Remarks about Spectral Transform Interpretation of MTBDDs and
EVBDDs
Radomir S. Stankovic (Yugoslavia)
In this paper we give a spectral transform interpretation of AND-EXOR
representations of switching functions and related decision diagrams in
the vector space over GF(2). The consideration is uniformly extended to
the Fourier series-like expressions of functions in the complex vector space
and the decision diagrams for integer-valued functions. It is shown that
the multi-terminal decision diagrams, MTBDDs, and edge-valued functions are
derived by using the same sets of basic functions already applied for the
decision diagrams attached to some AND-EXOR expressions, but considered over
the complex field. The algebraic transform decision diagrams, ATDDs, are
considered as the integer counterparts of the functional decision diagrams,
FDDs, attached to the algebraic transform in the same way as the FDDs are
attached to the Reed-Muller expressions. It is shown that the EVBDDs are
the ATDDs in different notation.
A-7B.3 Manipulation of Regular Expressions Under Length Constraints Using
Zero-Suppressed-BDDs
Shinya Ishihara, Shin-ichi Minato (NTT, Japan)
We present a new technique that broadens the scope of BDD application. It is
a method for manipulating regular expressions that represent sets of sequences
including repetitions of symbols. In general, sequences in the set represented
by a regular expression have an infinite length and this makes representing and
manipulating them difficult. In this paper, we introduce length constraints
into a representation of regular expressions. Under these constraints, our
method can represent and manipulate large-scale sets of sequences of regular
expressions compactly and uniquely and greatly accelerates operations of
regular expressions. As regular expressions can represent behaviors of a finite
state machine, our technique provides a useful analysis method of finite state
machines and can be applied to formal hardware verification techniques.
Session [A-7C]
PANEL: How Sub-Micron Delay and Timing Issues Will Be Solved?
Chair: Hitoshi Yoshizawa (NEC, Japan)
Panelists: Ahsan Bootehsaz, Dennis B. Brophy, Donald R. Cottrell,
Antun Domic, Vassilos Gerousis, Tamotsu Hiwatashi
Chair: Wayne W.-M. Dai (Univ. of California at Santa Cruz, USA)
Co-Chair: Masato Edahiro (NEC, Japan)
A-8A.1 Exploiting Signal Flow and Logic Dependency in Standard Cell Placement
Jason Cong, Dongmin Xu (Univ. of California, Los Angeles, USA)
Most existing placement algorithms consider only connectivity information during
the placement process, and ignore other information available from the higher
levels of design process. In this paper, we exploit the use of signal flow and
logic dependency in standard cell placement by using the maximum fanout-free
cone (MFFC) decomposition technique. We developed a containment tree based
algorithm for splitting large MFFCs into smaller ones to get clusters with
restricted sizes. We also developed a placement algorithm, named MFFC-TW, which
first clusters the circuit based on MFFC decomposition and then feeds the
clustered circuit to the Timberwolf6.0 placement package. Very promising
experimental results were obtained.
A-8A.2 A New Performance Driven Placement Method with the Elmore Delay Model
for Row Based VLSIs
Tetsushi Koide, Mitsuhiro Ono, Shin'ichi Wakabayashi,
Yutaka Nishimaru (Hiroshima Univ., Japan),
Noriyoshi Yoshida (Hiroshima City Univ., Japan)
In this paper, we present a new performance driven placement method based on
path delay constraint approach for large standard cell layout. The proposed
method consists of three phases and uses the Elmore delay model to model
interconnection delay precisely in each phase. In the first phase, initial
placement is performed by an efficient performance driven mincut partitioning
method. Next, an iterative improvement method by nonlinear programming improves
the layout. The improvement is formulated as the problem minimizing the total
wire length subject to critical path delays. Finally, row assignment
considering timing constraint is performed. From the experimental results
comparing with RITUAL[17], the proposed method is much better than RITUAL in
point of the maximal violation ratio, the total wire length, and the cut size,
and is more effective in point of the interconnection delay model and its
extendability.
A-8A.3 A Neural Network Approach to the Placement Problem
Morteza Saheb Zamani, Graham R. Hellestrand (Univ. of New South Wales, Australia)
In this paper, we introduce a new neural network approach to the placement of
gate array designs. The network used is a Kohonen self-organizing map. An
abstract specification of the design is converted to a set of appropriate input
vectors fed to the network at random. At the end of the process, the map shows
a 2-dimensional plane of the design in which the modules with higher
connectivity are placed adjacent to each other, hence minimizing total
connection length in the design. The approach can consider external connections
and is able to place modules in a rectilinear boundary. These features makes
the approach capable of being used in hierarchical floorplanning algorithms.
A-8A.4 Fanout-Tree Restructuring Algorithm for Post-Placement Timing Optimization
T. Aoki, M. Murakata, T. Mitsuhashi, N. Goto (Toshiba Corp., Japan)
This paper proposes a fanout-tree restructuring algorithm for post-placement
timing optimization to meet timing constraints. The proposed algorithm
restructures a fanout-tree by finding a tree in a graph which represents a
multi-terminal net, and inserts buffer cells and resizes cells based on an
accurate interconnection RC delay without degrading routability. The algorithm
has been implemented and applied to a number of lay-out data generated by
timing driven placement. Application results show a 17% reduction in circuit
delay on the average.
Chair: Hiroaki Kunieda (Tokyo Inst. of Technology, Japan)
Co-Chair: Tatsuya Fujii (NTT, Japan)
A-8B.1 Synthesis and Simulation of Digital Demodulator for Infrared Data
Communication
Hiroshi Uno, Toru Chiba (SHARP Corp., Japan), Keiji Kumatani,
Isao Shirakawa (Osaka Univ., Japan)
A high performance design methodology is described for a digital demodulator,
which is intended for the noise immune wireless infrared data communication. In
this methodology, ASK(Amplitude Shift Keying) infrared signals detected by a
photo detector are digitized into two logic-level pulses by an infrared receiver,
and the demodulation of the digitized signals is implemented by a new
architecture. On account of the interference with optical noises from fluorescent
lamps, an ASK receiver is realized by a 1-bit digital demodulator, which is
designed with use of a high level synthesis tool COMPASS so as to implement
an algorithm for removing the noises. A part of experimental results is also
shown to demonstrate the practicability.
A-8B.2 A Design of High-Performance Multiplier for Digital Video Transmission
Keisuke Okada, Shun Morikawa, Isao Shirakawa
(Osaka Univ., Japan), Sumitaka Takeuchi (Mitsubishi Electric Corp., Japan)
A high performance design methodology is described for a multiplier to be used
dedicatedly for digital video transmission. The key factor for such a multiplier
is to operate at the speed of 30-100MHz but with the precision of 8-10 bits,
since it is intended for FIR filtering of digital video data. In terms of
implementing an FIR filter with more than ten taps, the same number of
multipliers are required to be integrated. Moreover, for the preloadability of
coefficients to the filter, each coefficient can be treated as a constant during
the filtering operation. Motivated by these requirements and functionalities,
a novel multiplier architecture is described, which is to be synthesized with
the use of a high level synthesis tool PARTHENON in conjunction with manually
designed macroblocks. Design results of the multiplier are also shown.
A-8B.3 Design Automation for Integrated Continuous-Time Filters
Using Integrators
Kazuyuki Wada, Shigetaka Takagi, Zdzislaw Czarnul, Nobuo Fujii
(Tokyo Inst. of Technology, Japan, Toshiba Corporation)
This paper proposes a design automation for filters using integrators. This is
based on a predistortion without knowledge of a filter topology. The
predistortion requires an integrator having the same structure, the same-value
elements and an electrically controllable unity-gain frequency, and compensates
for the deviation of frequency characteristics due to an excess phase shift of
an integrator. The effectiveness of the proposed method is demonstrated through
SPICE simulations. An algorithm for a filter design automation is also
discussed.
A-8B.4S A Hardware-Oriented Design for Weighted Median Filters
Chun-Te Chen, Liang-Gee Chen, Jue-Hsuan Hsiao (National Taiwan Univ., Taiwan)
In this paper, the design consideration and algorithm mapping for weighted
median filters are presented. To achieve high throughput rate, a special coding
technique and its dedicated architecture with block processing are constructed
to handle multiple filtering inputs and multiple filtering outputs concurrently.
The pipelined cycle in our design is merely as the delay time of 1-bit
carry-save-adder (CSA). Due to this design strategy, the proposed architecture
can support not only weighted median filters but also rank order-based filters
in high-speed applications.
A-8B.5S Techniques for Low Power Realization of FIR Filters
Mahesh Mehendale (Texas Instruments India Ltd., India), S. D.
Sherlekar, G. Venkatesh (Silicon Automation Systems(India) Ltd.)
In this paper we propose techniques for low power realization of FIR filters on
programmable DSPs. We first analyze the FIR implementation to arrive at useful
measures to reduce power and present techniques that exploit these measures. We
then identify limitations of the existing DSP architectures in implementing
these techniques and propose simple architectural extensions to overcome these
limitations. Finally we present experimental results on real FIR filter
examples that show up to 88% reduction in coefficient memory data bus power,
up to 49% reduction in coefficient memory address bus power.
Session [A-8C]
Invited Tutorial: EDIF Version 350/400 and Information Modelling
Chair: Mike Church (Zuken-Redac, UK)
Speaker: Hilary J. Kahn
Chair: Kewal K. Saluja (Univ. of Wisconsin, USA)
Co-Chair: Kazuhiro Iwasaki (Chiba Univ., Japan)
A-9A.1 Delay Abstraction in Combinational Logic Circuits
Noriya Kobayashi (NEC Corp., Japan), Sharad Malik (Princeton Univ., USA)
In this paper we propose a data structure for abstracting the delay information
of a combinatorial circuit. The particular abstraction that we are interested
in is one that preserves the delays between all pairs of inputs and outputs in
the circuit. The proposed graphical data structure is of size proportional to
( m + n ) in best case, where m and n refer to the number of inputs and outputs
of the circuit. In comparison, a delay matrix that stores the maximum delay
between each input/output pair has size proportional to m x n. We present
heuristic algorithms for deriving these concise delay networks. Experimental
results shows that, in practice, we can obtain concise delay network with the
number of edges being a small multiple of ( m + n ).
A-9A.2 Limits of Using Signatures for Permutation Independent
Boolean Comparison
Janett Mohnke, Paul Molitor (Martin-Luther Univ. Halle, Germany),
Sharad Malik (Princeton Univ., USA)
This paper addresses problems that arise while checking the equivalence of two
Boolean functions under arbitrary input permutations. The permutation problem has
several applications in the synthesis and verification of combinational logic:
It arises in the technology mapping stage of logic synthesis and in logic
verification. A popular method to solve it is to compute a signature for each
variable that helps to establish a correspondence between the variables.
Several researchers have suggested a wide range of signatures that have been
used for this purpose. However, for each choice of signature, there remain
variables that cannot be uniquely identified. Our research has shown that, for
a given example, this set of problematic variables tends to be the same
regardless of the choice of signatures. The paper investigates this problem.
A-9A.3 A Tool for Measuring Quality of Test Pattern for LSIs' Functional Design
Takashi Aoki, Tomoji Toriyama, Kenji Ishikawa, Kennosuke Fukami (NTT, Japan)
A prototype tool is developed for measuring the quality of test patterns for
simulation to verify LSI functional designs. The tool is able to count
activated conditional branches and evaluate the branch pass index of test
patterns. The branch pass index indicates the ratio of the number of
conditional branches validated by the pattern to the total number of
conditional branches in a design. We developed the prototype tool for
PARTHENON [1]. The tool prints out branch identification names not examined by
the test pattern. In using the tool for experimental designs, it helped
designers to significantly improve pattern quality if a branch pass index of
100% for LSI verification patterns was not achieved. Only about 30 seconds of
the processing time was required for a 1000 sentence module. Bugs can often be
found in designs with little effort.
Chair: Youn-Long Steve Lin (Tsing Hua Univ., Taiwan)
Co-Chair: Vasily Moshnyaga (Kyoto Univ., Japan)
A-9B.1 Search Space Reduction in High Level Synthesis by Use of an Initial Circuit
Atsushi Masuda, Hiroshi Imai, Jeffery P. Hansen, Masatoshi Sekine(Toshiba Corp., Japan)
Most existing high-level synthesis(HLS) systems attempt to generate a circuit
from a behavioral description `out of the void', using the entire design space
as the search domain. Because of the vastness of the search space, it is
impossible to do more than a coarse grain search, often resulting in inefficient
designs. This approach, ignores the designer's knowledge of the general
structure of the circuit to be synthesized. In this paper, we describe the HLS
system SIDER(Synthesis by Initial Design Extension and Refinement). SIDER
utilizes designer knowledge about the design space in the form of an
initial circuit. By limiting search to the neighborhood of this initial
circuit, much finer grain search can be performed yielding a higher quality
design. The effectiveness of the SIDER approach is shown by HLS of a 300
line C description of 27 instructions from a MC6502 CPU.
A-9B.2 A Datapath Synthesis System for the Reconfigurable Datapath Architecture
Reiner W. Hartenstein, Rainer Kress (Univ. of Kaiserslautern, Germany)
A datapath synthesis system (DPSS) for the reconfigurable datapath architecture
(rDPA) is presented. The DPSS allows automatic mapping of high level descriptions
onto the rDPA without manual interaction. The required algorithms of this
synthesis system are described in detail. Optimization techniques like loop
folding or loop unrolling are sketched. The rDPA is scalable to arbitrarily
large arrays and reconfigurable to be adaptable to the computational problem.
Fine grained parallelism is achieved by using simple reconfigurable processing
elements which are called datapath units (DPUs). The rDPA can be used as a
reconfigurable ALU for bus oriented systems as well as for rapid prototyping of
high speed datapaths.
A-9B.3 Synthesis-for-Testability Using Transformations
Miodrag Potkonjak, Sujit Dey, Rabindra K. Roy (NEC USA, USA)
We address the problem of transforming a behavioral specification so that
synthesis of a testable implementation from the new specification requires
significantly less area and partial scan cost than synthesis from the original
specification. A two-stage objective function, that estimates the area and
testability of the final implementation, and also captures enabling effects of
the transformations, is developed. Optimization is done using a new randomized
branch and bound steepest descent algorithm. Application of the transformation
algorithm on several examples demonstrates significant simultaneous improvement
in both area and testability of the final implementations.
Session [A-9C]
PANEL: Electronic Data Book: Current Status of Standard Representation and
Future Perspective
Organizer: Kinya Tabuchi (Mitsubishi Electric Corp., Japan)
Chair: Kinya Tabuchi (Mitsubishi Electric Corp., Japan)
Panelists: Andy Graham (CFI, USA),
Bob Yencha (Pinnacles Group, National Semiconductor, USA),
Tom Jeffery (Pinnacles Group, Hitachi Micro Systems, USA),
Toshitaka Fukusima (ELISNET, Fujitsu, Japan),
Joe Prang (Aspect Development Inc., USA)
|