SIGDA, Super Compendium, ASPDAC 1995, Abstracts

ABSTRACTS ASP-DAC'95

Sessions
[1A] [1B] [1C] [2A] [2B] [2C] [3A] [3B] [3C] [4A] [4B] [4C] [5A] [5B] [5C] [6A] [6B] [6C] [7A] [7B] [7C] [8A] [8B] [8C] [9A] [9B] [9C]

Session [A-1A] Design Methodologies for Low Power

Chair: Massoud Pedram (Univ. of South California, USA)
Co-Chair: Hidetoshi Onodera (Kyoto Univ., Japan)

A-1A.1 Transistor Reordering Rules for Power Reduction in CMOS Gates
Wen-Zen Shen, Jiing-Yuan Lin (National Chiao Tung Univ., Taiwan), Fong-Wen Wang (Telecommunication Laboratories, MTC, Taiwan)

The goal of transistor reordering for a logic gate is to reduce the propagation delay as well as the charging and discharging of internal capacitances to achieve low power consumption. In this paper, based on the input signal probabilities and transition densities, we propose a set of simple transistor reordering rules for both basic and complex CMOS gates to minimize the transition counts at the internal nodes. The most attractive feature of this approach is that not only the power consumption is reduced efficiently, but also the other performances are not degraded. Experimental results show that this technique typically reduces the power by about 10% in average, but in some cases the improvement is even 35%.

A-1A.2 Power Reduction by Gate Sizing with Path-Oriented Slack Calculation
How-Rern Lin and TingTing Hwang(Tsing Hua Univ., Taiwan)

This paper describes methods for reducing power consumption. We propose using gate sizing technique to reduce power for circuits that have already satisfied the timing constraint. Replacement of gates on noncritical paths with smaller templates is used in reducing the dissipated power of a circuit. We find that not only gates on noncritical paths can be down-sized, but also gates on critical paths can be down-sized.A power reduction algorithm by means of single gate resizing as well as multiple gates resizing will be proposed. In addition, to identify gates to be resized, a path-oriented method in calculating slack time with false path taken into consideration will be also proposed. During the slack time computation, in order to prevent long false path from becoming sensitizable and thus increasing the circuit delay, slack constraint will be set for gates. Results on a set of circuits from MCNC benchmark set demonstrate that our power reduction algorithm can reduce about 10% more power, on the average, than a previously proposed gate sizing algorithm.

Keywords : low power, gate sizing, time slack, sensitizable path.

A-1A.3 Current and Charge Estimation in CMOS Circuits
Sanjay Dhar, Dave J. Gurney (Mentor Graphics Corp., USA)

CMOS circuits have significant amounts of dynamic short-circuit(or through) current. This can be as large as 20% of the total in well-designed circuits, and up to 80% of the total in circuits that have not been designed carefully. This current depends strongly on the relative sizes of the pull-up to pull-down paths. We introduce the dynamic short-circuit ratio to model this parameter. This allows accurate estimate of currents including the dynamic short-circuit current, and also results in improved delay estimation. Accuracy is typically within 10% of circuit-level simulation while operating at the switch-level abstraction.

Session [A-1B] Design Methodology for Processor and Telecommunication Systems

Chair: Winfried Hahn (Univ. of Passau, Germany)
Co-Chair: Yoshio Takamine (Hitachi Ltd., Japan)

A-1B.1 Auriga2: A 4.7 Million-Transistor CISC Microprocessor
J. P. Tual, M. Thill, C. Bernard, H. N. Nguyen, F. Mottini, M. Moreau, P. Vallet (BULL S. A., France)

With the introduction of the high range version of the DPS7000 mainframe family, Bull is providing a processor which integrates the DPS7000 CPU and first level of cache on one VLSI chip containing 4.7M transistors and using a 0.5 �m, 3Mlayers CMOS technology. This enhanced CPU has been designed to provide a high integration, high performance and low cost systems. Up to 24 such processors can be integrated in a single system, enabling performance levels in the range of 850 TPC-A (Oracle) with about 12 000 simultaneously active connections. The design methodology involved massive use of formal verification and symbolic layout techniques, enabling to reach first pass right silicon on several foundries. An architectural overview of the CPU with emphasis on several original aspects of the design aspects (synthesis, verification, symbolic layout) will be discussed in this paper.

A-1B.2 Automatic Design for Bit-Serial MSPA Architecture
Hiroaki Kunieda, Yusong Liao, Dongju Li, Kazuhito Ito, (Tokyo Inst. of Technology, Japan, Saitama University, Saitama)

Memory Sharing Processor Array (MSPA) architecture is effective in both data storage and processor cell utilization efficiency. In this paper, the design methodology for MSPA is extended to synthesize bit-serial datapath. As a synthesis example, we propose a new bit-serial multiplier with smaller number of logic gates than conventional bit-serial multipliers.

A-1B.3 Stoht --- An SDL-to-Hardware Translator
Ivanil S. Bonatti, Renato J. O. Figueiredo (Univ. of Campinas, Brazil)

This article presents a language translator that allows the use of SDL as a front-end, high-level graphical description tool for hardware design. A subset of this language is proposed for hardware design, including a synthesisable model for SDL's signal-based communication. An algorithm to translate this subset to fully synthesisable VHDL is proposed and implemented in a public-domain software package.

A-1B.4 Enhancing a VHDL Based Design Methodology with Application Specific Data Abstraction
Lars Lindqvist (NKT Electronik A/S and Danmarks Tekniske Univ., Denmark)

VHDL has successfully been introduced into the design methodology for VLSI ASICs. This paper describes a high-level data abstraction and supporting tool that enhance this methodology in the telecommunication application domain. A significant performance gain was obtained by introducing the data abstraction outside the VHDL simulator. The enhanced methodology has been used in current ASIC designs with good results.

Session [A-1C] PANEL: Design Automation 2000---Challenges for Gigabit-Era

Moderator: Richard K. Wallace, Editor-in-Chief, EE Times
Panelists: Joseph Costello (CEO, Cadence), Jeffrey Edson (VP, Intergraph), Aart deGeus (CEO, Synopsys), Alan J. Hanover (CEO, Viewlogic), Jinya Katsube (VP, Zuken), Walden C. Rhines (CEO, Mentor)

Session [A-2A] High Level Synthesis (1)

Chair: Daniel D. Gajski (Univ. of California at Irvine, USA)
Co-Chair: Tadatoshi Ishii (Toshiba Corp., Japan)

A-2A.1 A Scheduling Algorithm for Synthesis of Bus-Partitioned Architectures
Vasily G. Moshnyaga, Fumiaki Ohbayashi, Keikichi Tamaru (Kyoto Univ., Japan)

Due to efficient interconnect structure and internal parallelism bus-partitioned architectures are very beneficial for sub-micron chip design. This paper presents a new approach for integrated scheduling and interconnect binding of bus-segmented data-paths. Experiments show that the approach provides better results than existing methods and is quite flexible.

A-2A.2 Reclocking for High-Level Synthesis
Pradip Jha, Nikil Dutt (Univ. of California, Irvine, USA), Sri Parameswaran (Univ. of Queensland, Australia)

In this paper we describe, a powerful post-synthesis approach called reclocking, for performance improvement by minimizing the total execution time. By back annotating the wire delays of designs created by a high level synthesis system, and then finding an optimal clockwidth, we resynthesize the controller to improve performance without altering the datapath. Reclocking is versatile and can be applied not only for wire delay consideration, but also for bit-width migration, library migration and for feature size migration supporting the philosophy of design reuse. Experimental results show that with re-clocking, the performance of the input designs can be improved by as much as 34%.

A-2A.3 Synthesis of False Loop Free Circuits
Shih Hsu Huang, Ta-Yung Liu, Yu-Chin Hsu (Univ. of California, Riverside, USA), Yen-Jen Oyang (National Taiwan Univ., Taiwan)

In behavior synthesis, an improper resource sharing may result in a circuit containing false loops which is non-simulatable or non-timing-analyzable. Previous approaches solve this problem during the datapath allocation phase. To build a false loop free circuit, they may have to allocate additional functional units other than those defined in the resource constraints. In this paper, we present an approach to solve the problem during the scheduling phase. Our scheduling algorithm finds a schedule which guarantees to have a false loop free circuit mapping under the given resource constraints. Experiments show the proposed approach finds false loop free schedule for most of the examples without introducing extra control steps.

A-2A.4 High-Level Synthesis Scheduling and Allocation Using Genetic Algorithms
M. J. M. Heijligers, L. J. M. Cluitmans, J. A. G. Jess (Eindhoven Univ. of Technology, The Netherlands)

In this article a scheduling method is presented which is capable of allocating supplementary resources during scheduling. This makes it very suitable in synthesis strategies based on lower bound estimations techniques. The method is based on genetic algorithms. Special coding techniques and analysis methods are used to improve the runtime and quality of the results. The scheduler can easily be extended to cover other architectural issues and (for example) provides ways to make trade-offs between functional unit allocation and register allocation. Experiments and comparisons show high quality results and fast run times that outperform results produced by other heuristic scheduling methods.

Session [A-2B] Design Abstractions and Environments

Chair: Graham R. Hellestrand (Univ. of New South Wales, Australia)
Co-Chair: Masatoshi Sekine (Toshiba Corp., Japan)

A-2B.1 A Framework for the Analysis and Design of Algorithms for a Class of VLSI-CAD Optimization Problems
C. -J. Shi (Univ. of Iowa, USA), J. A. Brzozowski (Univ. of Waterloo, Canada)

A simple mathematical framework, called cluster-cover, is established for several VLSI optimization problems including logic minimization, constrained encoding, multi-layer topological planar routing, application timing assignment for delay-fault testing, and minimization of monitoring logic for BIST enhancement. Two paradigms, prime covering and greedy peeling, are presented for developing both exact and heuristic algorithms. The paradigms capture generally applicable ingredients from previously developed algorithms for individual applications. This makes it possible to re-use established techniques in new problems, and provide new insights into existing problems. The paradigms are simple enough to be amenable to theoretical analysis. Bounds on the performance of greedy peeling are derived; these bounds are applicable to many published heuristics which previously could be evaluated only by benchmarks.

A-2B.2 Generic Fuzzy Logic CAD Development Tool
Eric Q. Kang, Eugene Shragowitz (Univ. of Minnesota, USA)

This paper describes a generic fuzzy logic CAD development tool and reports on application of it to some important CAD problems. This menu-based tool allows to introduce linguistic variables in a textual and graphic form by clicking a menu. It permits users to define membership functions in an analytical, table or graphical forms. It connects linguistic variables by fuzzy logic operators to create fuzzy logic and generates their graphical representations.

A-2B.3 A Hardware/Software Codesign Method for Pipelined Instruction Set Processor Using Adaptive Database
Nguyen Ngoc Binh, Masaharu Imai, Akichika Shiomi, (Toyohashi Univ. of Technology, Japan), Nobuyuki Hikichi, (Software Research Associates, Inc., Japan)

This paper proposes a new method to design an optimal pipelined instruction set processor using a formal HW/SW codesign methodology. First, a HW/SW partitioning algorithm for selecting an optimal pipelined architecture is introduced briefly. Then, an adaptive database approach is presented that enables to enhance the optimality of the design through very accurate estimation of the performance of a pipelined ASIP in HW/SW partitioning. The experimental results show that the proposed methods are effective and efficient.

A-2B.4 EMPAR: An Interactive Synthesis Environment for Hardware Emulations
Tsing-Gen Lee, Wen-Jong Fang, Allen C.-H. Wu (Tsing Hua Univ., Taiwan)

In this paper, we present EMPAR, an interactive synthesis environment for hardware emulations. EMPAR provides an open-ended design environment for the development of hardware emulators, which is capable of supporting: (1) a variety of EM architectures (2) a variety of EM synthesis algorithms, (3) interactive control by the user, and (4) design quality analysis. An X-window based graphical user interface has been developed to support a variety of interactive design tasks. The key features in EMPAR are : (1) it is open-ended so that arbitrary algorithms and tools can be built on top of it and (2) it is fully interactive and leaves the control to the designer. EMPAR can be used as a design environment for existing hardware emulators as well as a test bed for the evaluation and development of new hardware-emulator architectures and synthesis algorithms.

Session [A-2C] System-Level Design Automation Activities in Korea (1)

Chair: Chong-Min Kyung, KAIST, Korea

A-2C.1 A Scheduling Algorithm for Multiport Memory Minimization in Datapath Synthesis
Hae-Dong Lee, Sun-Young Hwang (Sogang Univ., Korea)

In this paper, we present a new scheduling algorithms that generates area-efficient register transfer level datapaths with multiport memories. The proposed scheduling algorithm assigns an operation to a specific control step such that maximal sharing of functional units can be achieved with minimal number of memory ports, while satisfying given constraints. We propose a measure of multiport memory cost, MAV (Multiple Access Variable) which is defined as a variable accessed at several control steps, and overall memory cost is reduced by equally distributing the MAVs throughout all the control steps. When compared with previous approaches for several benchmarks available from the literature, the proposed algorithm generates the datapaths with less memory modules and interconnection structures by reflecting the memory cost in the scheduling process.

A-2C.2 An Integrated Hardware-Software Cosimulation Environment for Heterogeneous Systems Prototyping
Yongjoo Kim, Kyuseok Kim, Youngsoo Shin, Taekyoon Ahn, Wonyong Sung, Kiyoung Choi, Soonhoi Ha (Seoul National Univ., Korea)

In this paper, we present a hardware-software cosimulation environment for heterogeneous systems. To be an efficient system verification environment for the rapid prototyping of heterogeneous systems, the environment provides interface transparency, simulation acceleration, smooth transition to cosynthesis, and integrated user interface and internal representation. As an experimental example, a heterogeneous system is cosimulated and prototyped successfully, which shows that our environment can be a useful heterogeneous system specification/verification environment for rapid prototyping.

A-2C.3 A CSIC Implementation with POCSAG Decoder and Microcontroller for Paging Applications
J.Y. Lim , G. Kim (PANTECH Co., Ltd., Korea), I.-S. O (Univ. of Inchon, Korea), J.H. Cho, Y.J. Kim, H.Y. Kim (PANTECH Co., Ltd., Korea)

This paper presents a CSIC (Customer Specification Integrated Circuit) implementation, which includes a 512/1200/2400 bps POCSAG decoder, PDI2400 and MC68HC-05 changed by PANTECH. It can receive all the data with the rate of 512/1200/2400 bps of a single clock of 76.8 KHz. It is designed to have maximum 2 own frames for service enhancement. To improve receiver quality, a preamble detection considering frequency tolerance and a SCW (Synchronization Code Word) detection at every 4 bit is suggested. Also we consider an error correction of address and message up to 2 bits. Furthermore, it is possible with proposed PF (Preamble Frequency) error to achieve a battery life increase due to the turn-off of RF circuits when the preamble signal is detected with noises. The chip is designed using VHDL code from PDI2400 micro-architecture level. It is verified with VHDL simulation software of PowerView TM . Its logic diagrams are synthesized with VHDL synthesis software of PowerView TM . Proposed decoder and MC68HC05 CPU of MOTOROLA are integrated with about 88000 transistors by using 1.0um HCMOS process and named MC68HC05PD6. It is proved that the wrong detection numbers of preamble of noises are significantly reduced in the pager system that uses our chip through the real field test. The system receiving performance is improved by 20% of average, compared with other existing systems.

A-2C.4 Performance-Driven Circuit Partitioning for Prototyping by Using Multiple FPGA Chips
Chunghee Kim, Hyunchul Shin (Hanyang Univ., Korea), Young-Uk Yu (Seodu Logic Inc., Korea)

A new performance-driven partitioning algorithm has been developed to implement a large circuit by using multiple FPGA chips. Partitioning for multiple FPGAs has several constraints to satisfy so that each partitioned subcircuit can be implemented in a FPGA chip. To obtain satisfactory results under the constraints , the partitioning is performed in two phases which are the initial partitioning for global optimization and the iterative partitioning improvements for constraint satisfaction. Experimental results using the MCNC benchmark examples show that our partition method produces better results than those of other recent approaches on the average and that performance-driven partitioning is effective in reducing critical time delays.

Session [A-3A] Partition and Floorplan

Chair: Chung-Kuan Cheng (Univ. of California, San Diego, USA)
Co-Chair: Kazuhiro Ueda (Shibaura Inst. of Technology, Japan)

A-3A.1 A New System Partitioning Method under Performance and Physical Constraints for Multi-Chip Modules
Yoshinori Katsura, Tetsushi Koide, Shin'ichi Wakabayashi (Hiroshima Univ., Japan), Noriyoshi Yoshida (Hiroshima City Univ., Japan)

In this paper, we propose a new and the first MCM system partitioning method considering chip-to-chip delays, chip areas and I/O pins constraints. The proposed method consists of two steps, a clustering step considering the three constraints and an iterative improvement step with mathematical programming. In the first step, we apply two clustering algorithms considering the three constraints and reduce the size of the large MCM system partitioning problem so as to get a solution within a practical computation time. Next, it generates an initial partitioning with 0-1 integer linear programming(ILP) so as to minimize the total wire length. Since there may exist constraint violations in the initial solution, in the second step, we formulate the partitioning problem as a LP problem selecting a maximal independent set and improve it until the total number of cuts is not decreased and the three constraints are satisfied. We also showed that the number of the timing constraints can be reduced by deleting redundant timing constraints. Experimental results showed that the proposed method is able to produce partitions satisfying the three constraints and improves the number of cuts by 27% on an average and a 30% in maximum over the conventional method[5] considering only two constrains.

A-3A.2 A Robust Min-Cut Improvement Algorithm based on Dynamic Look-Ahead Weighting
Katsunori Tani (NEC, Japan)

This paper proposes an approach to enhance Fiduccia-Mattheyses' min-cut algorithm. The approach includes two new ideas: Look-ahead Weighting and Dynamic Weighting . The former is based on the concept of VLSI placement method using quadratic programming. The latter is a technique to carry the better behavior of move-and-lock improvement strategy. Experiments on practical circuits with 5K~140K cells show that the proposed approach achieves promising results.

A-3A.3 Timing Influenced General-Cell Genetic Floorplanner
Sadiq M. Sait, Habib Youssef, Shahid K. Tanvir, and Muhammad S. T. Benten (King Fahd Univ. of Petroleum and Minerals, Saudi Arabia)

In this paper we present a timing-influenced floorplanner for general cell IC design. The floorplanner works in two phases. In the first phase we restrict the modules to be rigid and the foorplan to be slicing. The second phase of foorplanner allows modification to the aspect ratios of individual modules to further reduce the area of the overall bounding box. The first phase is implemented using genetic algorithm while in the second phase we adopt a constraint graph based approach. Experimental results are also presented.

Session [A-3B] Embedded System Design

Chair: Akihiko Yamada (Tokyo Metropolitan Univ., Japan)
Co-Chair: Jun Sato (Tsuruoka National College of Technology, Japan)

A-3B.1 Power Analysis of a 32-bit Embedded Microcontroller
Vivek Tiwari (Princeton Univ., USA), Mike Tien-Chien Lee (Fujitsu Labs. of America, USA)

A new approach for power analysis of microprocessors has recently been proposed [1]. The idea is to look at the power consumption in a microprocessor from the point of view of the actual software executing on the processor. The basic component of this approach is a measurement based, instruction-level power analysis technique. The technique allows for the development of an instruction-level power model for the given processor, which can be used to evaluate software in terms of the power consumption, and for exploring the optimization of software for lower power. This paper describes the application of this technique for a comprehensive instruction-level power analysis of a commercial 32-bit RISC-based embedded microcontroller. The salient results of the analysis and the basic instruction-level power model are described. Interesting observations and insights based on the results are also presented. Such an instruction-level power analysis can provide cues as to what optimizations in the micro-architecture design of the processor would lead to the most effective power savings in actual software applications. Wherever the results indicate such optimizations, they have been discussed. Furthermore, ideas for low power software design, as suggested by the results, are described in this paper as well.

A-3B.2 Assessing the Feasibility of Interface Designs before their Implementation
Marco A. Escalante, Nikitas J. Dimopoulos (Univ. of Victoria, Canada)

During the design of microprocessor-based systems, once the system architecture has been decided and the major components (processors, memories, IO devices) have been selected from a component library, it is necessary to design interface logic to integrate the system. Such an interface design can be carried out based on the protocols used by the components. This paper addresses the problem of determining the feasibility of a design prior to synthesis. A design is called feasible if it achieves the desired functionality and satisfies the given environmental constraints. Because timing is an important aspect of a correct design, protocols are described using timed signal transition graphs, an interpreted Petri net. It is shown here that the feasibility of designs whose corresponding behavior is periodic can be studied using a technique called timing analysis for synthesis.

A-3B.3 A Hardware-Software Co-simulator for Embedded System Design and Debugging
A. Ghosh, M. Bershteyn, R. Casley, C. Chien, A. Jain, M. Lipsie, D. Tarrodaychik, O. Yamamoto (Mitsubishi Electric Research Laboratories, Inc., USA)

One of the interesting problems in hardware-software co-design is that of debugging embedded software in conjunction with hardware. Currently, most software designers wait until a working hardware prototype is available before debugging software. Bugs discovered in hardware during the software debugging phase require re-design and re-fabrication, thereby not only delaying the project but also increasing cost. It also puts software debugging on hold until a new hardware prototype is available. In this paper we describe a hardware-software co-simulator that can be used in the design, debugging and verification of embedded systems. This tool contains simulators for different parts of the system and a backplane which is used to integrate the simulators. This enables us to simulate hardware, software and their interaction efficiently. We also address the problem of simulation speed. Currently, the more accurate (in terms of timing) the models used, the longer it takes to simulate a system. Our main contribution is a set of techniques to speed up simulation of processors and peripherals without significant loss in timing accuracy. Finally, we describe applications used to test the co-simulator and our experience in using it.

Session [A-3C] System-Level Design Automation Activities in Korea (2)

Chair: Chong-Min Kyung, KAIST, Korea

A-3C.1 Integrated Interconnect Circuit Modeling for VLSI Design
Won-Young Jung, Ghun-Up Cha, Young-Bae Kim, Jun-Ho Baek, Choon-Kyung Kim (LG Semiconductor Inc., Korea)

An integrated interconnect modeling system, SIMS, is developed with the parametrized modeling of interconnect and the interface with a schematic capture and editor. SIMS automatically drives numerical interconnect simualtion as directed by technology engineers, creates polynomial model library for interconnect parasitics, generates netlist including SPICE model for the interconnect structure specified by circuit designers, automatically drives circuit simulations and display the simulation results through advanced GUI. VLSI design with SIMS makes it possible to consider parasitic effects fast and accurately, which becomes more important in deep submicron circuit design area. With this capability, circuit design with optimized interconnect layout can be easily achieved. Ultimately the integrated system helps to reduce the cost of technology development and the time-to-market by building up the concept of design-for-manufacturability.

A-3C.2 Architectural Simulation for a Programmable DSP Chip Set
Jong Tae Lee, Jaemin Kim, and Jae Cheol Son (Samsung Electronics Co., Ltd., Korea)

This paper presents an architectural simulator called VIDEOFLOW software tools for developing programmable DSP chip set for various video codec standards. This DSP chip set consists of the Image Compression Coprocessor (ICC) and Motion Estimation Coprocessor (MEC), which provide an easy solution for implementing the major digital video codec algorithms. The ICC/MEC simulation components are 100 percent bit accurate and closely approximate the timing of the actual chips. In addition, the simulation tool provides users with the ICC/MEC system simulation, debugging, and various performance monitors. This tool can also be used to define and modify the architectural specification for future product line of the ICC and MEC.

A-3C.3 System-Level Verification of CDMA Modem ASIC
GyeongLyong Park, KyungHi Chang, JaeSeok Kim, and KyungSoo Kim (ETRI, Korea)

In this paper, we present a system-level verification methodology which is used to verify the design of CDMA (Code Division Multiple Access) modem ASIC. To make the system-level verification feasible, the models for modulator of base station, fading channel and AGC loop were developed under the C environment. Behavioral modeling of a microcontroller was also carried out using VHDL to provide the ASIC with realistic input data, and the netlist of CDMA modem ASIC is loaded on to a hardware accelerator, which is interfaced with VHDL simulator. Finally, simulation was performed by executing an actual CDMA call processing software. THis method was proved to be effective in both discovering in advance malfunctions of ASIC when embedded in the system and reducing simulation time by a factor of as much as 20 in the case of gate-level simulation. The designed ASIC which consists of 90,000 gates and 29K SRAMs is now successfully working in the real mobile-station on its first fab-out.

A-3C.4 A Digital Audio Signal Processor for Cellular Phone Application
Jeongsik Yang, Chanhong Park, Beomsup Kim (KAIST, Korea)

A salient digital audio signal processor for mobile communication receiver is described in this paper. With this IC, the complete audio signal processing system of an AMPS or a TACS cellular phone is easily implemented. The processor can be also applied to cellular radio, high performance cordless telephone, etc. In this paper, as an example of application, the implementation of AMPS audio signal processing system is presented. To save the power consumption, some special consideration of the low power architectures has been made. The DSP core uses 4 stage pipelining without a routine memory access stage to reduce the power consumption and execution speed. The parallel calculations of data in the DSP and separated filter blocks also contribute to reduce the clock frequency and to save power. It also provides the power-down mode that turns off the IC except the internal bus interface in the standby mode. It uses 3.3V power supply, 10MHz 4-phase clock. The data sampling rate is 10K-samples per second.

Session [A-4A] Routing

Chair: Jason Cong (Univ. of California, Los Angeles, USA)
Co-Chair: Takashi Mitsuhashi (Toshiba Corp., Japan)

A-4A.1 Region Definition and Ordering Assignment with the Minimization of the Number of Switchboxes
Jin-Tai Yan (National Chiao Tung Univ., Taiwan)

In this paper, a region definition and ordering assignment (RDAOA) algorithm on minimizing the number of switchboxes is proposed. The time complexity of the algorithm is proved to be in O(n) time, where n is the number of line segments in a given floorplan graph. Finally, several examples have been tested on the proposed algorithm and other published algorithms, and the experimental results show that our algorithm defines fewer switchboxes than other algorithms.

A-4A.2 A Three-Layer Over-the-Cell Multi-Channel Routing Method for a New Cell Model
Masahiro Tsuchiya, Tetsushi Koide, Shin'ichi Wakabayashi (Hiroshima Univ., Japan), Noriyoshi Yoshida (Hiroshima City Univ., Japan)

In this paper, we present a new cell model for over-the-cell routing and a new over-the-cell multi-channel routing method. In the proposed new cell model, terminals can be placed arbitrarily on the second layer of a cell so that each cell doesn'�t require the extra routing region on the first layer of a cell to align terminals. Unlike conventional cell models, some parts of the second layer are also utilized for the intra-cell routing in order to reduce the chip area. Therefore the size of the proposed cell model can be smaller than that of conventional cell model. The proposed method consists of three phases. In order to utilize the proposed cell model, in phase 1, we simultaneously handle all channels to determine the most effective routing patterns from the set of possible routing patterns to minimize the chip area. In phase 2, for the effective routing patterns of nets selected in phase 1, over-the-cell routing nets are selected by a new greedy algorithm considering obstacles on over-the-cells. Finally, the conventional channel routing algorithm is applied for nets unrouted on over-the-cell. From the experimental results with MCNC benchmarks, the proposed cell model and algorithm produce smaller height of layouts as compared to those produced by conventional cell models and algorithms, and the effectiveness of the proposed method and cell model was shown.

A-4A.3 Pin Assignment and Routing on a Single-Layer Pin Grid Array
Man-Fai Yu, Wayne Wei-Ming Dai (Univ. of California, Santa Cruz, USA)

PGA routing has the freedom of routing any pin to any pad. We proposed an algorithm (EVENPGA) that generates a monotonic topological routing. The routing has no detours and is uniformly distributed optimally. The wire length is also the shortest possible under the taxicab wiring metric. If the topological routing is routable, the maximum density of critical cuts along a ring is the minimum possible. Once the topological routing is done, physical layout can easily be obtained using Surf, a rubberband-based routing system.

Session [A-4B] Design for Testability

Chair: Yervant Zorian (AT&T, USA)
Co-Chair: Kiyoshi Furuya (Chuo Univ., Japan)

A-4B.1 Design for Testability Using Register-Transfer Level Partial Scan Selection
Akira Motohara, Sadami Takeoka, Toshinori Hosokawa, Mitsuyasu Ohta, Yuji Takai, Michihiro Matsumoto, Michiaki Muraoka (Matsushita Electric Industrial Co., Ltd., Japan)

An approach to top down design for testability using register-transfer level(RTL) partial scan selection is described. We propose a scan selection technique based on testability analysis for RTL design including data path circuits and control circuits such as state machines. Registers and state machines which make gate level ATPG difficult are identified by the scan selection technique based on RTL testability analysis effectively. Experimental results for actual circuits are also presented.

A-4B.2 A Built-In Self Test Scheme for VLSI
T. Raju Damarla (GEO Centers, Inc, USA), Wei Su, Gerald T. Michael (US Army Research Labs, USA), Moon J. Chung (Michigan State Univ., USA), Charles E. Stroud (Univ. of Kentucky, USA)

In this paper we present a novel approach for Built-In Self Test (BIST) for VLSI. Many conventional BIST schemes use signatures generated by a linear feedback shift register (LFSR) or a multiple input signature register (MISR) for determining whether the device under test is faulty or fault free. In the approach presented in this paper, fault detection is made based on the number of different states the LFSR visits. This number is called the cycle length. It is also shown that such an approach results in the probability of aliasing of 2 ^-(2^(m-1) + m), where m denotes the number of registers in the LFSR, compared to 2 ^-m achieved by conventional signature analyzers. We also present the complexity of the additional hardware required to implement the scheme.

A-4B.3 BIST with Negligible Aliasing through Random Cover Circuits
T. Bogue, H. Jürgensen (Univ. of Western Ontario, Canada), M. Gössel (Univ. of Potsdam, Germany)

In this paper, a new modified BIST structure is investigated. The output of the MISA is monitored during the test by an error detection circuit which is composed of two simple cover circuits. To simplify the cover construction, the cover circuits are randomly chosen to be active for some of the outputs of the MISA. Thus, a time-consuming fault simulation can be completely avoided. The overhead for the cover circuits is determined for several of the ISCAS'85 and Berkeley benchmark circuits. These simulation experiments show that a significant reduction of the aliasing probability can be achieved, confirming and far surpassing theoretically predicted improvements. Moreover, this improvement can be achieved at a nearly negligible cost in additional hardware.

Session [A-4C] Logic Synthesis of Sequential Circuits

Chair: Sunil D. Sherlekar (Indian Inst. of Technology, India)
Co-Chair: Ryuichi Takahashi (Hiroshima City Univ., Japan)

A-4C.1 Implicit Prime Compatible Generation for Minimizing Incompletely Specified Finite State Machines
Hiroyuki Higuchi, Yusuke Matsunaga (Fujitsu Laboratories Ltd., Japan)

This paper proposes a new implicit algorithm for excluding dominated compatibles. The algorithm utilizes a novel notion of signatures of compatibles to exclude dominated compatibles efficiently. Though this dominance check is weaker than the conventional one, experimental results show that in many cases the number of excluded compatibles is the same as that by class sets. Proposed method computes prime compatibles more efficiently than conventional methods for many tested large ISFSM's.

A-4C.2 Logic Optimization by an Improved Sequential Redundancy Addition and Removal Technique
Uwe Gläser (Schloss Birlinghoven, Germany), Kwang-Ting Cheng (Univ. of California, Santa Barbara, USA)

Logic optimization methods using Automatic Test Pattern Generation (ATPG) techniques such as redundancy addition and removal have recently been proposed. In this paper we generalize this approach for synchronous sequential circuits. We proposed several new sequential transformations which can be efficiently identified and used for optimizing large designs. One of the new transformations involves adding redundancies across time frames in a sequential circuit. We also suggest a new transformation which involves adding redundancies to block initialization of other wires. We use efficient sequential ATPG techniques to identify more sequential redundancies for either addition or removal. We have implemented a sequential logic optimization system based upon this approach. We show experimental results to demonstrate that this approach is both CPU-time efficient and memory efficient and can optimize large sequential designs significantly.

A-4C.3 On Hazard-Free Implementation of Speed-Independent Circuits
Alex Kondratyev, Michael Kishinevsky (Univ. of Aizu, Japan), Alex Yakovlev (Univ. of Newcastle upon Tyne, UK)

We investigate the problem of hazard-free gate-level implementation of speed-independent circuits specified by event-based models, such as Signal Transition Graph (for processes with AND causality and input choice) or its extension, called Change Diagram (which allows OR-causality). The main result of the paper is twofold: (1) the proof that any speed-independent behavior can be implemented at the gate-level without hazards, and (2) an efficient method for such an implementation. This method is based on transformations of the specification to the form satisfying the Monotonous Cover requirement. Since this method is based on standard gate cells it can be used both in the full-custom and semi-custom VLSI design. Experimental results demonstrate area and performance efficiency of our method.

Session [A-5A] Technology-Driven Physical Synthesis

Chair: Hon-Wai Leong (National Univ. of Singapore)
Co-Chair: Shin'ichi Wakabayashi (Hiroshima Univ.)

A-5A.1 Extending Pitchmatching Algorithms to Layouts with Multiple Grid Constraints
Hiroshi Miyashita (NTT, Japan)

Pitchmatching algorithms are widely used in layout environments where no grid constraints are imposed. However, realistic layouts include multiple grid constraints which facilitate the applications of automatic routing. Hence, pitchmatching algorithms should be extended to those realistic layouts. This paper formulates a pitchmatching problem with multiple grid constraints. An algorithm for solving this problem is constructed by extending conventional pitch-matching algorithms. The computational complexity is also discussed in comparison with a conventional naive algorithm. Finally, examples and application results to realistic layouts are presented.

A-5A.2 A New Layout Synthesis for Leaf Cell Design
Masahiro Fukui, Noriko Shinomiya, Toshiro Akino (Matsushita Electric Industrial Co., Ltd., Japan)

We propose a new layout synthesis with 2-dimensional transistor arrangement and a spontaneous process of 2-dimensional compaction and local re-routing. The compaction enables jumping over objects, minimizing the number of contacts for wiring. We applied the layout synthesis to actual cell design and obtained comparable results to hand-crafted design.

A-5A.3 A Layout Approach to Monolithic Microwave IC
Akira Nagao, Chiyoshi Yoshioka, Takashi Kambe (Sharp Corp., Japan), Isao Shirakawa (Osaka Univ., Japan)

A layout approach is attempted dedicatedly for MMICs (Monolithic Microwave Integrated Circuits), on which predominant layout elements are transistors, resistors, capacitors, inductors, coplanar-waveguides, T-junctions, etc., formed by the GaAs fabrication process. The layout issue typical of such MMICs consists essentially in how to realize a single layer placement of different shapes of layout elements under a variety of spacing, orientating, and shaping constraints. In this paper, each layout element is modeled to simplify placement tasks subject to different placement constraints, and then a set of the interconnection requirements among elements is represented by a graph, to which a planarization algorithm is effectively applied. As the result of this planarization, a placement procedure is constructed mainly by repeated application of a merging scheme. A number of experimental results are also shown to demonstrate the practicability of the described layout approach.

A-5A.4 Performance Driven Multiple-Source Bus Synthesis Using Buffer Insertion
Chia-Chun Tsai, De-Yu Kao, Chung-Kuan Cheng, Ting-Ting Lin (Univ. of California, San Diego, USA)

A heuristic algorithm for a given topology of a multiple-source and multiple-sink bus to reduce the signal delay time is proposed. The algorithm minimizes the delay by inserting buffers into the candidate locations and sizing the buffers. Experiments show up to 7.2%, 20.7%, and 29.6% improvement in delay for 2.0, 0.5, and 0.3 micron technologies, respectively.

Session [A-5B] Logic Synthesis and Optimization (1)

Chair: Dr. Bernd Becker (Johann Wolfgang Goethe-Univ., Germany)
Co-Chair: Tetsuya Fujimoto (Sharp Corp., Japan)

A-5B.1 Communication Based FPGA Synthesis for Multi-Output Boolean Functions
Christoph Scholl (Univ. des Saarlandes, Germany), Paul Molitor (Martin-Luther Univ. Halle, Germany)

One of the crucial problems multi-level logic synthesis techniques for multi-output boolean functions f = {f1,...,fm} : {0,1}^n -> {0,1}^m have to deal with is finding sublogic which can be shared by different outputs, i.e., finding boolean functions a = {a1,...,ah} : {0,1}^p -> {0,1}^h which can be used as common sublogic of good realizations of f1,...,fm. In this paper we present an efficient ROBDD based implementation of this Common Decomposition Functions Problem (CDF). Formally, CDF is defined as follows: Given m boolean functions f1,...,fm : {0,1}^n -> {0,1}, and two natural numbers p and h, find h boolean functions a1,...,ah : {0,1}^p -> {0,1} such that forall 1 <= k <= m there is a decomposition of fk of the form fk{x1,...,xn} = g^(k){a1{x1,...xp},...,ah{x1,...,xp}, a(h+1)^(k){x1,...,xp},...,a(rk) ^(k){x1,...xp},x{(p+1),...,xn} using a minimal number rk of single-output boolean decomposition functions. Experimental results applying the method to FPGA synthesis are promising.

A-5B.2 Optimum PLA Folding through Boolean Satisfiability
Jose M. Quintana, Maria J. Avedillo, Maria P. Parra, Jose L. Huertas (Univ. of Sevilla, Spain)

This paper proposes an algorithm for optimum PLA folding based on its formulation as a problem of boolean satisfiability. A logical expression is derived such that the assignment of variables that satisfies it defines a folding with a minimum number of columns. The proposed algorithm uses BDDs to represent boolean functions and incorporates novel reduction techniques, obtaining satisfactory results.

A-5B.3 Technology Mapping for FPGAs with Complex Block Architectures by Fuzzy Logic Technique
Jun-Yong Lee, Eugene Shragowitz (Univ. of Minnesota, USA)

This paper describes a technology mapper for FPGAs with the complex structure of logic blocks. Most technology mappers developed so far are not effective for such complex logic block architectures as XILINX XC4000 series. The proposed mapper applies a constructive mapping algorithm and fuzzy logic rules to balance such criteria as area, timing, routability and others. Performance of the mapper is demonstrated on the set of MCNC benchmarks.

A-5B.4 Logic Rectification and Synthesis for Engineering Change
Chih-Chang Lin, David Ihsin Cheng, Malgorzata Marek-Sadowska (Univ. of California, Santa Barbara, USA), Kuang-Chien Chen (Fujitsu Labs. of America, Inc., USA)

In the process of VLSI design, specifications are often changed. It is desirable that such changes will not lead to a very different design, so that a large part of engineering effort can be preserved. We treat this problem as a combination of multiple-error diagnosis and logic minimization problems. Given a new specification and an existing synthesized logic network, our algorithms modify the existing network minimally such that the new specification can be realized. In this paper, a new algorithm is developed to identify multiple candidate signals simultaneously from the existing network, such that appropriate modifications of these signals can rectify the specification change.

Session [A-5C] PANEL: Future Direction of Synthesizability and Interoperability of HDL's: Part-1

Chair: Masaharu Imai (Toyohashi Univ. of Technology, Japan)
Co-Chair: Eugenio Villar (Univ. of Cantabria, Spain)
Panelists: Raul Camposano (Synopsys, USA), Andrew Guyler (Mentor Graphics, USA), Victor Berman (Cadence Design Systems, USA), Jeffrey Fox (Viewlogic, USA), Sunil D. Sherlekar (IIT Bombay, India), Shigeaki Hakusui (Harmonix, USA)

Session [A-6A] CAD Algorithms for FPGAs

Chair: Gabriele Saucier (INPG, France)
Co-Chair: Akihiro Tsutsui (NTT, Japan)

A-6A.1 A New K-Way Partitioning Approach for Multiple Types of FPGAs
Bernhard M. Riess, Heiko A. Giselbrecht, Bernd Wurth (Technical Univ. of Munich, Germany)

This paper considers the problem of partitioning a large, technology mapped circuit onto multiple FPGA devices of a specified device library. We propose an iterative three-step approach applying an analytical embedding technique, initial partitioning, and a k-way ratio cut improvement procedure. We successfully partitioned the ACM/SIGDA XILINX FPGA Benchmark circuits obtaining feasible design solutions with lower total dollar costs than previous methods. Moreover, our approach simultaneously assigns the FPGAs to physical locations on the FPGA board.

A-6A.2 Maple-opt: A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for FPGAs with Performance Optimization
Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki (Waseda Univ., Japan)

In this paper, we propose a new FPGA design algorithm, Maple-opt, in which technology mapping, placement, and global routing are executed so that the delay of each critical signal path in an input circuit is within a specified upper bound imposed on it. The basic algorithm of Maple-opt is top-down hierarchical bi-partitioning of regions. Technology mapping onto logic-blocks of FPGAs, their placement, and global routing are determined simultaneously in each hierarchical process. This simultaneity leads to less congested layout for routing. In addition to that, Maple-opt computes a lower bound of delay for each path with a constraint value and determines critical paths based on the difference between the lower bound and the constraint value dynamically in each hierarchical process. Two delay reduction processes are executed for the critical paths; one is routing delay reduction and the other is logic-block delay reduction. Routing delay reduction is realized such that, when bi-partitioning a region, each constrained path is assigned to one subregion. Logic-block delay reduction is realized such that each constrained path is mapped onto fewer logic-blocks. Experimental results for some benchmark circuits show its efficiency and effectiveness.

A-6A.3 Routing on Regular Segmented 2-D FPGAs
Yu-Liang Wu (Cadence Design Systems, Inc., USA), Malgorzata Marek-Sadowska (Univ. of California, Santa Barbara, USA)

In this paper we analyze the properties of the Xilinx-like regular segmentation schemes for 2-D Field Programmable Gate Arrays (FPGAs). We introduce a new notion of architectural level routing decaying effect caused by wiring segmentation. We discuss its routing properties and propose a relative prime number based segmentation scheme for 2-D FPGA architectures. A new FPGA design concept of applying architectural coupling to achieve better routability is also introduced and experimentally justified.

Session [A-6B] Logic Synthesis and Optimization (2)

Chair: Tsutomu Sasao (Kyshu Inst. Technology, Japan)
Co-Chair: Shin-ichi Minato (NTT, Japan)

A-6B.1 Flexible Optimization of Fixed Polarity Reed-Muller Expansions for Multiple Output Completely and Incompletely Specified Boolean Functions
Chip-Hong Chang, Bogdan J. Falkowski (Nanyang Technological Univ., Singapore)

A new algorithm that generates optimal fixed polarity Reed-Muller expansions based on user specified optimization criteria is shown. The algorithm accepts reduced representation of Boolean functions in form of an array of cubes and operates on an Algebraic Ternary Decision Tree together with lookup tables of flexible sizes. Allocation of don't care minterms is performed in an non exhaustive way by a heuristic approach based on the properties of Reed-Muller expansions.

A-6B.2 GRMIN: A Heuristic Simplification Algorithm for Generalized Reed-Muller Expressions
Debatosh Debnath, Tsutomu Sasao (Kyushu Inst. of Technology, Japan)

A generalized Reed-Muller expression (GRM) is a type of AND-EXOR expressions. In a GRM, each variable may appear both complemented and uncomplemented. Networks realized using GRMs are easily tested. This paper presents GRMIN, a heuristic simplification algorithm for GRMs of multiple-output functions. GRMIN uses eight rules. As the primary objective, it reduces the number of products, and as the secondary objective, it reduces the number of literals. Experimental results show that, in most cases, GRMs require fewer products than conventional sum-of-products expressions (SOPs). GRMIN outperforms existing algorithms.

A-6B.3S Learning Heuristics by Genetic Algorithms
Rolf Drechsler, Bernd Becker (Johann Wolfgang Goethe-Univ., Germany)

In many applications of Computer Aided Design (CAD) of Integrated Circuits (ICs) the problems that have to be solved are NP-hard. Thus, exact algorithms are only applicable to small problem instances and many authors have presented heuristics to obtain solutions (non-optimal in general) for larger instances of these hard problems. In this paper we present a model for Genetic Algorithms (GA) to learn heuristics starting from a given set of basic operations. The difference to other previous applications of GAs in CAD of ICs is that the GA does not solve the problem directly. Rather, it develops strategies for solving the problem. To demonstrate the efficiency of our approach experimental results for a specific problem are presented.

A-6B.4S Optimization Methods for Lookup-Table-Based FPGAs Using Transduction Method
Shigeru Yamashita, Yahiko Kambayashi (Kyoto Univ., Japan), Saburo Muroga (Univ. of Illinois, USA)

In recent years Field Programmable Gate Arrays(FPGAs) have emerged as an attractive means to implement low volume applications and prototypes due to their low cost, reprogrammability and rapid turnaround times. Therefore, the need for design methods of FPGAs are getting larger and larger. In this paper, two methods to optimize networks which have been mapped for lookup-table-based FPGAs are discussed. These methods utilize the notion of compatible sets of permissible functions(CSPFs) of Transduction Method. Experimental results show the effectiveness of our methods.

Session [A-6C] PANEL: Future Direction of Synthesizability and Interoperability of HDL's: Part-2

Chair: Eugenio Villar (Univ. of Cantabria, Spain)
Co-Chair: Masaharu Imai (Toyohashi Univ. of Technology, Japan)
Panelists: Raul Camposano (Synopsys, USA), Andrew Guyler (Mentor Graphics, USA), Victor Berman (Cadence Design Systems, USA), Jeffrey Fox (Viewlogic, USA), Sunil D. Sherlekar (IIT Bombay, India), Shigeaki Hakusui (Harmonix, USA)

Session [A-7A] Modeling and Simulation

Chair: David Skellern (Macquarie Univ., Australia)
Co-Chair: Tetsuro Kage (Fujitsu Laboratories, Japan)

A-7A.1 A New and Accurate Interconnection Delay Time Evaluation in a General Tree-Type Network
D. Deschacht, C. Dabrin (Univ. Montpellier II, France)

In all recent technologies the delay caused by interconnection wires is essential in the evaluation of the switching speed of integrated structures. Completely wrong results, would result if this were neglected. By considering a distributed RC network to model the interconnection lines, we proposed a new analytical delay time expression for a general tree type network, with full incorporation of technology design parameters. A computationally simple technique is presented and comparisons with HSPICE simulation results show the accuracy of the developed model in timing verification.

A-7A.2 An Efficient Logic/Circuit Mixed-Mode Simulator for Analysis of Power Supply Voltage Fluctuation
Mikako Miyama, Goichi Yokomizo, Masato Iwabuchi, Masami Kinoshita (Hitachi, Ltd., Japan)

A mixed-mode simulator is described that can simulate voltage fluctuations in the power supply network. Current flow due to logic events is taken into account in order to predict the voltage fluctuations. The difference between the maximum voltage fluctuations calculated by the proposed mixed-mode simulation and these calculated by conventional circuit simulation are within 20%, and we demonstrate the feasibility of the proposed simulation by simulating an entire MOS memory chip (36,000 transistors) in 75 minutes on an HP9000/735.

A-7A.3 A Model-Adaptable MOSFET Parameter Extraction System
Masaki Kondo, Hidetoshi Onodera, Keikichi Tamaru (Kyoto Univ., Japan)

A model-adaptable parameter extraction system is developed to catch up with rapid development of new advanced MOSFET models. The model-adaptability relies on two techniques; a model-adaptable initial value estimation method and a design environment that stores and reuses extraction procedures. The system makes it easy to develop an extraction procedure for a new MOSFET model through the reuse of an existing procedure for a previous model. We have presently verified that the system can accommodate major SPICE models including Level2-3 and BSIM1-3.

Session [A-7B] Extension of Binary Decision Diagrams

Chair: Tomoyuki Fujita (NEC, Japan)
Co-Chair: Yusuke Matsunaga (Fujitsu Laboratories, Japan)

A-7B.1 Improved Computational Methods and Lazy Evaluation of the Ordered Ternary Decision Diagram
Per Lindgren (Luleä Inst. of Technology, Sweden)

We investigate the properties of the Ordered Ternary Decision Diagram (OTDD) in order to develop an efficient general OTDD package. The OTDD is a three-branched three-terminal diagram based on Kleenean strong ternary logic. The OTDD can represent functions having nontrivial don't-care sets in a single diagram and is capable of provably correct evaluation in the presence of unknown input values. We propose a number of improvements to both OTDD computational methods and data structures. Furthermore we introduce the purged form OTDD which unifies the abbreviated and full form OTDD into a single diagram. A package exploiting these OTDD specific properties is presented and we show the computational advantages of this improved package for LGSynth93 standard benchmarks.

A-7B.2 Some Remarks about Spectral Transform Interpretation of MTBDDs and EVBDDs
Radomir S. Stankovic (Yugoslavia)

In this paper we give a spectral transform interpretation of AND-EXOR representations of switching functions and related decision diagrams in the vector space over GF(2). The consideration is uniformly extended to the Fourier series-like expressions of functions in the complex vector space and the decision diagrams for integer-valued functions. It is shown that the multi-terminal decision diagrams, MTBDDs, and edge-valued functions are derived by using the same sets of basic functions already applied for the decision diagrams attached to some AND-EXOR expressions, but considered over the complex field. The algebraic transform decision diagrams, ATDDs, are considered as the integer counterparts of the functional decision diagrams, FDDs, attached to the algebraic transform in the same way as the FDDs are attached to the Reed-Muller expressions. It is shown that the EVBDDs are the ATDDs in different notation.

A-7B.3 Manipulation of Regular Expressions Under Length Constraints Using Zero-Suppressed-BDDs
Shinya Ishihara, Shin-ichi Minato (NTT, Japan)

We present a new technique that broadens the scope of BDD application. It is a method for manipulating regular expressions that represent sets of sequences including repetitions of symbols. In general, sequences in the set represented by a regular expression have an infinite length and this makes representing and manipulating them difficult. In this paper, we introduce length constraints into a representation of regular expressions. Under these constraints, our method can represent and manipulate large-scale sets of sequences of regular expressions compactly and uniquely and greatly accelerates operations of regular expressions. As regular expressions can represent behaviors of a finite state machine, our technique provides a useful analysis method of finite state machines and can be applied to formal hardware verification techniques.

Session [A-7C] PANEL: How Sub-Micron Delay and Timing Issues Will Be Solved?

Chair: Hitoshi Yoshizawa (NEC, Japan)
Panelists: Ahsan Bootehsaz, Dennis B. Brophy, Donald R. Cottrell, Antun Domic, Vassilos Gerousis, Tamotsu Hiwatashi

Session [A-8A] Placement

Chair: Wayne W.-M. Dai (Univ. of California at Santa Cruz, USA)
Co-Chair: Masato Edahiro (NEC, Japan)

A-8A.1 Exploiting Signal Flow and Logic Dependency in Standard Cell Placement
Jason Cong, Dongmin Xu (Univ. of California, Los Angeles, USA)

Most existing placement algorithms consider only connectivity information during the placement process, and ignore other information available from the higher levels of design process. In this paper, we exploit the use of signal flow and logic dependency in standard cell placement by using the maximum fanout-free cone (MFFC) decomposition technique. We developed a containment tree based algorithm for splitting large MFFCs into smaller ones to get clusters with restricted sizes. We also developed a placement algorithm, named MFFC-TW, which first clusters the circuit based on MFFC decomposition and then feeds the clustered circuit to the Timberwolf6.0 placement package. Very promising experimental results were obtained.

A-8A.2 A New Performance Driven Placement Method with the Elmore Delay Model for Row Based VLSIs
Tetsushi Koide, Mitsuhiro Ono, Shin'ichi Wakabayashi, Yutaka Nishimaru (Hiroshima Univ., Japan), Noriyoshi Yoshida (Hiroshima City Univ., Japan)

In this paper, we present a new performance driven placement method based on path delay constraint approach for large standard cell layout. The proposed method consists of three phases and uses the Elmore delay model to model interconnection delay precisely in each phase. In the first phase, initial placement is performed by an efficient performance driven mincut partitioning method. Next, an iterative improvement method by nonlinear programming improves the layout. The improvement is formulated as the problem minimizing the total wire length subject to critical path delays. Finally, row assignment considering timing constraint is performed. From the experimental results comparing with RITUAL[17], the proposed method is much better than RITUAL in point of the maximal violation ratio, the total wire length, and the cut size, and is more effective in point of the interconnection delay model and its extendability.

A-8A.3 A Neural Network Approach to the Placement Problem
Morteza Saheb Zamani, Graham R. Hellestrand (Univ. of New South Wales, Australia)

In this paper, we introduce a new neural network approach to the placement of gate array designs. The network used is a Kohonen self-organizing map. An abstract specification of the design is converted to a set of appropriate input vectors fed to the network at random. At the end of the process, the map shows a 2-dimensional plane of the design in which the modules with higher connectivity are placed adjacent to each other, hence minimizing total connection length in the design. The approach can consider external connections and is able to place modules in a rectilinear boundary. These features makes the approach capable of being used in hierarchical floorplanning algorithms.

A-8A.4 Fanout-Tree Restructuring Algorithm for Post-Placement Timing Optimization
T. Aoki, M. Murakata, T. Mitsuhashi, N. Goto (Toshiba Corp., Japan)

This paper proposes a fanout-tree restructuring algorithm for post-placement timing optimization to meet timing constraints. The proposed algorithm restructures a fanout-tree by finding a tree in a graph which represents a multi-terminal net, and inserts buffer cells and resizes cells based on an accurate interconnection RC delay without degrading routability. The algorithm has been implemented and applied to a number of lay-out data generated by timing driven placement. Application results show a 17% reduction in circuit delay on the average.

Session [A-8B] Application Specific Design

Chair: Hiroaki Kunieda (Tokyo Inst. of Technology, Japan)
Co-Chair: Tatsuya Fujii (NTT, Japan)

A-8B.1 Synthesis and Simulation of Digital Demodulator for Infrared Data Communication
Hiroshi Uno, Toru Chiba (SHARP Corp., Japan), Keiji Kumatani, Isao Shirakawa (Osaka Univ., Japan)

A high performance design methodology is described for a digital demodulator, which is intended for the noise immune wireless infrared data communication. In this methodology, ASK(Amplitude Shift Keying) infrared signals detected by a photo detector are digitized into two logic-level pulses by an infrared receiver, and the demodulation of the digitized signals is implemented by a new architecture. On account of the interference with optical noises from fluorescent lamps, an ASK receiver is realized by a 1-bit digital demodulator, which is designed with use of a high level synthesis tool COMPASS so as to implement an algorithm for removing the noises. A part of experimental results is also shown to demonstrate the practicability.

A-8B.2 A Design of High-Performance Multiplier for Digital Video Transmission
Keisuke Okada, Shun Morikawa, Isao Shirakawa (Osaka Univ., Japan), Sumitaka Takeuchi (Mitsubishi Electric Corp., Japan)

A high performance design methodology is described for a multiplier to be used dedicatedly for digital video transmission. The key factor for such a multiplier is to operate at the speed of 30-100MHz but with the precision of 8-10 bits, since it is intended for FIR filtering of digital video data. In terms of implementing an FIR filter with more than ten taps, the same number of multipliers are required to be integrated. Moreover, for the preloadability of coefficients to the filter, each coefficient can be treated as a constant during the filtering operation. Motivated by these requirements and functionalities, a novel multiplier architecture is described, which is to be synthesized with the use of a high level synthesis tool PARTHENON in conjunction with manually designed macroblocks. Design results of the multiplier are also shown.

A-8B.3 Design Automation for Integrated Continuous-Time Filters Using Integrators
Kazuyuki Wada, Shigetaka Takagi, Zdzislaw Czarnul, Nobuo Fujii (Tokyo Inst. of Technology, Japan, Toshiba Corporation)

This paper proposes a design automation for filters using integrators. This is based on a predistortion without knowledge of a filter topology. The predistortion requires an integrator having the same structure, the same-value elements and an electrically controllable unity-gain frequency, and compensates for the deviation of frequency characteristics due to an excess phase shift of an integrator. The effectiveness of the proposed method is demonstrated through SPICE simulations. An algorithm for a filter design automation is also discussed.

A-8B.4S A Hardware-Oriented Design for Weighted Median Filters
Chun-Te Chen, Liang-Gee Chen, Jue-Hsuan Hsiao (National Taiwan Univ., Taiwan)

In this paper, the design consideration and algorithm mapping for weighted median filters are presented. To achieve high throughput rate, a special coding technique and its dedicated architecture with block processing are constructed to handle multiple filtering inputs and multiple filtering outputs concurrently. The pipelined cycle in our design is merely as the delay time of 1-bit carry-save-adder (CSA). Due to this design strategy, the proposed architecture can support not only weighted median filters but also rank order-based filters in high-speed applications.

A-8B.5S Techniques for Low Power Realization of FIR Filters
Mahesh Mehendale (Texas Instruments India Ltd., India), S. D. Sherlekar, G. Venkatesh (Silicon Automation Systems(India) Ltd.)

In this paper we propose techniques for low power realization of FIR filters on programmable DSPs. We first analyze the FIR implementation to arrive at useful measures to reduce power and present techniques that exploit these measures. We then identify limitations of the existing DSP architectures in implementing these techniques and propose simple architectural extensions to overcome these limitations. Finally we present experimental results on real FIR filter examples that show up to 88% reduction in coefficient memory data bus power, up to 49% reduction in coefficient memory address bus power.

Session [A-8C] Invited Tutorial: EDIF Version 350/400 and Information Modelling

Chair: Mike Church (Zuken-Redac, UK)
Speaker: Hilary J. Kahn

Session [A-9A] Delay Abstraction/Design Verification

Chair: Kewal K. Saluja (Univ. of Wisconsin, USA)
Co-Chair: Kazuhiro Iwasaki (Chiba Univ., Japan)

A-9A.1 Delay Abstraction in Combinational Logic Circuits
Noriya Kobayashi (NEC Corp., Japan), Sharad Malik (Princeton Univ., USA)

In this paper we propose a data structure for abstracting the delay information of a combinatorial circuit. The particular abstraction that we are interested in is one that preserves the delays between all pairs of inputs and outputs in the circuit. The proposed graphical data structure is of size proportional to ( m + n ) in best case, where m and n refer to the number of inputs and outputs of the circuit. In comparison, a delay matrix that stores the maximum delay between each input/output pair has size proportional to m x n. We present heuristic algorithms for deriving these concise delay networks. Experimental results shows that, in practice, we can obtain concise delay network with the number of edges being a small multiple of ( m + n ).

A-9A.2 Limits of Using Signatures for Permutation Independent Boolean Comparison
Janett Mohnke, Paul Molitor (Martin-Luther Univ. Halle, Germany), Sharad Malik (Princeton Univ., USA)

This paper addresses problems that arise while checking the equivalence of two Boolean functions under arbitrary input permutations. The permutation problem has several applications in the synthesis and verification of combinational logic: It arises in the technology mapping stage of logic synthesis and in logic verification. A popular method to solve it is to compute a signature for each variable that helps to establish a correspondence between the variables. Several researchers have suggested a wide range of signatures that have been used for this purpose. However, for each choice of signature, there remain variables that cannot be uniquely identified. Our research has shown that, for a given example, this set of problematic variables tends to be the same regardless of the choice of signatures. The paper investigates this problem.

A-9A.3 A Tool for Measuring Quality of Test Pattern for LSIs' Functional Design
Takashi Aoki, Tomoji Toriyama, Kenji Ishikawa, Kennosuke Fukami (NTT, Japan)

A prototype tool is developed for measuring the quality of test patterns for simulation to verify LSI functional designs. The tool is able to count activated conditional branches and evaluate the branch pass index of test patterns. The branch pass index indicates the ratio of the number of conditional branches validated by the pattern to the total number of conditional branches in a design. We developed the prototype tool for PARTHENON [1]. The tool prints out branch identification names not examined by the test pattern. In using the tool for experimental designs, it helped designers to significantly improve pattern quality if a branch pass index of 100% for LSI verification patterns was not achieved. Only about 30 seconds of the processing time was required for a 1000 sentence module. Bugs can often be found in designs with little effort.

Session [A-9B] High Level Synthesis (2)

Chair: Youn-Long Steve Lin (Tsing Hua Univ., Taiwan)
Co-Chair: Vasily Moshnyaga (Kyoto Univ., Japan)

A-9B.1 Search Space Reduction in High Level Synthesis by Use of an Initial Circuit
Atsushi Masuda, Hiroshi Imai, Jeffery P. Hansen, Masatoshi Sekine(Toshiba Corp., Japan)

Most existing high-level synthesis(HLS) systems attempt to generate a circuit from a behavioral description `out of the void', using the entire design space as the search domain. Because of the vastness of the search space, it is impossible to do more than a coarse grain search, often resulting in inefficient designs. This approach, ignores the designer's knowledge of the general structure of the circuit to be synthesized. In this paper, we describe the HLS system SIDER(Synthesis by Initial Design Extension and Refinement). SIDER utilizes designer knowledge about the design space in the form of an initial circuit. By limiting search to the neighborhood of this initial circuit, much finer grain search can be performed yielding a higher quality design. The effectiveness of the SIDER approach is shown by HLS of a 300 line C description of 27 instructions from a MC6502 CPU.

A-9B.2 A Datapath Synthesis System for the Reconfigurable Datapath Architecture
Reiner W. Hartenstein, Rainer Kress (Univ. of Kaiserslautern, Germany)

A datapath synthesis system (DPSS) for the reconfigurable datapath architecture (rDPA) is presented. The DPSS allows automatic mapping of high level descriptions onto the rDPA without manual interaction. The required algorithms of this synthesis system are described in detail. Optimization techniques like loop folding or loop unrolling are sketched. The rDPA is scalable to arbitrarily large arrays and reconfigurable to be adaptable to the computational problem. Fine grained parallelism is achieved by using simple reconfigurable processing elements which are called datapath units (DPUs). The rDPA can be used as a reconfigurable ALU for bus oriented systems as well as for rapid prototyping of high speed datapaths.

A-9B.3 Synthesis-for-Testability Using Transformations
Miodrag Potkonjak, Sujit Dey, Rabindra K. Roy (NEC USA, USA)

We address the problem of transforming a behavioral specification so that synthesis of a testable implementation from the new specification requires significantly less area and partial scan cost than synthesis from the original specification. A two-stage objective function, that estimates the area and testability of the final implementation, and also captures enabling effects of the transformations, is developed. Optimization is done using a new randomized branch and bound steepest descent algorithm. Application of the transformation algorithm on several examples demonstrates significant simultaneous improvement in both area and testability of the final implementations.

Session [A-9C] PANEL: Electronic Data Book: Current Status of Standard Representation and Future Perspective

Organizer: Kinya Tabuchi (Mitsubishi Electric Corp., Japan)
Chair: Kinya Tabuchi (Mitsubishi Electric Corp., Japan)
Panelists: Andy Graham (CFI, USA), Bob Yencha (Pinnacles Group, National Semiconductor, USA), Tom Jeffery (Pinnacles Group, Hitachi Micro Systems, USA), Toshitaka Fukusima (ELISNET, Fujitsu, Japan), Joe Prang (Aspect Development Inc., USA)