GLSVLSI 1999 ABSTRACTS

Sessions: [Plenary] [2A] [2B] [3A] [3B] [4A] [4B] [5A] [5B] [6A] [6B] [6C] [7A] [7B] [8A] [8B] [8C] [9A] [9B]


Plenary Session: Invited Papers

High Performance Options through Nanoelectronics
G. Pomerenke

MEMS
K.D. Wise


Session 2A: Testing

PASTA: Partial Scan to Enhance Test Compaction [p. 4]
Irith Pomeranz and Sudhakar M. Reddy

We propose a procedure to select flip-flops for partial scan targeting the reduction of test length. We show that significant reductions in test length can be achieved by this procedure. In addition, experimental results show that using heuristics that target the test length does not have to increase the numbers of flip-flops that need to be scanned in order to achieve a given level of fault coverage. Consequently, it may be possible to perform partial scan selection targeting the two parameters, test length and fault coverage, without requiring more flip-flops than required for one of the parameters.

On Applying Set Covering Models to Test Set Compaction [p. 8]
Paulo F. Flores, Horácio C. Neto and João P. Marques-Silva

Test set compaction is a fundamental problem in digital system testing. In recent years, many competitive solutions have been proposed, most of which are based on heuristic approaches. This paper studies the application of set covering models to the compaction of test sets, which can be used with any heuristic test set compaction procedure. For this purpose, recent and highly effective set covering algorithms are used. Experimental evidence suggests that the size of computed test sets can often be reduced by using set covering models and algorithms. Moreover, a noteworthy empirical conclusion is that it may be preferable not to use fault simulation when the final objective is test set compaction.
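
As an illustration of the set covering view, the sketch below treats each test as the set of faults it detects and picks tests greedily until every detectable fault is covered. The greedy rule is a textbook approximation standing in for the more effective set covering algorithms the abstract refers to; the function name `greedy_compact` and the toy detection data are assumptions made for this example.

```python
def greedy_compact(detects):
    """Pick a small subset of tests that still detects every covered fault.

    detects: dict mapping test name -> set of faults that test detects.
    Uses greedy set cover, which is an approximation, not an exact minimum.
    """
    uncovered = set().union(*detects.values())
    chosen = []
    while uncovered:
        # Take the test that detects the most still-uncovered faults;
        # iterating in sorted order keeps tie-breaking deterministic.
        best = max(sorted(detects), key=lambda t: len(detects[t] & uncovered))
        chosen.append(best)
        uncovered -= detects[best]
    return chosen


# Hypothetical detection data: four tests, five faults.
detects = {
    "t1": {"f1", "f2"},
    "t2": {"f2", "f3", "f4"},
    "t3": {"f4", "f5"},
    "t4": {"f1", "f5"},
}
compacted = greedy_compact(detects)
```

On this toy data two of the four tests are enough to cover all five faults.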

On Test Generation with a Limited Number of Tests [p. 12]
Hideyuki Ichihara, Seiji Kajihara, Kozo Kinoshita

This paper considers a new test generation scheme in which a limitation of the number of tests exists. Since, in this scheme, correct fault coverage cannot be calculated by the representative faults, we present a method for calculating the correct fault coverage by using the weighted fault list. We then propose a selection-based test generation method which derives a limited number of tests with higher fault coverage. The experimental results for IDDQ testing show that our test generation method can generate tests with fault coverage close to the maximum fault coverage.

Functional ATPG for Delay Faults [p. 16]
S. Tragoudas, M. Michael

This paper presents a functional level ATPG tool for delay faults which handles all existing fault models. The tool generates patterns using either binary decision diagrams or boolean satisfiability. Experimental results are presented on the ISCAS'85 benchmarks.

On Path Delay Fault Testing of Multiplexer-Based Shifters [p. 20]
H. T. Vergos, Y. Tsiatouhas, Th. Haniotakis, D. Nikolos, and M. Nicolaidis

In this paper we present a method for path delay fault testing of multiplexer-based shifters. We show that many paths of the shifter are non-robustly testable and we give a path selection method so that all the selected paths are robustly testable by 20 * log2n + 2 test-vector pairs, where n is the length of the shifter. The propagation delay along all other paths is a function of the delays along the selected paths.

A Test Vector Ordering Technique for Switching Activity Reduction during Test Operation [p. 24]
P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch

This paper considers the problem of testing VLSI integrated circuits without exceeding their power ratings during test. The proposed approach is based on the reordering of test vectors of a given test sequence to minimize the average and peak power dissipation during test operation. For this purpose, the proposed technique reduces the internal switching activity by lowering the transition density at circuit inputs. The technique considers combinational or full scan sequential circuits and does not modify the initial fault coverage. Results of experiments show reductions of the switching activity ranging from 11 % to 66 % during external test application.
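
The idea of lowering input transition density by reordering can be sketched as follows. This is not the authors' algorithm: it is a simple greedy nearest-neighbour ordering on Hamming distance, with the input transition count used as a stand-in for switching activity. The function names and the sample vectors are assumptions for illustration only.

```python
def hamming(a, b):
    """Number of bit positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def total_transitions(seq):
    """Input transitions summed over consecutive vector pairs."""
    return sum(hamming(a, b) for a, b in zip(seq, seq[1:]))


def reorder(vectors):
    """Greedy nearest-neighbour ordering starting from the first vector."""
    if not vectors:
        return []
    remaining = list(vectors)
    order = [remaining.pop(0)]
    while remaining:
        # Append the unused vector closest in Hamming distance to the last one.
        nxt = min(remaining, key=lambda v: hamming(order[-1], v))
        remaining.remove(nxt)
        order.append(nxt)
    return order


# Hypothetical 4-bit test vectors.
tests = ["0000", "1111", "0001", "1110", "0011"]
ordered = reorder(tests)
before = total_transitions(tests)
after = total_transitions(ordered)
```

Because the reordered sequence contains exactly the same vectors, the set of faults detected in a combinational or full scan circuit is unchanged; only the order of application differs.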


Session 2B: VLSI Design 1

VLSI Implementation of Early Branch Prediction Circuits for High Performance Computing [p. 30]
Aamir A. Farooqui, Vojin G. Oklobdzija

In this paper, the design and VLSI implementation of an Early Branch Prediction (EBP) circuit, based on a variation of the Carry Look-ahead scheme, is presented. The key features of this design are low area, high speed (2[log n/2] + 1), and high modularity. This design outperforms all the EBP designs presented so far. For 64-bit word length the early branch prediction is obtained in 679 ps as simulated for 0.2-μm technology under typical conditions. Simulation and layout results for 0.2-μm CMOS technology show a 30% increase in speed with a 25% decrease in area as compared to recently published results.

The Design of a Register Renaming Unit [p. 34]
Benjamin Bishop, Thomas P. Kelliher, Mary Jane Irwin

Register renaming is often used to improve performance in many high-ILP processors. However there is a lack of publications regarding register renaming hardware design. This paper presents a detailed look at one possible implementation of a register renaming unit, as well as some possible optimizations.

Efficient and Safe Asynchronous Wave-Pipeline Architectures for Datapath and Control Unit Applications [p. 38]
O. Hauck, M. Garg, and S.A. Huss

This paper presents a generalization of a previously proposed asynchronous wave-pipeline architecture. Four-phase and two-phase communication units supporting more than one wave in the logic are proposed. General feedback structures are then outlined. Simulations from a 16-bit add-and-shift ring demonstrate their feasibility. The same architecture is applicable for both datapath and control enabling the realization of complete high-throughput asynchronous systems.

Memory Organization of a Single-Chip Video Signal Processing System with Embedded DRAM [p. 42]
Jorg Hilgenstock, Klaus Herrmann, Peter Pirsch

A programmable single-chip multiprocessor system for video coding applications has been developed. It integrates four processing elements, on-chip DRAM, and application-specific interfaces. The integrated DRAM is primarily used as frame buffer and makes external memory for most applications obsolete. For fast access to local data segments, static RAM is also integrated in each processing element.

Theoretical Analysis of Word-Level Switching Activity in the Presence of Glitching and Correlation [p. 46]
Janardhan H. Satyanarayana and Keshab K. Parhi

This paper presents a novel analytical approach to compute the switching activity in digital circuits at the word-level in the presence of glitching and correlation. The proposed approach makes use of signal statistics such as mean, variance, and autocorrelation. A novel expression is derived for the switching activity a_f at the output node f of an arbitrary circuit in terms of time-slot autocorrelation coefficient, the expected value, and the signal probability. The switching activity analysis of a signal at the word-level is computed by summing the activities of all the individual bits constituting the signal. A novel relationship between the correlation coefficient of the higher order bits of a normally distributed signal and the bit where the correlation begins is also presented. The proposed approach can estimate the switching activity in less than a second, which is orders of magnitude faster than simulation-based approaches. Simulation results show that the errors using the proposed approach are about 6% on average and that the approach is well suited even for highly correlated speech and music signals.
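
The statement that word-level activity is the sum of the activities of the individual bits can be made concrete with a small counting sketch. This is the simulation-style measurement that the analytical approach is compared against, not the authors' closed-form expression, and it ignores glitching. The function names and the sample sequence are assumptions; samples are taken to be non-negative integers.

```python
def bit_activity(samples, bit):
    """Fraction of consecutive sample pairs in which the given bit toggles."""
    toggles = sum(((a ^ b) >> bit) & 1 for a, b in zip(samples, samples[1:]))
    return toggles / (len(samples) - 1)


def word_activity(samples, width):
    """Word-level activity: sum of the per-bit activities over the word."""
    return sum(bit_activity(samples, b) for b in range(width))


# Hypothetical 3-bit signal: a counter stepping 0, 1, 2, 3, 4.
samples = [0, 1, 2, 3, 4]
per_bit = [bit_activity(samples, b) for b in range(3)]
total = word_activity(samples, 3)
```

For the counter sequence the least significant bit toggles on every step while higher bits toggle progressively less often, which is the kind of bit-position dependence a word-level model has to capture.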

Adaptive Hard Disk Power Management on Personal Computers [p. 50]
Yung-Hsiang Lu, Giovanni De Micheli

Dynamic power management can be effective for designing low-power systems. In many systems, requests are clustered into sessions. This paper proposes an adaptive algorithm that can predict session lengths and shut down components between sessions to save power. Compared to other approaches, simulations show that this algorithm can reduce power consumption in hard disks with less impact on performance or reliability.


Session 3A: Delay Modeling

Inductance Effects in RLC Trees [p. 56]
Yehea I. Ismail, Eby G. Friedman, and Jose L. Neves

A closed form solution for characterizing voltage-based signals in an RLC tree is presented. This closed form solution is used to derive figures of merit to characterize the effects of inductance at a specific node in an RLC tree. The effective damping factor of the signal at a specific node in an RLC tree is shown to be a useful figure of merit. As the effective damping factor of a signal increases, an RC model is sufficiently accurate to characterize that waveform. The rise time of the input signal driving an RLC tree is another factor characterizing the importance of inductance. As the rise time of the input signal becomes much larger than the effective LC time constant at a specific node within an RLC tree, the signal at this node does not exhibit the effects of inductance. Evidence is provided showing that using a single line analysis to determine the importance of including inductance to characterize a tree structured interconnect line is invalid in many cases and can lead to erroneous conclusions.

S2P: A Stable 2-Pole RC Delay and Coupling Noise Metric [p. 60]
Emrah Acar, Altan Odabasioglu, Mustafa Celik, and Lawrence T. Pileggi

The Elmore delay is the metric of choice for performance-driven design applications due to its simple, explicit form and the ease with which sensitivity information can be calculated. However, for deep submicron technologies, the accuracy of the Elmore delay is insufficient. In this paper, we formulate a delay model using a provably stable two pole waveform response that provides a unique mapping between four moments and a specific delay value. Unlike traditional moment matching, this two-pole model permits us to precharacterize the delays, and store them in a table, as a mapped function of three parameters. The model also provides an explicit expression for the peak noise induced on a coupled line as a function of the same three moments. The results indicate runtimes comparable to an Elmore delay calculation but with the accuracy of an AWE approximation.
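
For reference, the baseline Elmore delay that the abstract compares against can be computed with two passes over an RC tree: one to accumulate downstream capacitance and one to accumulate delay along each root-to-node path. The sketch below covers only this standard metric, not the S2P two-pole model; the data layout, function name, and component values are assumptions for illustration.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay at every node of an RC tree.

    parent[n]: parent of node n (None for the root, i.e. the driver node)
    res[n]:    resistance of the branch from parent[n] to n, in ohms
    cap[n]:    capacitance to ground at node n, in farads
    Nodes must be listed so that every parent appears before its children.
    """
    nodes = list(parent)
    # Downstream capacitance: own capacitance plus that of all descendants.
    down = dict(cap)
    for n in reversed(nodes):
        if parent[n] is not None:
            down[parent[n]] += down[n]
    # Delay grows by (branch resistance * downstream capacitance) per branch.
    delay = {}
    for n in nodes:
        if parent[n] is None:
            delay[n] = 0.0
        else:
            delay[n] = delay[parent[n]] + res[n] * down[n]
    return delay


# Hypothetical tree: src -> a, then a -> b and a -> c.
parent = {"src": None, "a": "src", "b": "a", "c": "a"}
res = {"a": 100.0, "b": 200.0, "c": 50.0}
cap = {"src": 0.0, "a": 1e-12, "b": 2e-12, "c": 1e-12}
delays = elmore_delays(parent, res, cap)
```

The result is a first-moment estimate only, which is why higher-moment models are used when more accuracy is needed.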

ICE: Incremental 3-Dimensional Capacitance and Resistance Extraction for an Iterative Design Environment [p. 64]
Yanhong Yuan, Prithviraj Banerjee

In this paper, we discuss the 3-Dimensional (3-D) capacitance and resistance extraction within an iterative design environment, where small changes are made to the 3-D structures. We present a bounded incremental algorithm for accurate and fast 3-D extraction in such a design environment, based on the Boundary Element Method (BEM). The incremental algorithm can re-utilize the computation results of previous extractions and rapidly re-compute the new parasitic parameters in response to the design changes made to the layout. The incremental algorithm has been implemented in the ICE tool. Experimental results on a set of 3-D interconnect structures show that the incremental algorithm is efficient for the iterative design methodology. For one large structure, the incremental extraction is over 20 times faster than the full extraction without using the incremental algorithm. To the best of our knowledge, this is the first reported work on an incremental algorithm for capacitance and resistance extraction.

An Exact Analytical Time-Domain Model of Distributed RC Interconnects for High Speed Nonlinear Circuit Applications [p. 68]
Ninglong Lu and Ibrahim N. Hajj

Accurate simulation of interconnect effects is an increasingly critical step in high speed deep submicron design. With ever increasing frequency of digital/analog signals, the traditional lumped RC elements may not be accurate enough in modeling RC interconnects in VLSI applications due to the distributed nature of realistic interconnects. In this paper a novel analytic time-domain model for distributed RC interconnects is developed for application in nonlinear circuit simulators. The exact analytical solution is derived under the assumption of piecewise-linear signal waveforms at the two ports of the line. We have incorporated this model into a general-purpose circuit simulator using the SWEC technique.


Session 3B: VLSI Design 2

A Radix-16 SRT Division Unit with Speculation of the Quotient Digits [p. 74]
Gianluca Cornetta, Jordi Cortadella

The speed of a divider based on a digit-recurrence algorithm depends mainly on the latency of the quotient digit generation function. In this paper we present an analytical approach that extends the theory developed for standard SRT division and permits the implementation of division schemes where a simpler function speculates the quotient digit. This leads to division units with shorter cycle time and variable latency, since a speculation error may be produced and a post-correction of the quotient may be necessary. We have applied our algorithm to the design of a radix-16 speculative divider for double precision floating point numbers, which proved to be faster than analogous implementations.

Area-Efficient Area Pad Design for High Pin-Count Chips [p. 78]
Louis Luh, John Choma, Jr., and Jeffrey Draper

This paper presents an area pad layout method to efficiently reduce the space required for interconnection pads and pad drivers. Unlike peripheral pads, area pads use only the top metal layer and therefore allow active circuitry to be laid out underneath. With identical functional elements grouped together, a group of pad drivers share the same well and can be placed tightly together. The use of silicided diffusion reduces the well contact to diffusion contact spacing requirement. By taking advantage of this spacing requirement and using serpentine gate layout, a driver's size can be effectively reduced without reducing the driving capacity. An embedded multicomputer router interface chip has been implemented using these techniques and has achieved 554 pads in a 9mm x 6mm chip with a 0.8 µm single-poly 3-metal N-well CMOS process.

New 2 Gbit/s CMOS I/O Pads [p. 82]
Guido Masera, Gianluca Piccinini, Massimo Ruo Roch and Maurizio Zamboni

A couple of low complexity high performance input and output pads are proposed: they have been designed in 0.7 µm CMOS ES2 technology and support bit rates ranging from DC up to 2 Gbit/s. The differential input pad and the differential output pad interface true PECL external logic levels to full swing 5V CMOS internal levels.

A Methodology for Minimizing Power Dissipation of Embedded Systems through Hardware/Software Partitioning [p. 86]
Jorg Henkel

We present a novel approach that minimizes the power dissipation of embedded core-based systems through hardware/software partitioning. Our approach is based on the idea of mapping clusters of operations/instructions to a core that yields a high utilization rate of the involved resources (ALUs, multipliers, shifters etc.), thus minimizing power dissipation. Our approach is comprehensive since it takes into consideration the power dissipation of a whole embedded system comprising a microprocessor core, application specific (ASIC) core(s), cache cores and a memory core. We report high reductions of power dissipation between 35% and 94% at the cost of a relatively small additional hardware overhead of less than 16k cells while maintaining or even slightly increasing the performance compared to the initial design.


Session 4A: Analog and Digital Testing

On Optimizing Test Strategies for Analog Cells [p. 92]
Anna M. Brosa and Joan Figueras

The purpose of this paper is to analyze an optimization method to improve the testability of structural defects, such as bridges and opens, in low-power low-voltage analog circuits. The approach consists of finding an optimum subset of tests which maximizes the fault coverage with minimum cost. An application example is given to illustrate the proposal by studying the fault coverage obtained using different test sets on a simple 2-stage Nested Transconductance Capacitance Compensated (NGCC) amplifier.

Novel Design for Testability of a Mixed-Signal VLSIC [p. 97]
E. McShane, K. Shenai, L. Alkalai, E. Kolawa, V. Boyadzhyan, B. Blaes, and W.C. Fang

A novel testability architecture has been developed for a mixed-signal VLSIC which has a functional architecture consisting of a microprocessor core, RF transceiver, and two voltage regulators. It permits a decoupling of analog/RF, digital, and power systems for individual stimulation and analysis. Testing may be performed at the subsystem or block level, and traditional scan techniques are augmented to allow mixed static and dynamic test. This approach aids in identifying any detrimental interaction between individual subsystems by providing isolation between the circuit-under-test and idle circuits.

The Development of Analog SPICE Behavioral Model Based on IBIS Model [p. 101]
Ying Wang, Han Ngee Tan

This paper presents an approach for building an analog SPICE behavioral model based on the information provided by IBIS model. Such analog SPICE behavioral model can describe both static and dynamic characteristics of I/O buffers. The method to extract dynamic information from IBIS switching waveform VT tables is discussed in detail. Two types of models can be generated depending on the availability of the waveform tables with different load conditions in IBIS data. The influence of waveform table load condition on the validity of the analog SPICE behavioral model is also investigated.

Fault Coverage Estimation for Early Stage of VLSI Design [p. 105]
Von-Kyoung Kim, Tom Chen, Mick Tegethoff

This paper proposes a new fault coverage estimation model which can be used in the early stage of VLSI design. The fault coverage model is an exponentially decaying function with three parameters, which include the fault coverage upper bound, UB, the fault coverage lower bound, LB, and the rate of fault coverage change, α. The fault coverages using three different testing scenarios, which are no DFT, scan, and IDDQ testing, are predicted using circuit design information, such as gate count, I/O count, and FF count. These parameters are often readily available at the early stage of VLSI design. Finally, the composite fault coverage is estimated by combining different fault coverages. Experimental results showed a 1.9% model estimation error with given circuit information in the early design.

Pseudo-Exhaustive Testing of Sequential Circuits [p. 109]
Bassam Shaer, Sami A. Al-Arian, David Landis

A new sequential circuit partitioning algorithm is introduced which enhances pseudo-exhaustive testing. Our PIFAN algorithm is based on an analysis of Primary Input cones and FANout values. Results are presented which show that PIFAN offers significant reductions in hardware overhead and test time when compared to alternative partitioning algorithms.


Session 4B: Nanoelectronics 1

Self-Assembly Based Approaches for Metal/Molecule/Semiconductor Nanoelectronic Circuits [p. 114]
D.B. Janes, R.P. Andres, E.H. Chen, J. Dicke, V.R. Kolagunta, J. Lauterbach, T. Lee, J. Liu, M.R. Melloch, E.L. Peckham, T. Pletcher, R. Reifenberger, H.J. Ueng, B.L. Walsh, J.M. Woodall, C.P. Kubiak, and B. Kasibhatla

This paper describes a technological approach which combines the nanoscale elements available from molecular devices and self-assembled molecular/nanoparticle systems with semiconductor devices which can provide the gain or bistability required for computational functionality. The architectural motivation for these configurations and experimental demonstrations of several key technologies for this hybrid approach are described.

Logic in Wire: Using Quantum Dots to Implement a Microprocessor [p. 118]
Michael T. Niemier, Peter M. Kogge

Despite the seemingly endless upwards spiral of modern VLSI technology, many experts are predicting a hard wall for CMOS in about a decade. Given this, researchers continue to look at alternative technologies, one of which is based on quantum dots, called quantum cellular automata. While the first such devices have been fabricated, little is known about how to design complete systems of them. This paper summarizes one of the first such studies, namely an attempt to design a complete, albeit simple, CPU in the technology. The projections are striking: a projected 10 to 1 increase in circuit density when compared to a CMOS equivalent, but a design approach which is radically different from conventional "logic" design, especially in timing considerations.

Why is Time-Varying Control Necessary for Signal Processing with Locally-Connected Quantum-Dot Arrays? [p. 122]
Árpád I. Csurgay, Craig S. Lent, and Wolfgang Porod

(Extended Abstract)

Resonant Tunneling Technology for Mixed Signal and Digital Circuits in the 10-100 GHz Domain [p. 123]
T.P.E. Broekaert, B. Brar, F. Morris, A.C. Seabaugh, and G. Frazier

The inherent bistability and picosecond time-scale switching of the resonant tunneling diode (RTD) provides an ideal element for the design of digital circuits and analog signal quantizers in the 10-100 GHz domain. New differential RTD-based circuits for quantizers and a first-order Sigma-Delta modulator capable of operating at 10 GHz and beyond are introduced.


Session 5A: Synthesis

Efficient Algorithms for Finding Highly Acceptable Designs Based on Module-Utility Selections [p. 128]
Chantana Chantrapornchai, Edwin H.-M. Sha, Xiaobo (Sharon) Hu

In this paper, we present an iterative framework to solve the module selection problem under resource, latency, and power constraints. The framework associates a utility measure with each module. This measurement reflects the usefulness of the module for a given design goal. Using modules with high utility values will result in superior designs. We propose a heuristic which iteratively perturbs module utility values until they lead to good module selections. Our experiments show that the module selections formed by combinations of modules with high utility values are superior solutions. Further, by keeping modules with high utility values, the module exploration space can be drastically reduced.

Reducing BDD Size by Exploiting Structural Connectivity [p. 132]
Ronnie L. Wright, Michael A. Shanblatt

Computer-aided design tools have been limited by the use of the Binary Decision Diagram (BDD). The major drawback of the BDD is its abundant usage of CPU time and memory. Techniques such as BDD variable ordering and sharing have been used in the past to address the size issue. However, these techniques remain limited to modest-sized circuits. In this paper, we present a significant variation to the conventional BDD, the Connective Binary Decision Diagram (CBDD). The CBDD addresses the size issue concerning conventional BDD implementations by employing minimized-scalable binary decision diagrams (MSBDDs) combined with the structural connectivity present in the circuit's netlist. The experimental results section will demonstrate that the proposed method reduces the BDD size by more than two orders of magnitude for large circuits.

An Integrated Approach for Synthesizing LUT Networks [p. 136]
Shigeru Yamashita, Hiroshi Sawada, Akira Nagoya

This paper presents a method for synthesizing look-up table (LUT) networks. The strategy employed by our method is very different from the strategies of previous methods; many decomposition methods that are not only algebraic but also functional are integrated very well. Our method can be thought of as a general framework for LUT network synthesis integrating various decomposition methods. The experimental results are very encouraging.

Hierarchical Scheduling in High Level Synthesis Using Resource Sharing Across Nested Loops [p. 140]
Abhijit Ghosh, Sandeep K. Lodha, Ranga Vemuri

This paper presents a resource-constrained scheduling algorithm for hierarchical behavioral specifications containing nested loops. The algorithm attempts to share resources across levels, to schedule operations that belong to different levels of the nested loop structures in the specifications as well as operations that belong to the same level. We compare the results of scheduling using our algorithm with those obtained using traditional list scheduling with no sharing of resources among different levels of the specification. These results show an average improvement of 23.47% in terms of number of control steps.

Design Issues in the Synthesis of Reusable Cores [p. 144]
Rohit Sharma and C. P. Ravikumar

While core-based design is itself a challenging task, it is equally challenging for a core vendor to provide information about a core without compromising on the protection of intellectual property. A number of issues are to be taken into consideration when designing a core. While conventional goals such as minimal area and maximal performance continue to hold, additional constraints such as core testability and power dissipation will have to be considered. Since the vendor of a core does not reveal details about the internals of the core, it is often the responsibility of the vendor to provide the test plan for the core. In this paper, we present our experiences in designing a testable CORDIC core.
Keywords: Embedded Cores, Design Reuse, CORDIC Arithmetic, and Core Testability.


Session 5B: Nanoelectronics 2

Ultrahigh-Speed Circuits Using Resonant Tunneling Devices [p. 150]
M. Yamamoto, H. Matsuzaki, T. Itoh, T. Waho, T. Akeyoshi, and J. Osaka

Ultrahigh-speed circuit applications of resonant tunneling diodes (RTDs) have been developed. One of the key concepts is the merged utilization of RTDs and high electron mobility transistors (HEMTs). The integration technology for InP-based RTDs and HEMTs has been developed. Another key technology developed is a circuit configuration using series-connected RTDs, driven by a clocked bias, in combination with HEMTs. Given this circuit concept, various kinds of edge-triggered flip-flop circuits and multiple-valued quantizers featuring high-speed operation and compact configuration have been constructed. By extending this circuit concept, an optoelectronic circuit using RTDs and a photodiode has also been developed. High-speed operations have been demonstrated, including a delayed flip-flop circuit operating at 35 Gbit/s, multiple-valued quantizers operating at 10 GHz, a 2-bit analog-to-digital converter operating at 5 GHz and an optoelectronic circuit that demultiplexes an 80 Gbit/s optical signal into a 40 Gbit/s electrical signal. The presented results clearly show the potential of RTD-based circuits for the construction of unprecedented ultrahigh-speed communications and signal processing circuits.

A Novel High-Speed Flip-Flop Circuit Using RTDs and HEMTs [p. 154]
Hideaki Matsuzaki, Toshihiro Itoh, and Masafumi Yamamoto

An RTD (resonant tunneling diode)-based flip-flop circuit with a new configuration is proposed. The circuit features an SCFL interface for both input and output, and achieves high-speed operation with a simplified configuration. The circuit consists of only two RTDs and three HEMTs, and works as a delayed flip-flop (D-FF) with return-to-zero (RZ) mode output. 50 Gbit/s operation is confirmed by SPICE simulation for the SCFL-interfaced D-FF with the proposed configuration. A static binary frequency divider (T-FF) is also designed based on the same concept. It is fabricated by InP-based RTD/HEMT integration technology, and its proper operation of up to 15 GHz is confirmed experimentally.

Design and Analysis of a Novel Quantum-MOS Sense Amplifier Circuit [p. 158]
Tetsuya Uemura, Pinaki Mazumder

A novel quantum-MOS sense amplifier circuit consisting of resonant tunneling diodes (RTDs) as pull-up devices and NMOS transistors is discussed in this paper. Compared to the conventional sense amplifier circuits using CMOS technology, the proposed QMOS sense amplifier exhibits about 20% higher sensing speed. The cross-coupled QMOS latch, which is at the heart of the sense amplifier circuit, has metastable and unstable states which are closely related to the I-V characteristics of the RTDs. The stability analysis has been made using a phase-plot diagram, and how the RTD parameters relate to the circuit speed and robustness of the sense amplifier is discussed.

Integration of InAs/AlSb/GaSb Resonant Interband Tunneling Diodes with Heterostructure Field-Effect Transistors for Ultra-High Speed Digital Circuit Applications [p. 162]
P. Fay, G.H. Bernstein, D. Chow, J. Schulman, P. Mazumder, W. Williamson, and B. Gilbert

Resonant tunneling diode based logic circuits offer significant advantages for low-power, ultra-high-speed applications. In this work, a low-power resonant interband tunneling diode (RITD)-based logic technology capable of operating at clock rates of at least 12 GHz is reported. The circuits are fabricated using InAs/AlSb/GaSb RITDs. Fanout of at least two at a clock rate of 10 GHz is also reported for two AND gates in a two-stage pipelined configuration. Simulation results for an RITD/HFET circuit based on measured characteristics of InAs/AlSb/GaSb RITDs and InAs-channel HFETs for a simple inverting Schmitt trigger are presented to demonstrate the advantages of an integrated RITD/HFET technology. This circuit architecture demonstrates proper operation with power supply voltages as low as 0.5 V. In addition, well defined logic levels and abrupt logic transitions are achieved, despite the limited transconductance and large output conductance typical of InAs-channel HFETs.
Keywords: resonant tunneling diode (RTD), resonant interband tunneling diode (RITD), heterostructure field-effect transistor (HFET), ultra-high-speed logic circuits

A Memory Design in QCAs Using the SQUARES Formalism [p. 166]
D. Berzon and T.J. Fountain

We present a formalism for implementing circuits with Quantum-dot Cellular Automata (QCA), comprising a set of standard circuit elements with uniform layout rules. The formalism simplifies circuit design from an engineering perspective and overcomes an observed sensitivity of QCA systems to input delays. A design for an addressable shift register is implemented, and promises considerable density gains over conventional CMOS.


Session 6A: Design issues

Transistor Level Synthesis for Static CMOS Combinational Circuits [p. 172]
Chia-Pin R. Liu, Jacob A. Abraham

This paper introduces a novel framework to synthesize static CMOS circuits at the transistor level. A new class of binary decision diagrams (BDDs) which represent inverting Boolean functions, called Transistor Mapped BDDs (TM-BDDs), is used in the synthesis process. There is a one-to-one correspondence between a transistor netlist and its TM-BDD. Nodes in a TM-BDD represent gate inputs and the edges represent the transistors in the netlist. TM-BDDs can be optimized using BDD operations, and the data structure can retain device aspect ratios and geometries for performance optimization. The synthesis process involves a transformation from logic functions to transistor netlists using TM-BDDs. We show how a transistor netlist can be automatically generated during a depth-first traversal on a TM-BDD. The synthesis process is not only independent of any library, but also capable of generating a cell library for a particular circuit. Experimental results demonstrating the reduction of transistor counts are presented.

SINMEF-A Decomposition Based Synthesis Tool for Large FSMs [p. 176]
Carlos Humberto Llanos Quintero and Marius Strum

This paper describes the SINMEF environment, composed of the DECMEF and the SIS [9] systems, used to synthesize large finite state machines (FSMs). The DECMEF system consists of a set of tools to decompose a FSM into a set of cooperating sub-FSMs. An efficient cost function is used to guide the decomposition process. The decomposed FSMs are state encoded and further optimized and technology mapped using tools from the SIS system. Results obtained for FSMs with more than 1000 states showed an improvement of as much as 60.42% in critical path and 14.79% in area. Preliminary results show that the recursive use of the decomposition system extends its application to FSMs with several thousands of states.
Keywords: FSM, decomposition, non-deterministic transitions, redundant transitions, clustering technique.

An Approach for Testing Safety-Critical Software [p. 180]
Weiwei Li, Zhongwei Xu, Yan Jin

A novel approach for testing the effectiveness, efficiency, safety and relative appropriateness of Computer Interlocking Software (CIS), a kind of safety-critical software, is presented with a software platform developed to support this approach. A brief description of the proposed approach is also included.
Key Words: Safety-Critical Software, Failure Severity Level, Failure Frequency, Software Safety Integrity Level, Software Validation

Design Recovery for Incomplete Combinational Logic [p. 184]
Travis E. Doom, Anthony S. Wojcik, Moon-Jung Chung

Motivated by the problem of reengineering legacy digital circuits for which design information is missing or incomplete, this paper presents a new technique for representing the relationships among the internal components of a combinational circuit. This technique proves to be a powerful tool for redesign, capable of representing internal Boolean relationships in a fully or partially specified multiple-output combinational circuit with a single data structure.

Regression-Based Macromodeling for Delay Estimation of Behavioral Components [p. 188]
A. Macii, E. Macii, G. Odasso, M. Poncino, and R. Scarsi

This paper presents a methodology for delay estimation of hardware components described at the behavioral-level. The basis of the proposed technique is a well-known theoretical result that relates the entropy of a logic function to the delay of a multi-level implementation of the same function. We propose an improved model for delay estimation, and we prove its validity by means of experiments performed on a set of standard benchmarks.
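
As a rough illustration of the quantity the abstract refers to, the sketch below computes the output entropy of a single-output Boolean function. The function name `output_entropy` and the assumption of uniformly distributed inputs are ours, and the entropy-to-delay model itself is not reproduced here because the abstract does not state it.

```python
import math
from itertools import product


def output_entropy(func, n_inputs):
    """Shannon entropy (in bits) of a single-output Boolean function.

    Assumes all 2**n_inputs input combinations are equally likely
    (an assumption of this sketch, not a statement from the paper).
    """
    ones = sum(1 for bits in product((0, 1), repeat=n_inputs) if func(*bits))
    p1 = ones / 2 ** n_inputs
    # Skip zero-probability terms, since 0 * log2(0) is taken as 0.
    return -sum(p * math.log2(p) for p in (p1, 1 - p1) if p > 0)
```

For example, a 2-input XOR has an output that is 1 for half of the input combinations, so its entropy is 1 bit, while a 2-input AND is more predictable and has a lower entropy.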

Efficiently Searching the Optimal Design Space [p. 192]
Stephen A. Blythe and Robert A. Walker

One of the primary advantages of a high-level synthesis system is its ability to explore the design space. This paper presents several methodologies for design space exploration that compute all optimal tradeoff points for the combined problem of scheduling, clock length determination, and module selection. We discuss how each methodology takes advantage of both the structure within the design space itself as well as the structure of, and interaction between, each of the three subproblems.


Session 6B: VLSI Circuits 1

A Bandpass Sigma-Delta for Software Low-Power and Low-Voltage Radio by Using PATH Technique [p. 198]
Yiu (Simon) Wu, John Ling and Ward J. Helms

This paper proposes a PArallel Two patH (PATH) technique for oversampled bandpass analog-to-digital converters in low-power and low-voltage environments to relax the settling requirement and to increase the signal-to-noise ratio. The design considerations for the implementation are evaluated, along with strategies to overcome the possible problems. The converter is clocked at 20MHz and digitizes a 200kHz bandwidth signal centered at 10MHz with 87dB Signal-to-Noise Ratio (SNR) while suppressing the undesired mirror image signal at 40dB with a 1.8-V supply voltage.

No-Race Charge-Recycling Differential Logic (NCDL) [p. 202]
Seung-Moon Yoo and Sung-Mo (Steve) Kang

This paper describes No-race Charge-recycling Differential Logic (NCDL) which realizes low power computation with less sensitivity to input signal skews. Performance comparison with previous charge recycling logics is shown for a 2-input NAND logic. NCDL operates in push-pull mode and achieves about 35% improvement in power-delay product over full swing differential logic without the pre-evaluation problems. Thus, it shows increased effectiveness for the implementation of random logic with input signals arriving in an arbitrary sequence.

Linear Transconductors Using Low Voltage Low Power Square-Law CMOS Cells [p. 206]
Tuna B. Tarim and Mohammed Ismail

Two transconductors composed of two square-law CMOS cells are introduced in this paper. The analysis of the cells is given. The transconductors operate in the saturation region with a fully balanced input signal. Simulations were done for a 0.8μm n-well process using BSIM3 model parameters. The first circuit has a trade-off between low voltage operation and low power dissipation. The circuit has a cutoff frequency of 170MHz and Pdis=1.17mW for a bias current of 120μA. The second transconductor aims to overcome the trade-off and to improve the performance; the circuit has a cutoff frequency of 236MHz and Pdis=1.74mW for the same bias current; however, it is possible to reduce the bias current, since the trade-off is overcome. The transconductors have a THD of less than -56dB and -60dB, respectively, for a 1MHz, 0.5V peak-to-peak sinusoidal input. A comparison between the two circuit performances is given.

Current Sensor on the Base of Permanent Pre-Chargeable Amplifier [p. 210]
Victor Varshavsky, Masayuki Tsukisaka

The sensitivity and delay of the amplifier are the key problems in the performance of Current Sensors (CS). For large devices which consist of several cells, for example 32bit, the amplifier must react to 10-20mV. The previous type of highly sensitive amplifier which is based on cascade and reference voltage[dill] can react to this level of voltage. But this model is not stable with respect to technological and parametric variation. In this paper, we suggest a triple cascade of inverters with feedback through an unsymmetrical pass transistor, which amplifies 1mV without a reference voltage. Monte-Carlo SPICE simulation shows the stability of this model under parametric variation. We prepare the schematic of a CS which includes a control unit with shunt transistors and evaluate the delay.

Parallel Saturating Fractional Arithmetic Units [p. 214]
Navindra Yadav, Michael Schulte, John Glossner

This paper describes the designs of a saturating adder, multiplier, single MAC unit, and dual MAC unit with one cycle latencies. The dual MAC unit can perform two saturating MAC operations in parallel and accumulate the results with saturation. Specialized saturation logic ensures that the output of the dual MAC unit is identical to the result of the operations performed serially with saturation after each multiplication and each addition.
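
The serial reference behavior that the dual MAC unit must match can be sketched in software. The following is a minimal model assuming a 16-bit Q15 fractional format; the word length, the truncating multiply and all function names are assumptions of this sketch, not details taken from the paper.

```python
# Illustrative Q15 (16-bit signed fractional) saturating arithmetic.
# The Q15 format and every name below are assumptions of this sketch.
Q15_MAX = 0x7FFF   # largest representable value, just below +1.0
Q15_MIN = -0x8000  # smallest representable value, -1.0


def saturate(value: int) -> int:
    """Clamp an integer to the 16-bit signed range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def sat_add(a: int, b: int) -> int:
    """Saturating addition."""
    return saturate(a + b)


def sat_mul(a: int, b: int) -> int:
    """Fractional multiply (product shifted right by 15), then saturated."""
    return saturate((a * b) >> 15)


def sat_mac(acc: int, a: int, b: int) -> int:
    """Saturating multiply-accumulate: acc + a*b with saturation after each step."""
    return sat_add(acc, sat_mul(a, b))


def serial_dual_mac(acc: int, a0: int, b0: int, a1: int, b1: int) -> int:
    """Two MACs performed serially, saturating after every operation."""
    return sat_mac(sat_mac(acc, a0, b0), a1, b1)
```

With `acc = 32767`, adding the product 8192 saturates at 32767 and subtracting 8192 afterwards gives 24575, whereas summing both products first would leave the accumulator at 32767. This order dependence is why a parallel unit needs dedicated logic to reproduce the serial result.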

Residue Arithmetic Circuits Based on Signed-Digit Number Representation and the VHDL Implementation [p. 218]
Shugang Wei, Kensuke Shimizu

Residue arithmetic circuits based on radix-2 signed-digit (SD) number representation, using integers 2^p and 2^p ± 1 as moduli of the residue number system (RNS), are presented. The modulo m addition, m = 2^p or m = 2^p ± 1, is performed by a carry-free SD adder and the modulo m multiplier is constructed using a binary modulo m SD adder tree. The implementation for the residue arithmetic circuits with VHDL description is proposed. The modulo m adders and multipliers have about 530 and 5000 gates, respectively, in the case of m = 2^16 ± 1.
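
To show why moduli of this form are convenient, the sketch below performs addition modulo 2^p and modulo 2^p - 1 on ordinary binary integers. It uses the standard end-around-carry trick rather than the carry-free signed-digit adder of the paper, the modulo 2^p + 1 case is omitted, and the function names are hypothetical.

```python
def add_mod_pow2(a: int, b: int, p: int) -> int:
    """Addition modulo 2^p: simply discard the carry out of bit p-1."""
    return (a + b) & ((1 << p) - 1)


def add_mod_pow2_minus1(a: int, b: int, p: int) -> int:
    """Addition modulo 2^p - 1 using an end-around carry.

    Inputs are assumed to lie in the range [0, 2^p - 2].
    """
    mask = (1 << p) - 1
    s = a + b
    s = (s & mask) + (s >> p)      # fold the carry back into the low bits
    return 0 if s == mask else s   # 2^p - 1 itself is congruent to 0
```

The carry out of the top bit has weight 2^p, which is congruent to 1 modulo 2^p - 1, so it can be added back at the least significant position instead of requiring a general division.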


Session 6C: Short Papers 1

Model Evaluation Using Genetic Manipulation Techniques [p. 224]
Z. Stamenkovic, H.-Ch. Dahmen, and U. Glaeser

Formal verification is an important area in industry that is getting more and more attention. The growing complexity of digital circuits and their use in safety critical systems are the reasons for the need for tools for checking the correctness of designs. In this paper we present a new approach for model evaluation. With our approach we are able to increase the belief of a designer in the right functionality of a circuit without the long runtimes of classical model checking but with more reliability than testing a design via simulation with some input patterns. To achieve this goal we use our genetic manipulation technique: a combination of classical genetic algorithms with a goal oriented mutation operator.

A Genetic Algorithm for Register Allocation [p. 226]
K.M. Elleithy and E.G. Abd-El-Fattah

In this paper we introduce a new genetic algorithm for register allocation. A merge operator is used to generate new individual solutions. The number of steps required to examine all pairs in the population matrix is n^2 (where n is the population matrix size). Generating an offspring from the parents needs m steps (where m is the number of nodes). The total number of steps required by the algorithm is n^2 m; that is, the genetic algorithm has a linear time complexity in terms of the number of nodes. The experimental results show optimal solutions in many of the graphs used for testing.

Congestion Mitigation during Placement [p. 228]
Kanad Chakraborty and Natesan Venkateswaran

High post-placement congestion in complex ASICs and microprocessors may pose severe constraints on the wiring resources, thereby causing wireability, timing and noise problems. Linear wire length-based mincut partitioning algorithms have some built-in advantages for reducing congestion. We present a mathematical model of congestion and experimentally investigate various congestion mitigation techniques used in conjunction with linear wirelength-based placement. The experimental results validate our congestion model. Our placement tool, CPlace©, is a clustering-based mincut partitioner that optimizes a linear wirelength objective.

A Spiffy Tool for the Simultaneous Placement and Global Routing for Three-Dimensional Field-Programmable Gate Arrays [p. 230]
John Karro and James P. Cohoon

FPGAs are a useful and flexible alternative to custom design chips, but can suffer from severe interconnection delay. The 3D-FPGA is an alternative to the two-dimensional architecture that has been proposed to reduce these delay problems [2]. Here we present Spiffy - the first tool specifically designed for the placement and global routing of 3D-FPGAs. Spiffy produces some of the best results in the literature, and using Spiffy, we can show that when mapped to the 3D-FPGA architecture, circuits tend to have considerably shorter net-length, making this new chip an improvement over the standard architecture.

Formal Verification of Tree-Structured Carry-Lookahead Adders [p. 232]
Sae Hwan Kim, Shiu-Kai Chin

Quad trees (trees with four branches) are used to abstractly describe tree-structured carry-lookahead adders using 4-bit components. The specification and implementation descriptions are parameterized and describe tree-structured adders having arbitrarily large inputs and outputs. The descriptions are formally verified using the HOL theorem prover.

Bounding Algorithms for Design Space Exploration [p. 234]
Samit Chaudhuri, Robert A. Walker

This paper describes several new algorithms for computing lower bounds on the length of the schedule and the number of functional units in high-level synthesis.

Digital Neural Processing Unit for Electronic Nose [p. 236]
Hoda S. Abdel-Aty-Zohdy and Mahmoud Al-Nsour

In a biological nose, the environment usually suggests a number of common odors. The classification process checks sensed information against existing knowledge. This similarity with Reinforcement Learning neural networks suggests challenging implementation problems. A VLSIC digital design and implementation of a Reinforcement Artificial Neural Network (RANN) for chemical classification in an electronic nose is presented. The chip is designed to classify chemical gases among four possible volatile organic compounds. The system consists of four neurons and twelve synapses [1]. A neuron has been implemented on a tiny chip, using 2.0μm n-well CMOS technology, at Orbit Semiconductors, through the MOSIS facilities. Simulation results demonstrated proper operation. Stand alone experiments are satisfactory, with off-chip weight storage and weight update. Electronic nose system testing is currently under way.

A Low Power Charge-Recycling CMOS Clock Buffer [p. 238]
Xiaohui Wang and Wolfgang Porod

A low power CMOS clock buffer based on charge recycling technique is presented. To accomplish the charge recycling process and avoid introducing the extra short circuit current during the recycling phase, an extra switching circuit and control signal are utilized to keep inverters momentarily tri-state. The feasibility of this design and its improved power efficiency are demonstrated by simulations.

A Multiple-Input Single-Phase Clock Flip-Flop Family [p. 240]
Richard F. Hobson and Allan R. Dyck

The design of a versatile CMOS semi-static true single-phase clock flip-flop family is presented. It naturally supports multiple, multiplexed, inputs. Asynchronous Set/Reset are easily implemented. Switching power is lower than for some other semi-static flip-flop techniques.

Methodology of Logic Synthesis for Implementation Using Heterogeneous LUT FPGAs [p. 242]
I. Lemberski

A logic synthesis method for heterogeneous LUT FPGA implementation is proposed. As an example, the XILINX4000 architecture is considered. The method takes XILINX4000 architectural features (heterogeneous LUTs of 3 and 4 inputs) into account and includes a two-step decomposition. In the first step, the two-level logic representation is transformed into a graph of nodes with at most 4 fanins (after this step, each node can be mapped onto a 4-input LUT). In the second step, selected 4-fanin nodes are re-decomposed into 3-fanin nodes to ensure mapping onto 3-input LUTs. The re-decomposition task is formulated as replacing two fanins of a node with exactly one fanin.

VHDL Design of a Test Processor Based on Mixed-Mode Test Generation [p. 244]
Md.Altaf-Ul-Amin and Zahari Mohamed Darus

This paper presents the VHDL design of a prototype test processor, which can be used for functional testing of digital ICs. The design of the test processor supports itself to be controlled by a microcomputer. The processor can generate mixed-mode (pseudo-random followed by deterministic) test vectors and can apply them to circuit under test (CUT). The test processor also receives the output responses of the CUT and compresses them to a signature. The signature is then sent to the computer for comparison. The test processor supports the testing of combinational as well as sequential circuits (with scan-path).
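
The mixed-mode flow described above (pseudo-random vectors, then deterministic vectors, with responses compressed to a signature) can be modeled in a few lines. This is a software sketch under our own assumptions: a 4-bit LFSR with feedback polynomial x^4 + x^3 + 1 and a register of the same form for compaction. The actual widths, polynomials and interfaces of the test processor are not given in the abstract.

```python
def lfsr_step(state: int) -> int:
    """One step of a 4-bit LFSR with feedback polynomial x^4 + x^3 + 1."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0xF


def mixed_mode_vectors(n_random, deterministic, seed=0b0001):
    """Yield n_random pseudo-random vectors, then the deterministic ones."""
    state = seed
    for _ in range(n_random):
        yield state
        state = lfsr_step(state)
    yield from deterministic


def signature(responses):
    """Compress a sequence of 4-bit responses into a 4-bit signature.

    Each response is XORed into the stepped register (MISR-style compaction).
    """
    sig = 0
    for response in responses:
        sig = lfsr_step(sig) ^ (response & 0xF)
    return sig
```

In use, the vectors from `mixed_mode_vectors` would be applied to the circuit under test and its responses passed to `signature`; the host computer then compares the result with the signature of a known-good circuit.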


Session 7A: Physical Design

An Incremental Floorplanner [p. 248]
Jim Crenshaw, Majid Sarrafzadeh, Prithviraj Banerjee, Pradeep Prabhakaran

One of the foremost problems in physical design for deep-submicron circuits is the need for estimates that depend on future decisions. Estimation of area, timing, and coupling are required. We propose a novel floorplanner, with a new wiring metric which can be updated quickly in small increments. This provides tools with a way to influence the floorplan as they make changes without a large running time penalty. We provide experimental results that show the incremental approach to be generally 5 times faster than full floorplanning while maintaining good estimates.

A Greedy Router with Technology Targetable Output [p. 252]
R. Balakrishnan and R.F. Hobson

Our objective was to integrate an effective channel routing algorithm with the Chip Design Language (CDL) algorithmic layout tool. CDL uses technology targetable layout techniques, so that the output of the routing algorithm can easily be ported to different technologies. We introduce the technology independent features of CDL and describe how a greedy router can be interfaced to it. Specific features of interest include mapping from the grid based router to the gridless CDL environment, and the automatic insertion of CDL feed-through cells in multi-channel applications.

Routability Prediction for Hierarchical FPGAs [p. 256]
Wei Li and D.K. Banerji

This paper investigates the problem of routability prediction in an FPGA that employs a hierarchical routing architecture. Such an FPGA is called a hierarchical FPGA (HFPGA). A novel model is proposed to analyze various HFPGA configurations. A software tool has been developed to predict the routability of circuits on specific HFPGA architectures. The primary contribution of this work is that routability prediction can be done immediately after the technology-mapping step, rather than after placement. The effect of connection block and switch block flexibility on routability is also studied. The results show that, compared to a symmetrical FPGA architecture, we can achieve the same degree of routability on an HFPGA with far fewer routing switches.

Memory Unit Design for Real Time DSP Applications [p. 260]
Daniel CHILLET, Olivier SENTIEYS, Michel CORAZZA

Today, the design complexity of new applications (such as telecommunication, multimedia, internet) requires new high level tools which enable us to translate the behavioral description into hardware. All of the recent High Level Synthesis tools are able to transform high level specifications into an ASIC based on processing and control units. In general, these tools do not handle a real optimization of the memory unit. However, in many applications, the hardware solution may be challenged by the number and the complexity of memory units. This paper proposes to complete the synthesis design flow by including the memory unit synthesis. Our methodology is integrated in the BSS (Breizh Synthesis System, http://www.enssat.fr/bss) project, which is a framework for the design of real-time constraint applications.


Session 7B: MEMS

Design Automation of MEMS Systems Using Behavioral Modeling [p. 266]
Dennis Gibson, Carla Purdy, Alva Hare, Fred Beyette, Jr.

We propose a behavioral approach to designing MEMS devices. This approach differs from much current research in that this approach would not require dimensional parameters for the device, but instead would require a high level, functional or behavioral description. This paper examines how such an approach would work using a case study of an optical processor manually designed using the MUMPs process.
Keywords: MEMS, CAD for MEMS, behavioral modeling, design automation.

Blending Symbolic Matrix and Dimensional Numerical Simulation Methodology for Mechatronics Systems [p. 270]
Robert L. Ewing

The methodology for the integration of design domains towards the purpose of controlling dynamic mechatronics systems is the current challenge of the modern engineer. Scaling issues for both the mechanical and electrical parameters are critical to the successful design and implementation of a mechatronic system. In approaching the scaling design methodology for future submicron fabrication, new disciplines of symbolic matrix techniques and dimensional analysis must be developed and applied in the design of these mechatronics systems. This paper presents both an overview of the techniques and insight using computer aided design packages for the blending of symbolic matrix techniques using the admittance matrix created by SPICE and dimensional analysis using Buckingham's Π parameters.

Numerical Tools for Fracture of MEMS Devices [p. 274]
N. Tayebi, A.K. Tayebi, and Y. Belkacemi

Numerical tools to model fracture in MEMS devices are proposed. The two numerical procedures are the Element Free Galerkin method and the Displacement Discontinuity Method. Experiments on MEMS fracture are used to evaluate the numerical procedures. The test specimens covered a range of geometries and designs, including notches, holes and corners. For some specimens both methods gave acceptable results compared to experiments (Ballarini et al. and Suwito), while for others results were off by more than 15%. These findings raise new questions about the applicability of linear elastic fracture mechanics to model failure of MEMS devices at microscopic scale.
Key words: CAD Tools, MEMS, Fracture Mechanics, Meshless Methods, Boundary Element Methods


Session 8A: Verification

Formal Checking of Properties in Complex Systems Using Abstractions [p. 280]
Dinos Moundanos, Jacob A. Abraham

Only very small designs can be verified currently using property checking due to state-space explosion. Abstractions have been developed to simplify the design in an attempt to address this problem. However, the properties themselves may involve large state spaces, and practical property checking is generally confined to the control behavior. This paper describes an elegant technique for verifying properties of complex designs where the abstraction is applied to both the property and the design, thereby allowing us to verify properties which may deal with the data space. We demonstrate the technique on a processor by checking properties which are intractable using existing model checking techniques.

A Hierarchical Approach to the Formal Verification of Embedded Systems Using MDGs [p. 284]
Subhashini Balakrishnan and Sofiène Tahar

With the increasing emergence of mixed hardware/software systems, it is important to ensure the correctness of such a system formally, particularly for real-time and safety critical applications. We present a hierarchical approach to modeling and formally verifying an embedded system at higher levels of abstraction, using Multiway Decision Graphs (MDGs). We demonstrate our approach on the embedded software for a mouse controller application on a commercial microcontroller (PIC 16C71), using the MDG verification tools. Inconsistencies in the assembly code with respect to the specification, as published in the application notes of the manufacturer, were uncovered through our experiments.

Symbolic Multi-Level Verification of Refinement [p. 288]
Stefan Hendricx, Luc Claesen

VLSI-system design can, in general, be characterized in terms of the step-wise refinement of intermediate solutions. Despite the fact that such refinements usually do not preserve time-scales, current formal verification approaches mostly start from the assumption that both specification and implementation utilize the same scales of time. Realizing the importance of being able to cope with differences in timing granularity, this preliminary paper proposes a symbolic methodology to verify that a low-level finite state machine is a refinement of a high-level finite state machine. To illustrate our approach, the step-wise refinement, and verification, of a simple microprocessor is presented.

Self-Checking of FPGA-Based Control Units [p. 292]
Ilya Levin and Vladimir Sinelnikov

The paper introduces a new technique for on-line checking of FPGA based Control Units (CUs). This technique is based on the architecture comprising two portions: a self-checking CU and a separate totally self-checking (TSC) checker. Each of these portions is implemented as a combination of an Evolution block and an Execution block. Comparison of code vectors being transferred between the blocks of the portions enables providing a totally self-checking property. The self-checking CU is implemented in the form of a one-rail network of interconnected pre-designed LUT-based configurable logical blocks. The self-checking checker is a Sum-Of-Minterms based checker. The proposed technique: a) does not require any encoding of output words; b) uses one-rail design, thereby drastically decreasing the required overhead.

A Software Acceptance Testing Technique Based on Knowledge Accumulation [p. 296]
Yi Yu, Fangmei Wu

System acceptance testing in general relies on the specification of system requirements, but for complex systems, especially for complex safety systems, the issue whether system requirements specified by users are complete should be considered. This paper presents a software acceptance testing technique based on knowledge accumulation, which can help to expose the software faults caused by the lack of knowledge. A software test tool using the technique for the railway signaling computer interlocking systems and some tested results are also introduced in this paper.
Keywords: software acceptance testing, knowledge accumulation, railway signaling, interlocking

A Correlation Matrix Method of Clock Partitioning for Sequential Circuit Testability [p. 300]
Yong Chang Kim, Vishwani D. Agrawal, Kewal K. Saluja

We propose a method of partitioning the set of all flip-flops in a circuit for multiple clock testing. In multiple clock testing, flip-flops are partitioned into different groups and each group of flip-flops has an independent clock control. In our method, we use a test generator assuming an independent clock control for each flip-flop. We then determine the correlation between clock activity for all pairs of flip-flops. This information is then used to find an optimal or near-optimal partition of the flip-flops. Through experiments, we demonstrate that our partitioning method increases fault coverage and reduces test length with almost no hardware overhead or performance penalty.


Session 8B: VLSI Circuits 2

A Novel Low Power Low Phase-Noise PLL Architecture for Wireless Transceivers [p. 306]
Amr N. Hafez and M.I. Elmasry

A sample-and-hold stage placed in the feedback path of a PLL frequency synthesizer reduces the division ratio, and hence the phase-detector phase-noise, without the need for multiple loops. When used in conjunction with a DDS, this architecture simplifies the DDS design, leading to a low-power architecture. Furthermore, this architecture allows for a large loop bandwidth, thus suppressing the VCO phase-noise. The advantages of this architecture are highlighted and system- and circuit-level simulations are presented.

NMOS Energy Recovery Logic [p. 310]
Chulwoo Kim, Seung-Moon Yoo, and Sung-Mo Kang

In this paper, we describe NMOS Energy Recovery Logic (NERL), which exhibits high throughput with low energy consumption due to efficient energy transfer and recovery using adiabatic and bootstrapping techniques. NERL shows full output voltage swing, insensitivity to output load capacitance, less dependency on power-clock frequency and complementary outputs for balanced capacitance load to the power-clock. We have designed an 8-bit CLA and inverter chain using 0.6μm CMOS technology and verified that NERL saves energy over ECRL by 2 to 3 times.

Noise Immunity of Digital Circuits in Mixed-Signal Smart Power Systems [p. 314]
Radu M. Secareanu, Ivan S. Kourtev, Juan Becerra, Thomas E. Watrobski, Christopher Morton, William Staub, Thomas Tellier, and Eby G. Friedman

Experimental data describing circuit and physical design issues that influence the noise immunity of digital latches in mixed-signal smart power circuits are described and discussed. The principal result of this paper is the characterization of the conditions under which substrate noise generated by high power analog circuitry affects digital latches. The experimental data characterize a variety of different noise mitigation techniques for the particular process technology, circuit structures, signal/clocking interdependencies, and related conditions.

An All Digital BiCMOS Phase Lock Loop for VLSI Processors [p. 318]
Lim Chu Aun and S.M. Rezaul Hasan

A BiCMOS all digital phase lock loop is described. This design is suitable for applications such as clock and frequency synthesis in VLSI processors where thermal stability is an important factor. The main block of the design consists of a digitally controlled oscillator with wide frequency range and high thermal stability compared to a CMOS design. An improved BiCMOS adder/subtractor was also implemented to reduce worst-case propagation delay-time. A small test chip was fabricated using the MOSIS Orbit 2μm low-cost analog CMOS process technology that provides a lateral NPN bipolar device option.

Low Power Techniques for Digital GaAs VLSI [p. 321]
J. F. Lopez, R. Sarmiento, A. Núñez, K. Eshraghian, S. Lachowicz, and D. Abbott

This paper presents a survey of low-power digital Gallium Arsenide logic applicable to high performance VLSI circuits and systems and proposes new design concepts in methodology and architecture based on implementation of Pseudo-Dynamic Latched Logic in order to achieve a reasonable power-delay-area tradeoff. The approach is highly suitable for self-timed systems where the complexities of clock skew are avoided and power saving is achieved through pipelined architectures. The emergence of low-power Complementary HIGFET (C-HIGFET) technology enables the realisation of new high performance low-power architectures. The viability of neu-GaAs (vGaAs) as applied to C-HIGFET is discussed and the concept of 'soft' hardware, referred to as 'flexware', is introduced as a new design paradigm for GaAs.

A VLSI Architecture for ATM Switches with Algorithm-Agile Encryption [p. 325]
A. G. Wassal and M.A. Hasan

In this paper a VLSI architecture is proposed for an algorithm-agile encryptor for ATM networks. The architecture is based on a circular sorting queue that buffers and switches incoming cells to the appropriate encryption pipelines. It also handles multicast cells that require different encryption algorithms for different destinations. Delay and loss priority are analyzed for multi-class traffic processed through the encryptor. The analysis results are necessary to size the buffer properly and to choose an appropriate priority scheme. An ASIC prototype of the sorting queue that supports an aggregate traffic rate of up to 21.2 Gbps is also presented.


Session 8C: Short Papers 2

On an Efficient Method for Estimating the Interconnection Complexity of Designs and on the Existence of Region III in Rent's Rule [p. 330]
Dirk Stroobandt

The interconnection complexity of digital designs can be captured by the well-known Rent exponent, described by Landman and Russo [2]. In this paper, we present an efficient method for obtaining the Rent exponent of a design through a hierarchical partitioning algorithm. Experimental results not only confirm the Landman and Russo observations of a region I and region II, but also show a hitherto unknown region III.
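
Rent's rule relates the number of terminals T of a partition to the number of blocks B it contains as T = t * B^p, where p is the Rent exponent. The sketch below estimates p by a least-squares fit on a log-log scale. It assumes the (B, T) pairs have already been obtained from some partitioning step and does not reproduce the paper's hierarchical partitioning method; the name `fit_rent` is ours.

```python
import math


def fit_rent(points):
    """Least-squares fit of log T = log t + p * log B.

    points: iterable of (B, T) pairs, where B is the number of blocks in a
    partition and T is its number of external terminals.
    Returns (t, p): terminals-per-block coefficient and Rent exponent.
    """
    xs = [math.log(b) for b, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope of the log-log line
    t = math.exp(mean_y - p * mean_x)   # intercept, mapped back from log space
    return t, p
```

A single straight-line fit of this kind only describes the region where the power law holds; deviations at the ends of the log-log curve are what the abstract calls regions II and III.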

Monolithic Microprocessor and RF Transceiver in 0.25-micron FDSOI CMOS [p. 332]
E. McShane, K. Shenai, L. Alkalai, E. Kowala, V. Boyadzhyan, B. Blaes, and W.C. Fang

A monolithic RFIC in 0.25-micron fully-depleted SOI CMOS has been designed consisting of a microcoded 8-bit 33-MHz microprocessor, a 400-MHz 8-bit ASK-modulated RF transceiver, and two integrated dc-dc voltage converters for power management. This architecture exploits a low-power (sub-2-V) digital process for mixed-signal VLSI in a die size measuring 2.2 mm x 2.2 mm.

Low Power Design of an Acoustic Echo Canceller Gmdfα Algorithm on Dedicated VLSI Architectures [p. 334]
S. Gailhard, N. Julien, A. Baganne, and E. Martin

The acoustic echo cancellation with adaptive filters is a computationally intensive problem that needs real time cost effective solutions for embedded systems. Low Power optimized signal processing architectures are likely to provide such solutions in the future. In this paper, we present different realtime optimized architectures of the popular Gmdfα algorithm, obtained by a HLS CAD tool providing trade-off between area and power dissipation.

Proposal of Data-Driven Processor Architecture Qv-K1 [p. 336]
Teruhiko Kamigata, Koso Murakami, Makoto Iwata, Hiroaki Terada

This paper presents an extended SIMD form of data operation for multimedia signal processing and a performance evaluation of the data-driven processor Qv-K1. By adding the proposed data-parallel operation mechanism, the number of executed instructions is reduced compared with conventional SIMD, so the processing ability of this processor can be increased.

Accurate Resource Estimation Algorithms for Behavioral Systems [p. 338]
Srinivas Katkoori, Ranga Vemuri

Given a scheduled data flow graph, the functional, storage, and interconnect (multiplexor) resources are analytically estimated, taking into account the effects of post-scheduling tasks. The complexity of the controller implementation is also estimated. The novelty of this work lies in predicting the effects of the post-scheduling tasks on the final amount of resources, and the effects of data path resource optimization on the controller complexity. Experimental results show high correlation between estimated and actual numbers.

Assessing Defect Coverage of Memory Test Algorithms [p. 340]
Vonkyoung Kim and Tom Chen

This paper describes the defect coverage evaluation of memory testing algorithms. Realistic CMOS defects were extracted from a 2 x 2 SRAM layout using an IFA tool, and circuit simulations were performed to measure the defect coverages of eleven memory testing algorithms.

Exploiting Test Resource Optimization in Data Path Synthesis for BIST [p. 342]
Xiaowei Li, Paul Y.S. Cheung

Area and test time are two major overheads encountered during data path synthesis for BIST. This paper presents an attempt towards testability enhancement in data path BIST synthesis by considering two factors simultaneously. It is achieved by incorporating two testability constraints in data path synthesis. Experimental results are presented to demonstrate the effectiveness of the proposed (data path) BIST synthesis approach.

Resonant Tunneling Transistors for Threshold Logic Circuit Applications [p. 344]
C. Pacha, P. Glösekotter, K. Goser, U. Auer, W. Prost, and F.-J. Tegude

Resonant tunneling transistors (RTT's) and linear threshold gates based on monostable-bistable logic transition elements (MOBILE's) are promising candidates for nano-scale integrated circuits. In this paper the design methodology of RTT logic gates is discussed and experimental results of a monolithically integrated NAND-NOR gate are presented. To exploit the computational functionality of threshold logic circuits, a depth-2 full adder and a bit-level pipelined ripple carry adder are proposed.

A Multilevel Cache Memory Architecture for Nanoelectronics [p. 346]
David Crawley

In this paper, we present a new multilevel cache memory architecture which uses only near-neighbour connections, thus eliminating long tracks and rendering the system suitable for nanoelectronic implementation. Operation of the memory is such that the most-recently accessed data is kept closest to the read-write port.
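The access policy described above (most recently accessed data nearest the port) can be modelled in software as a move-to-front list. The sketch below is only a behavioural illustration under that assumption; the function name `access`, the `capacity` parameter, and the eviction of the farthest entry are hypothetical and say nothing about the paper's hardware organisation.

```python
def access(cells, key, capacity):
    """Model one access: `key` ends up at index 0, the position nearest the port.

    `cells` is ordered from nearest (index 0) to farthest from the port.
    Entries in front of the accessed one each move one place farther away,
    which loosely mirrors data moving only between neighbouring positions.
    """
    if key in cells:
        cells.remove(key)      # hit: take the entry out of its current place
    elif len(cells) >= capacity:
        cells.pop()            # miss on a full memory: drop the farthest entry
    cells.insert(0, key)       # the accessed entry now sits next to the port
    return cells


cells = []
for key in ["a", "b", "c", "a", "d"]:
    access(cells, key, capacity=3)
```

In this model the entry that has gone longest without an access drifts to the far end and is the one dropped when new data arrives.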


Session 9A: Low Power

ALPS: A Peak Power Estimation Tool for Sequential Circuits [p. 350]
F. Corno, M. Rebaudengo, M. Sonza Reorda, and M. Violante

Tools for evaluating the worst-case peak power consumption of sequential circuits are highly useful to designers of low-power circuits. Previously proposed methods search for the initial state and the pair of vectors with maximum consumption, without fully considering the reachability of the initial state. This paper shows that this approach can lead to a significant underestimation of the maximum peak power consumption, and proposes a new algorithm that overcomes this drawback. Experimental results show that for many circuits the algorithm is able to provide better results than those known up to now, while an approximate version is able to deal even with the largest benchmark circuits.

Clustered Table-Based Macromodels for RTL Power Estimation [p. 354]
Roberto Corgnati, Enrico Macii, Massimo Poncino

Macromodeling is considered the most effective approach to RTL power estimation. Among the macromodels presented in the literature, table-based ones have overcome some of the limitations of conventional, equation-based solutions. In this paper we propose some enhancements to the basic implementation of table-based macromodels that improve the estimation accuracy while preserving the intrinsic robustness.

The Design of a CMOS Gigahertz-Band Continuous-Time Active Lowpass Filters with Q-Enhancement Circuits [p. 358]
Yuyu Chang, John Choma, Jr., and Jack Wills

A tunable second-order lowpass filter architecture capable of operating in the gigahertz frequency range is proposed. Two Q-enhancement techniques are utilized to extend the Q tuning range. Simulation results employing standard 0.5μm CMOS technology have successfully verified that the center frequency tuning and the hybrid Q-tuning approach operate between 1.26GHz and 2.3GHz center frequencies with Q larger than 1000. A tunable lowpass filter with a center frequency at 2.07GHz with a Q equal to 31 is designed to have 44dB input dynamic range and 27.8 mW power dissipation.

A New Algorithm for RNS Magnitude Comparison Based on New Chinese Remainder Theorem II [p. 362]
Yuke Wang, Xiaoyu Song, Mostapha Aboulhamid

The number comparison is a difficult and fundamental operation for residue number systems (RNS). Previous algorithms use either some redundant modulus or big modulo operations. In this paper, based on the New Chinese Remainder Theorem II, we present a new comparison algorithm using smaller modulo operations and no redundant modulus.
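To illustrate why comparison is awkward in an RNS, the sketch below compares two residue-encoded numbers by first reconstructing them with the classical Chinese Remainder Theorem, which requires a reduction modulo the full dynamic range M. This is the kind of large modulo operation the abstract says the new algorithm avoids. It is a baseline for illustration only, not the New Chinese Remainder Theorem II method, and the moduli and function names are assumptions.

```python
from math import prod

MODULI = (7, 11, 13)  # hypothetical pairwise-coprime moduli; dynamic range M = 1001


def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as its tuple of residues."""
    return tuple(x % m for m in moduli)


def from_rns(residues, moduli=MODULI):
    """Classical CRT reconstruction; note the final reduction modulo the large M."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        inverse = pow(partial, -1, m)  # modular inverse of partial mod m (Python 3.8+)
        total += r * partial * inverse
    return total % big_m


def rns_compare(a, b, moduli=MODULI):
    """Return -1, 0, or 1 by comparing the reconstructed values of a and b."""
    x, y = from_rns(a, moduli), from_rns(b, moduli)
    return (x > y) - (x < y)
```

With these moduli, 8 encodes as (1, 8, 8) and 14 as (0, 3, 1): every residue of the smaller number is larger, so the residues cannot be compared digit by digit and some form of reconstruction is needed.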


Session 9B: VLSI Circuits 3

Low Power Chip Interface Based on Bus Data Encoding with Adaptive Code-Book Method [p. 368]
Satoshi Komatsu, Makoto Ikeda, Kunihiro Asada

An adaptive code-book encoding is proposed, which is applicable for low power chip-interface. In this method, data transition activity on bus signals is lowered by data encoding similar to the vector quantization (VQ). Transferred data on bus are the quantized vector numbers along with the Hamming difference between the original data and the quantized vector. A computer simulation and measurement results show that this encoding method is effective for low power chip-interface especially for the deep sub-micron VLSIs.
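The encoding idea can be sketched as follows: each data word is replaced by the index of the nearest code-book entry plus its bitwise difference from that entry, and bus activity is measured as the number of bit transitions between consecutive transfers. This is a minimal sketch under assumptions: the code-book is fixed (the adaptive update described in the paper is omitted), and the code-book contents, word width, and function names are hypothetical.

```python
def hamming_weight(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def encode(word, codebook):
    """Return (index, difference) using the entry nearest in Hamming distance."""
    index = min(range(len(codebook)),
                key=lambda i: hamming_weight(word ^ codebook[i]))
    return index, word ^ codebook[index]


def decode(index, diff, codebook):
    """Recover the original word from the index and the difference."""
    return codebook[index] ^ diff


def transitions(words):
    """Total bit transitions between consecutive words on a bus."""
    return sum(hamming_weight(a ^ b) for a, b in zip(words, words[1:]))


codebook = [0x00, 0xFF, 0x0F, 0xF0]   # hypothetical fixed code-book
data = [0x01, 0xFE, 0x0E, 0xF1]       # hypothetical 8-bit bus traffic
encoded = [encode(w, codebook) for w in data]
raw_activity = transitions(data)
coded_activity = (transitions([i for i, _ in encoded])
                  + transitions([d for _, d in encoded]))
```

For this hand-picked data, every word lies one bit away from a code-book entry, so the difference lines never toggle and `coded_activity` is well below `raw_activity`. Real traffic would only benefit to the extent that the code-book tracks the data, which is the role of the adaptation.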

A 1.8V High Dynamic-Range CMOS High-Speed Four Quadrant Multiplier [p. 372]
Chi-Huing Lin and Mohammed Ismail

A low-voltage (<3V) CMOS four quadrant multiplier is introduced which has an almost rail-to-rail differential-input-swing with a low signal-distortion (<1% for 100kHz signal). The proposed circuit is composed of a pair of rail-to-rail differential-input V-I converters and a pair of voltage-followers. This topology of multiplier results in a high frequency capability with low power consumption. In a 1.2μm n-well CMOS process, the 3dB frequency of the multiplier is in a range of 103MHz. Measured total power consumption is around 0.52mW with supply voltage 2V. The multiplier can operate at a minimum supply voltage of 1.8V.

A Second-Order Sigma-Delta Modulator with Built-in VGA to Improve SNR and Harmonic Distortion [p. 376]
Xiaopeng Li and Mohammed Ismail

A modified architecture of the second-order switched-capacitor modulator is proposed. A simple four-transistor variable gain attenuator is included in the architecture which continuously adjusts the reference voltage of the quantizer feedback. This improves the output SNR for small signal input and reduces the harmonic distortion for large signal input. Simulation results show that it achieves higher dynamic range and lower harmonic distortions compared with the traditional architecture.

A Novel Low Power Energy Recovery Full Adder Cell [p. 380]
R. Shalem, E. John, and L. K. John

A novel low power and low transistor count static energy recovery full adder (SERF) is presented in this paper. The power consumption and general characteristics of the SERF adder are then compared against three low power full adders: the transmission function adder (TFA), the dual value logic (DVL) adder and the fourteen transistor (14T) full adder. The proposed SERF adder design was proven to be superior to the other three designs in power dissipation and area, and second in propagation delay only to the DVL adder. The combination of low power and low transistor count makes the new SERF cell a viable option for low power design.

Memory Chip BIST Architecture [p. 384]
Jacob Savir

This paper describes a random access memory (RAM, sometimes also called an array) test scheme that has the following attributes:
1. Can be used in both built-in mode and off chip/module mode.
2. Can be used to test and diagnose naked arrays.
3. Fault diagnosis is simple and is "free" for some faults during test.
4. It is never subject to aliasing.
5. Depending upon the test length, it can detect many kinds of failures, like stuck-cells, decoder faults, shorts, pattern-sensitive, etc.
6. If used as a built-in feature, it does not slow down the normal operation of the array.
7. Does not require storage of correct responses. A single response bit always indicates whether a fault has been detected. Thus, the storage requirement for the implementation of the test scheme is zero.
8. If used as a built-in feature, the hardware overhead is very low.

A Fully Pipelined, 700MBytes/s DES Encryption Core [p. 386]
Ihn Kim, Craig S. Steele, Jefferey G. Koller

Fully-pipelined, 56-bit DES de/encryption and authentication at memory-bus bandwidths is now feasible. We describe a custom, 7 square mm, 120mW core in 4-metal 0.35μm CMOS. Performance allows on-the-fly encryption of 64-bit, 66MHz PCI traffic, and hence typical network traffic. FPGA, synthesized, and 3-metal versions are compared.

Transistor Stuck-Open Fault Detection in Multilevel CMOS Circuits [p. 388]
Mostafa Abd-El-Barr, Yanging Xu and Carl McCrosky

The necessary and sufficient conditions for detecting transistor stuck-open faults in arbitrary multi-level CMOS circuits are shown. A method for representing a two-pattern test for detecting a single stuck-open fault using only one cube is presented. The relationship between the D-algorithm and the conditions for detecting transistor stuck-open faults in CMOS circuits is provided. The application of the proposed approach in robust test generation for transistor stuck-open faults in a number of benchmark circuits is demonstrated. The fault coverage achieved is as good as or better than those reported using existing techniques.
Keywords: Transistor stuck-open fault, two-pattern test, test pattern generation, multi-level CMOS circuits testing, robust CMOS testing.

Advances Toward Molecular-Scale Electronic Digital Logic Circuits: A Review and Prospectus-Abstract [p. 392]
James C. Ellenbogen

(Extended Abstract)

Transport in Split Gate MOS Quantum Dot Structures [p. 394]
S. M. Goodnick, J. Bird, D. K. Ferry, A. D. Gunther, M. D. Khoury, M. Kozicki, M. J. Rack, T. J. Thornton, and D. Vasileska-Kafedezka

A novel technique has been developed for the fabrication of Si quantum dot structures with controllable electron number through both top and side gates. We have tested devices ranging in size from 40 to 200nm. By varying the density with the top gate, and controlling the input and output barriers of the dot with the side gates, conductance peaks are observed which map details of the energy levels within the dot as well as the interaction of the electrons with one another.