ISLPED 2002 ABSTRACTS
Session Chair: Mary Jane Irwin (Penn State University)
-
Low-Voltage Memories for Power Aware Systems [p. 1]
-
Kiyoo Itoh (Hitachi Ltd.)
This paper describes low-voltage RAM designs for stand-alone
and embedded memories in terms of signal-to-noise-ratio designs
of RAM cells and subthreshold-current reduction. First,
structures and areas of current DRAM and SRAM cells are
discussed. Next, low-voltage peripheral circuits that have been
proposed so far are reviewed with focus on subthreshold-current
reduction, speed variation, on-chip voltage conversion, and
testing. Finally, based on the above discussion, a perspective is
given with emphasis on needs for high-speed simple non-volatile
RAMs, new devices/circuits for reducing active-mode leakage
currents, and memory-rich SOC architectures.
Keywords
subthreshold current, DRAM and SRAM cells, gain cells,
peripheral circuits, gate-source/substrate-source back-biasing,
multi-VT, on-chip voltage converters, testing, non-volatile RAMs,
memory-rich architectures.
Session Chair: Nestoras Tzartzanis (CSEM, EPFL, TPC)
Session Organizer: Bill Athas (Apple)
-
1.1 Standby Power Management for a 0.18 µm Microprocessor [p. 7]
-
L.T. Clark, S. Demmons, N. Deutscher, F. Ricci (Intel Corporation)
Static power dissipation is a concern for battery powered handheld
devices since it can substantially impact the battery life.
Here, the use of reverse body bias to limit Ioff on the high
performance, low power XScale™ microprocessor core is
described. The scheme utilized is amenable to implementation on
a low-cost (non-triple well) process and has limited regulation
requirements. The regulation requirements and circuits are
described, as is the performance of the method. A measured
current reduction factor of over 25 is achieved with this method
of reverse body bias. Implications of the use of body bias leakage
control for active power and performance, as well as system level
implications are also discussed.
General Terms
Measurement, Performance, Design
Keywords
Low power, microprocessors, body effect
-
1.2 Physical Insight into Fractional Power Dependence of Saturation
Current on Gate Voltage in Advanced Short Channel MOSFETs [p. 13]
-
H. Im (University of Tokyo, Dongguk University), M. Song (Dongguk University),
T. Hiramoto (University of Tokyo, VLSI Design and Education Center),
T. Sakurai (University of Tokyo)
The physical origin of the fractional power dependence of MOSFET drain
current on gate voltage, namely the α-power law model, which has been considered
a fully empirical model, is analytically investigated. For this purpose,
we have developed a new physics-based analytical drain current model.
Using this model, we prove that the saturation current can be simplified
to the form B·(Vg − VTH)^α, i.e., the α-power law model. The physical interpretations
of α, B, and VTH are elucidated, and their analytical expressions
are given in terms of the MOSFET's parameters. Since the α-power model is compact and
physics-based, it allows circuit designers to easily estimate the power
dissipation and the gate delay time in a predictable manner.
Categories & Subject Descriptors: I.6.5 Model Development.
General Term: Theory, Verification.
Keywords: MOSFET modeling, Saturation current, alpha-power model.
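As an illustration of the form B·(Vg − VTH)^α quoted in the abstract above, the following is a minimal Python sketch. The function name and all numeric values are hypothetical and chosen only for illustration; the paper's analytical expressions for α, B, and VTH are not reproduced here.

```python
def saturation_current(vg, vth, b, alpha):
    """Alpha-power-law saturation current: B * (Vg - VTH) ** alpha.

    Returns 0.0 at or below threshold, where this simple form does not apply.
    """
    overdrive = vg - vth
    if overdrive <= 0.0:
        return 0.0
    return b * overdrive ** alpha


# Hypothetical parameter values, for illustration only.
B = 1.0e-3   # A / V**alpha
VTH = 0.4    # V

# A smaller alpha (typical of short-channel devices) makes the current
# less sensitive to the gate voltage than the classical square law.
for alpha in (2.0, 1.3):
    ratio = (saturation_current(1.8, VTH, B, alpha)
             / saturation_current(1.0, VTH, B, alpha))
    print(f"alpha={alpha}: Idsat(1.8 V) / Idsat(1.0 V) = {ratio:.2f}")
```

The loop compares a square-law exponent with a fractional one to show how α controls the sensitivity of the saturation current to the gate overdrive.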
-
1.3 Full-chip Sub-threshold Leakage Power Prediction Model for sub-0.18 µm CMOS [p. 19]
-
S. Narendra (Massachusetts Institute of Technology, Intel Laboratories), V. De, S. Borkar (Intel Laboratories), D. Antoniadis, A. Chandrakasan (Massachusetts
Institute of Technology)
The driving force for the semiconductor industry growth has been
the elegant scaling nature of CMOS technology. In future CMOS
technology generations, supply and threshold voltages will have
to continually scale to sustain performance increase, control
switching power dissipation, and maintain reliability. These
continual scaling requirements on supply and threshold voltages
pose several technology and circuit design challenges. With
threshold voltage scaling, sub-threshold leakage power is expected
to become a significant portion of the total power in future CMOS
systems. Therefore, it becomes crucial to predict sub-threshold
leakage power of such systems. In this paper, we present a sub-threshold
leakage power prediction model that takes into account
within-die threshold voltage variation. Statistical measurements of
32-bit microprocessors in 0.18 µm CMOS confirm that the mean
error of the model is 4%. Comparisons of this model to two
other existing models that do not take within-die threshold voltage
variation into account are also presented.
-
1.4 Power-Conscious Interconnect Buffer Optimization with Improved
Modeling of Driver MOSFET and its Implications to Bulk and SOI
CMOS Technology [p. 24]
-
K. Nose, T. Sakurai (University of Tokyo)
Closed-form formulas for optimum buffer insertion where the
junction capacitance is taken into account are proposed. In order
to use the derived formulas, an appropriate choice of the effective
linear resistance of the driving transistor is also clarified. Using
the proposed formulas, the optimum interconnect delay and
power comparison between bulk and SOI CMOS technology are
discussed. The calculation results show that both the optimum
delay and power with SOI can be reduced by 15% compared with
the bulk MOSFET whose junction capacitance is assumed to be
equal to the gate capacitance.
Categories & Subject / General Terms
B.7.1 Integrated circuits / Performance, design
Session Chair: Teresa Meng (Stanford University)
Session Organizer: Vijaykrishnan Narayanan (Penn State University)
-
2.1 E2WFQ: An Energy-Efficient Fair Scheduling Policy for Wireless
Systems [p. 30]
-
V. Raghunathan, S. Ganeriwal, C. Schurgers, M. Srivastava (University of
California, Los Angeles)
As embedded systems are being networked, often wirelessly, an
increasingly larger share of their total energy budget is due to the
communication. This necessitates the development of power management
techniques that address communication subsystems, such
as radios, as opposed to computation subsystems, such as embedded
processors, to which most of the research effort thus far has
been devoted. In this paper, we present E2WFQ, an energy efficient
version of the Weighted Fair Queuing (WFQ) algorithm for
packet scheduling in communication systems. We employ a recently
proposed radio power management technique, Dynamic Modulation
Scaling (DMS), as a control knob to enable energy-latency
tradeoffs during wireless packet scheduling. The use of E2WFQ
results in an energy aware packet scheduler, which exploits the
statistics of the input arrival pattern as well as the variability in
packet lengths. Simulation results show that large savings in energy
consumption can be obtained through the use of our scheduling
scheme, compared to conventional WFQ, with only a small,
bounded increase in worst case packet latency.
Categories and Subject Descriptors
C.2.1 [Computer-Communication Networks]: Network Architecture
and Design - Network communications, Wireless communication;
C.2.6 [Computer-Communication Networks]: Internetworking - Routers
General Terms
Algorithms, Design
Keywords
Energy Efficient Design, Power Management, Wireless Communications,
Fair Scheduling
-
2.2 A Framework for Energy-Scalable Communication in High-Density
Wireless Networks [p. 36]
-
R. Min, A. Chandrakasan (Massachusetts Institute of Technology)
Power-aware communication is essential for maximizing the lifetime
of energy-constrained wireless devices. Applications running
on such devices can cooperatively reduce communication energy
by trading communication latency, reliability, or range for energy
savings. We introduce a framework that exposes these high level
trade-offs to a power-aware communication subsystem featuring
variable-strength convolutional coding, an adjustable power amplifier,
and a voltage-scaled processor. An application programming
interface (API) exposes an application's minimum quality constraints
on the communication. These constraints are translated
into energy-efficient parameter settings for the communication
hardware. We apply our framework to improved communication
energy models and measurements from a wireless microsensor
node to effect over an order of magnitude of energy scalability.
Categories and Subject Descriptors
C.2.1 [Network Architecture and Design]: wireless communication,
network communications; C.2.3 [Network Operations]; C.4 [Performance
of Systems]; D.2.2 [Design Tools and Techniques]
General Terms
Performance, Design, Reliability
Keywords
power awareness, energy scalability, wireless sensor networks, distributed
microsensors, µAMPS, API design, dynamic voltage scaling,
forward error correction, transmit power, macromodels, energy models
-
2.3 Contents Provider-Assisted Dynamic Voltage Scaling for Low
Energy Multimedia Applications [p. 42]
-
E.-Y. Chung (CSL Stanford University), L. Benini (University of Bologna),
G. De Micheli (CSL Stanford University)
This paper presents a new concept of DVS (Dynamic Voltage
Scaling) for multimedia applications. Many multimedia
applications have a periodic property, but each period
shows a large variation in terms of its execution time. Exact
estimation of such variation is a crucial factor for low energy
software execution with DVS techniques. Previous DVS
techniques focused only on end users (client sites), and their
quality depends heavily on the accuracy of the worst-case
execution time estimation. This paper proposes that contents
providers (server sites) supply information on the
execution time variations in addition to the content itself.
This makes it possible to perform DVS independently of worst-case
execution time estimation. The extra work required of
the contents provider for this purpose is fully compensated
by the benefits for the end users, because a single piece of content is
often provided to many users. Experimental results show
that our method greatly reduces the energy consumption of
client systems compared to previous DVS techniques.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Engineering
General Terms
Algorithms, Management
Keywords
DVS(Dynamic Voltage Scaling), contents provider, low-power,
worst case execution time, characterization, multimedia
Session Chair: R.V. Joshi (IBM), Lars Svensson (Chalmers University)
-
P1.1 Low-Leakage Asymmetric-Cell SRAM [p. 48]
-
N. Azizi, A. Moshovos, F.N. Najm (University of Toronto)
We introduce a novel family of asymmetric dual-Vt SRAM
cell designs that reduce leakage power in caches while maintaining
low access latency. Our designs exploit the strong
bias towards zero at the bit level exhibited by the memory
value stream of ordinary programs. Compared to conventional
symmetric high-performance cells, our cells offer significant
leakage reduction in the zero state and in some cases
also in the one state, albeit to a lesser extent. A novel
sense-amplifier, in coordination with dummy bitlines, allows for
read times to be on par with conventional symmetric cells.
With one cell design, leakage is reduced by 7X (in the zero
state) with no performance degradation. An alternative cell
design reduces leakage by 40X (in the zero state) with a performance
degradation of 5%.
Categories and Subject Descriptors
B.3.1 [Memory Structures]: Semiconductor memories
General Terms
Design
Keywords
SRAM, Low-leakage, Low-power, Dual-Vt
-
P1.2 Managing Leakage for Transient Data: Decay and Quasi-Static
4T Memory Cells [p. 52]
-
Z. Hu, P. Juang (Princeton University), P. Diodato, S. Kaxiras (Agere Systems),
K. Skadron (University of Virginia), M. Martonosi, D. W. Clark
(Princeton University)
Much of on-chip storage is devoted to transient, often short-lived,
data. Despite this, virtually all on-chip array structures use six transistor
(6T) static RAM cells that store data indefinitely. In this
paper we propose the use of quasi-static four-transistor (4T) RAM
cells. Quasi-static 4T cells provide both energy and area savings.
These cells have no connection to Vdd and thus inherently provide
decay functionality: values are refreshed upon access but discharge
over time without use. This makes 4T cells uniquely well-suited
for predictive structures like branch predictors and BTBs where
data integrity is not essential. We use quantitative evaluations (both
circuit-level and cycle-level) to explore the design space and quantify
the opportunities. Overall, 4T-based branch predictors offer
12-33% area savings and 60-80% leakage savings with minimal
performance impact. More broadly, this paper suggests a new view
of how to support transient data in power-aware processors.
Categories and Subject Descriptors
B.7.1 [Hardware]: Integrated Circuits - Types and Design Styles
General Terms
Design, Measurement
Keywords
Leakage power, transient data, decay, quasi-static, 4T, memory cell
-
P1.3 Conditional Pre-Charge Techniques for Power-Efficient Dual-Edge Clocking [p. 56]
-
N. Nedovic, M. Aleksic, V.G. Oklobdzija (University of California, Davis)
A new dual edge-triggered flip-flop that saves power by inhibiting
transitions of the nodes that are not used to change the state is
presented. The proposed flip-flop is 12% faster with 10% lower
Energy-Delay Product for 50% data activity, as compared to the
previously published dual edge-triggered storage elements. This
was confirmed by simulation using a 0.18 µm process, a 1.8 V power
supply, and a clock frequency of 250 MHz. This flip-flop is
particularly suitable for low-power applications.
Categories and Subject Descriptors
B.6.1 [Logic Design]: Design Styles - sequential circuits.
General Terms
Performance, Design.
Keywords
Dual edge-triggered flip-flop, clocked storage elements, clocking,
clock distribution, power consumption.
-
P1.4 Circuit-Level Techniques to Control Gate Leakage for sub-100nm CMOS [p. 60]
-
F. Hamzaoglu, M.R. Stan (University of Virginia)
Although still negligible for state-of-the-art CMOS, gate leakage
will become significant in the future for sub-100nm technologies,
due to the scaling of oxide thickness. We propose several circuit
techniques to control gate leakage based on the fact that PMOS
transistors with SiO2 gate oxide have an order of magnitude
smaller gate leakage than NMOS transistors in the same
technology. First, we compare n-type domino with p-type domino
circuits in terms of performance, leakage and switching power,
and explore the different tradeoffs between performance and
power. Second, we compare n-type with p-type gating for
MTCMOS to control the leakage during sleep. The proposed
circuits are simulated for a predictive 70nm CMOS technology
with 10 Å gate oxide thickness and 1.2V supply voltage.
Categories and Subject Descriptors
B.7.1 [Hardware]: Integrated Circuits - types and design styles.
General Terms
Algorithms, Performance, Design, Reliability.
Keywords
Gate leakage, low power, domino circuits, MTCMOS.
-
P1.5 Modeling and Analysis of Leakage Power Considering Within-Die
Process Variation [p. 64]
-
A. Srivastava, R. Bai, D. Blaauw, D. Sylvester (University of Michigan)
We describe the impact of process variation on leakage power for a
0.18µm CMOS technology. We show that variability, manifested
in Ldrawn, Tox, and Nsub, can drastically affect the leakage current.
We first present Monte Carlo-based simulation results for leakage
current in various CMOS gates when the process parameters are
varied both individually and concurrently. We then derive an
analytical model to estimate the mean and standard deviation of the
leakage current as a function of the process parameter distributions.
We demonstrate that the results of the analytical model match well
with Monte-Carlo simulations and also show the statistical mean
leakage current is significantly different from the leakage predicted
using a nominal case file.
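The gap between mean and nominal leakage noted in the abstract above can be illustrated with a deliberately simplified Python sketch. It is not the authors' model: it assumes that leakage depends only on threshold voltage, as I0·exp(−Vth / (n·vT)), and that Vth is Gaussian. The paper varies Ldrawn, Tox, and Nsub instead. All names and numeric values are hypothetical.

```python
import math
import random


def nominal_leakage(i0, vth_mean, n_vt):
    """Leakage when every device sits exactly at the mean threshold voltage."""
    return i0 * math.exp(-vth_mean / n_vt)


def mean_leakage_analytical(i0, vth_mean, vth_sigma, n_vt):
    """Mean of the lognormal leakage distribution that results from Gaussian Vth."""
    return nominal_leakage(i0, vth_mean, n_vt) * math.exp(
        vth_sigma ** 2 / (2.0 * n_vt ** 2))


def mean_leakage_monte_carlo(i0, vth_mean, vth_sigma, n_vt,
                             samples=100_000, seed=0):
    """Estimate the same mean by sampling Vth and averaging the leakage."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += i0 * math.exp(-rng.gauss(vth_mean, vth_sigma) / n_vt)
    return total / samples


# Hypothetical values, for illustration only.
I0 = 1.0e-6            # A, leakage prefactor
VTH_MEAN = 0.40        # V
VTH_SIGMA = 0.03       # V
N_VT = 1.5 * 0.0259    # V, subthreshold slope factor times thermal voltage

print("nominal    :", nominal_leakage(I0, VTH_MEAN, N_VT))
print("analytical :", mean_leakage_analytical(I0, VTH_MEAN, VTH_SIGMA, N_VT))
print("monte carlo:", mean_leakage_monte_carlo(I0, VTH_MEAN, VTH_SIGMA, N_VT))
```

Because the exponential is convex, the mean leakage always exceeds the nominal value in this toy model, which is the qualitative effect the abstract reports.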
Session Chair: Vamsi Krishna (Agilent Technologies)
-
P2.1 Low-Power Approach for Decoding Convolutional Codes with
Adaptive Viterbi Algorithm Approximations [p. 68]
-
R. Henning, C. Chakrabarti (Arizona State University)
Significant power reduction can be achieved by exploiting real time
variation in system characteristics while decoding
convolutional codes. The approach proposed herein adaptively
approximates Viterbi decoding by varying truncation length and
pruning threshold of the T-algorithm while employing trace-back
memory management. Adaptation is performed according to
variations in signal-to-noise ratio, code rate, and maximum
acceptable bit error rate. Potential energy reduction of 70 to
97.5% compared to Viterbi decoding is demonstrated.
Superiority of adaptive T-algorithm decoding compared to fixed
T-algorithm decoding is studied. General conclusions about when
applications can particularly benefit from this approach are given.
Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: Signal
processing systems.
General Terms
Algorithms, Performance, Experimentation.
Keywords
Low Power, Viterbi Algorithm, Adaptive T-algorithm Decoding,
Convolutional Codes.
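The pruning step of the T-algorithm mentioned in the abstract above can be sketched in Python as follows. This is a minimal sketch of the generic pruning rule only (keep survivors whose metric lies within a threshold of the best one). The rule by which the paper adapts the threshold and truncation length to signal-to-noise ratio, code rate, and bit error rate is not shown, and all names and values are hypothetical.

```python
def prune_survivors(path_metrics, threshold):
    """Keep only trellis states whose accumulated metric is within
    `threshold` of the best (lowest) metric at this decoding stage.

    path_metrics: dict mapping state -> accumulated path metric.
    """
    best = min(path_metrics.values())
    return {state: metric
            for state, metric in path_metrics.items()
            if metric - best <= threshold}


# Hypothetical metrics for a 4-state trellis at one decoding stage.
metrics = {0: 2.0, 1: 2.5, 2: 6.0, 3: 9.0}

# A tighter threshold keeps fewer survivors, so fewer add-compare-select
# operations and memory accesses are needed in the following stage.
print(prune_survivors(metrics, threshold=5.0))
print(prune_survivors(metrics, threshold=1.0))
```

Fewer retained paths means less computation per stage, at the cost of a higher chance of discarding the correct path when the channel is noisy.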
-
P2.2 Power-Aware Source Routing Protocol for Mobile Ad Hoc Networks [p. 72]
-
M. Maleki, K. Dantu, M. Pedram (University of Southern California)
Ad hoc wireless networks are power constrained
since nodes operate with limited battery energy. To maximize the
lifetime of these networks (defined by the condition that a fixed
percentage of the nodes in the network "die out" due to lack of
energy), network-related transactions through each mobile node
must be controlled such that the power dissipation rates of all
nodes are nearly the same. Assuming that all nodes start with a
finite amount of battery capacity and that the energy dissipation
per bit of data and control packet transmission or reception is
known, this paper presents a new source-initiated (on-demand)
routing protocol for mobile ad hoc networks that increases the
network lifetime. Simulation results show that the proposed
power-aware source routing protocol has a higher performance
than other source initiated routing protocols in terms of the
network lifetime.
Categories and Subject Descriptor
C.2.2 [Computer-Systems Organization]: Network Protocols
General Terms
Algorithms
-
P2.3 Analyzing Energy Friendly Steady State Phases of Dynamic
Application Execution in Terms of Sparse Data Structures [p. 76]
-
E. G. Daylight (IMEC vzw, Katholieke Universiteit Leuven), S. Wuytack,
C. Ykman-Couvreur (IMEC vzw), F. Catthoor (IMEC vzw, Katholieke Universiteit Leuven)
In the past decades, data structure analysis was mainly done at a
high level of abstraction in the computer science community. For
instance, choosing a linked list as a data structure as opposed to an
array for a specific situation, was mainly motivated from a performance
point of view under the implicit assumption that the computer
platform (that had to run the software) consisted of one
monolithic physical memory. In the context of mobile, embedded
devices, energy consumption is as important as performance.
In addition, the assumption of one monolithic memory is
outdated for many (if not all) current-day platforms. Clearly, there
is a need to improve the choices that are made during data structure
analysis given specific knowledge of the memory hierarchy of the
platform under investigation.
We show how memory-related energy consumption can be greatly
reduced by taking into account the access behaviour of the application
on the one hand and the available on-chip and off-chip memory
space on the other hand. We do this by exploiting the sparseness
that is present in one steady state of the data structure under
investigation. Analytical results show that energy reductions of a
factor of 8.7 are feasible in comparison to common data structure
implementations. We trade these gains off with on-chip memory
space consumption of a custom memory architecture.
Categories and Subject Descriptors
E.2 [Data Storage Representations]: [composite structures, linked
representations]; C.4 [Computer Systems Organization]: Performance
of Systems - performance attributes
Keywords
Energy consumption, on-chip memory footprint, partitioned data
structure
-
P2.4 Odd/Even Bus Invert with Two-Phase Transfer for Buses with
Coupling [p. 80]
-
Y. Zhang, J. Lach, K. Skadron, M.R. Stan (University of Virginia)
The coupling capacitances between on-chip bus lines become
dominant in deep-submicron technologies. Coding to reduce the
switching activity of the individual lines was enough to reduce
power on buses in older technologies, but new coding techniques
that reduce the coupling activity between lines are needed for
deep-submicron buses. One such coding technique uses the
simple observation that coupling capacitances are always charged
and discharged by activity on neighboring bus lines, where one
line has an odd number and the other has an even number (if bus
lines are numbered in-order). We thus propose to reduce the
coupling activity by independently controlling the odd and even
bus lines with two separate lines, the Odd Invert, and Even Invert
line, respectively. We obtain significant reductions in power
simply by comparing the coupling activity for the four possible
cases of the Odd and Even Invert lines (00, 01, 10, 11), and then
choosing the value with the smallest coupling activity to transmit
on the bus. Even after encoding, the coupling activity for a pair of
bus lines is still strongly dependent on the data. In particular the
toggling sequences 01→10 and 10→01 result in 4 times more
coupling energy dissipation than other coupling events. We thus
propose a targeted Two-Phase transfer in order to reduce total
power only on the pairs of lines that carry such toggling events.
Categories and Subject Descriptors
B.7.1 [Hardware]: Integrated Circuits - types and design styles.
General Terms
Algorithms, Performance, Design.
Keywords
Coding for low-power I/O, Bus Invert, buses with coupling.
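The selection among the four Odd/Even Invert cases described in the abstract above can be sketched in Python. The cost model is an assumption: coupling activity between adjacent lines is taken as the squared difference of their transitions, so that an opposite toggle (01→10) costs 4 while a single-line transition costs 1, consistent with the 4x figure in the abstract. The sketch ignores the two invert lines' own coupling and does not cover the Two-Phase transfer. All function names are hypothetical.

```python
from itertools import product


def coupling_activity(prev, curr):
    """Sum over adjacent line pairs of (delta_i - delta_j) ** 2,
    where delta is the transition (-1, 0, or +1) on each line."""
    deltas = [c - p for p, c in zip(prev, curr)]
    return sum((deltas[i] - deltas[i + 1]) ** 2
               for i in range(len(deltas) - 1))


def apply_invert(word, odd_inv, even_inv):
    """Invert the bits on odd- and/or even-numbered lines (numbering from 0)."""
    return [bit ^ (odd_inv if i % 2 else even_inv)
            for i, bit in enumerate(word)]


def encode(prev_bus, word):
    """Try all four (odd, even) invert cases and keep the cheapest one.

    Returns (encoded_word, odd_inv, even_inv).
    """
    best = None
    for odd_inv, even_inv in product((0, 1), repeat=2):
        candidate = apply_invert(word, odd_inv, even_inv)
        cost = coupling_activity(prev_bus, candidate)
        if best is None or cost < best[0]:
            best = (cost, candidate, odd_inv, even_inv)
    return best[1], best[2], best[3]


def decode(encoded, odd_inv, even_inv):
    """Inversion is its own inverse, so decoding reapplies the same flags."""
    return apply_invert(encoded, odd_inv, even_inv)


prev_bus = [0, 1, 0, 1]
word = [1, 0, 1, 0]  # unencoded, every adjacent pair toggles in opposition
encoded, odd_inv, even_inv = encode(prev_bus, word)
print(encoded, odd_inv, even_inv)
print(decode(encoded, odd_inv, even_inv) == word)
```

In this example, sending the word unencoded would toggle every adjacent pair in opposite directions, while inverting both the odd and the even lines leaves the bus unchanged.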
-
P2.5 An Intra-Task Dynamic Voltage Scaling Method for SoC Design
with Hierarchical FSM and Synchronous Dataflow Model [p. 84]
-
S. Lee (Seoul National University), S. Yoo (TIMA Lab), K. Choi (Seoul
National University)
This paper presents a method of intra-task dynamic voltage scaling
(DVS) for SoC design with hierarchical FSM and synchronous
dataflow model (in short, the HFSM-SDF model). For optimal
intra-task DVS, exact execution paths need to be determined at
compile time or runtime. In general programs, since determining
exact execution paths at compile time or runtime is not possible,
existing methods assume worst/average-case execution paths and
take static voltage scaling approaches. In our work, we exploit a
property of the HFSM-SDF model to calculate exact execution paths at
runtime. With the information on exact execution paths, our DVS
method can calculate the exact remaining workload. The exact workload
makes it possible to calculate the optimal voltage level, which gives optimal
energy consumption while satisfying the given timing constraint.
Experiments show the effectiveness of the presented method in low power
design of an MPEG4 decoder system.
Categories & Subject Descriptors: J.6 [Computer-Aided Engineering]:
Computer Aided Design (CAD)
General Terms: Design, Performance
Keywords: Low power, dynamic voltage scaling, variable
supply voltage, formal model, finite state machine, synchronous
dataflow
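The step from an exact remaining workload to a voltage level, as described in the abstract above, can be sketched in Python under simple assumptions: a fixed, hypothetical table of (frequency, voltage) operating points and a workload measured in clock cycles. The paper's HFSM-SDF path analysis is not modeled here, and all names and values are illustrative.

```python
def pick_level(remaining_cycles, time_left_s, levels):
    """Return the slowest (frequency_hz, voltage_v) level that still finishes
    `remaining_cycles` within `time_left_s`; fall back to the fastest level
    if no level can meet the deadline."""
    if time_left_s <= 0:
        raise ValueError("deadline already passed")
    required_hz = remaining_cycles / time_left_s
    feasible = [level for level in sorted(levels) if level[0] >= required_hz]
    return feasible[0] if feasible else max(levels)


# Hypothetical operating points, for illustration only.
LEVELS = [(100e6, 1.0), (200e6, 1.3), (400e6, 1.8)]

# 3 million cycles left with 20 ms to the deadline needs at least 150 MHz,
# so the 200 MHz point is the slowest feasible one.
print(pick_level(3e6, 0.02, LEVELS))
```

Knowing the exact remaining workload, rather than a worst-case bound, lets the selected level track actual demand and avoids running at a higher voltage than necessary.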
-
P2.6 Reducing Access Energy of On-Chip Data Memory Considering
Active Data Bitwidth [p. 88]
-
T. Okuma, Y. Cao, M. Muroyama, H. Yasuura (Kyushu University)
This paper presents a new concept called active data bitwidth,
which is the effective data length on the data bus. By
profiling the active data bitwidth dynamically, we present a
novel low-energy memory access technique for on-chip data
memory design. By reducing the redundant access energy
of data memory, our experimental results for two real applications
show that we can achieve significant energy reduction.
Compared to a monolithic memory, energy reductions of 52.2%
for JPEG and 84.2% for MPEG-2 are reported.
Compared to the memory banking technique, energy reductions
of 12.3% for JPEG and 65.9% for MPEG-2 are reported.
Categories and Subject Descriptors
C.3 [Computer Systems Organization]: Special-Purpose
and Application-Based Systems
General Terms
Design
Session Chair: Ram Krishnamurthy (Intel)
Session Organizer: Kaushik Roy (Purdue University)
-
3.1 Energy Recovering Static Memory [p. 92]
-
J. Kim, C.H. Ziesler, M.C. Papaefthymiou (University of Michigan)
This paper proposes an energy-recovering (a.k.a. adiabatic) static
RAM with a novel driver that reduces power dissipation by efficiently
recovering energy from the bit/word line capacitors. Powered
by a single-phase sinusoidal power-clock, our SRAM delivers
read and write operations with single-cycle latency. To that end,
a precharge-low scheme is employed along with a modified sense
amplifier design that achieves high efficiency at differential voltages
near VSS. A simple control circuit is used to maintain driver
operation in synchrony with the power-clock waveform. Feedback
circuitry from the driver output to the control circuit ensures that
our driver remains efficient, independent of the access pattern.
Our energy recovering SRAM functions correctly while achieving
substantial energy savings over a wide range of supply voltages
and operating frequencies. Hspice simulations of a simple full custom
adiabatic 256x256 SRAM, that includes the energy recovering
bit/word line drivers, the cell array, and the sense amplifiers,
show over 2.6x energy savings at 3V, 300MHz in comparison with
its conventional counterpart.
Categories and Subject Descriptors
B.3.1 [Memory Structures]: Semiconductor Memories - Static
memory (SRAM)
General Terms
Design, Performance
Keywords
Adiabatic circuitry, charge recovery, cache memories, on-chip memories,
low-energy design, low-power computing.
-
3.2 Low Power Integrated Scan-Retention Mechanism [p. 98]
-
V. Zyuban, S.V. Kosonocky (IBM T.J. Watson Research Center)
This paper presents a methodology for unifying the scan mechanism
and data retention in latches which leads to scannable latches
with the data retention capability achieved at a very low power
overhead during the active mode. A detailed analysis of power
and area overhead is presented, with layout examples for various
common latch styles. Implications of using different power gating
techniques for reducing leakage during sleep mode on the design of
retention latches are considered, including well biasing for leakage
control and sharing wells between gated logic and retention latch
devices.
Categories and Subject Descriptors
B.2.1 [Design Styles]: Pipeline; B.6.1 [Design Styles]: Sequential
circuits; B.7.1 [Types and Design Styles]: VLSI
General Terms
Design
Keywords
data retention, MTCMOS, subthreshold, leakage, low power, latch,
scan, balloon latch
-
3.3 Closed Loop Adaptive Voltage Scaling Controller for Standard-Cell
ASICs [p. 103]
-
S. Dhar, D. Maksimovic (University of Colorado), B. Kranzen (National
Semiconductor)
The paper describes a closed-loop controller for adaptive voltage
scaling (AVS) where the supply voltage to a standard-cell ASIC is
dynamically adjusted to the minimum value required for the desired
system speed. The controller includes a clock generator that
provides a low-jitter clock to the ASIC at all steady-state operating
points and through transients. To speed up the voltage transient response
to step changes in clock frequency, the controller is based
on a multiple-tap resettable delay line. A chip including the AVS
controller and a dual 16-bit MAC application has been fabricated
in a standard 0.5 µm CMOS process. The area taken by the AVS
controller is 0.12 mm². Experimental results demonstrate operation
over the application clock frequency range from 80 kHz to
20 MHz, and a 38 µs transient response for a step change in speed
from standby to maximum throughput operation.
Categories and Subject Descriptors
B.7 [Hardware]: Integrated Circuits; B.5 [Hardware]: Register-
Transfer-Level Implementation; B.8 [Hardware]: Performance and
Reliability
General Terms
Design, Performance, Experimentation
Keywords
circuit design, design methodology, delay-line, low-power, energy efficient,
voltage scaling, standard-cell, DC-DC converter
-
3.4 Design of a Branch-Based 64-bit Carry-Select Adder in 0.18 µm
Partially Depleted SOI CMOS [p. 108]
-
A. Nève, D. Flandre (Université catholique de Louvain), H. Schettler,
T. Ludwig, G. Hellner (IBM Entwicklung GmbH)
The paper presents the design of a 64-bit carry-select adder
in Branch-Based Logic, a static design style that minimizes the
internal node capacitances. This feature is used to lower the
dynamic power dissipation, while maintaining good speed performance.
The experimental realization of the adder demonstrates an overall delay
of 720 ps while only dissipating 96 mW at 1 GHz. The fabrication is based
on the 0.18 µm IBM CMOS8S2 SOI technology, which uses partially depleted
transistors and copper metallization.
Categories and Subject Descriptors
B.6.1 [Logic Design]: Design Styles - Combinational Logic.
General Terms
Performance, Design.
Keywords
Circuit Design, SOI technology, Logic design styles.
Session Chair: N. Ranganathan (University of S. Florida, Tampa, FL)
Session Organizer: Mahmut Kandemir (Penn State University)
-
4.1 Low-Power Color TFT LCD Display for Hand-Held Embedded Systems [p. 112]
-
I. Choi, H. Shim, N. Chang (Seoul National University)
An LCD (Liquid Crystal Display) is a standard display device for
hand-held embedded systems. Today, color TFT (Thin-Film Transistor)
LCDs are common even in cost-effective equipment. An
LCD display system is composed of an LCD panel, a frame buffer
memory, an LCD and frame buffer controller, and a backlight inverter
and lamp. All of them are heavy power consumers, and their
portion becomes much more dominant when running interactive
applications. This is because interactive applications are often triggered
by human inputs and thus result in a lot of slack time in the
CPU and memory system, which can be effectively used for dynamic
power management.
In this paper, we introduce low-power LCD display schemes as a
system-level approach. We accurately characterize the energy consumption
at the component level and minimize energy consumption
of each component without appreciable display quality degradation.
We develop several techniques such as variable-duty-ratio
refresh, dynamic-color-depth control and backlight luminance dimming
with brightness compensation or contrast enhancement. Each
method exhibits power reduction of 260mW, 250mW and 480mW,
respectively. The aggregate energy reduction ratio is 28% out of
total energy consumption including the CPU and the main memory
system when we execute a document viewer. We also demonstrate
that we can extend the battery life by about 38% and 20% for a text
editor and an MPEG4 player, respectively.
Categories and Subject Descriptors
C.5 [Computer Systems Organization]: Computer System Implementation;
I.3.1 [Computer Graphics]: Hardware Architecture;
B.4.2 [Input/Output And Data Communications]: Input/Output
Devices - Image Display
General Terms
Design
Keywords
low power, low energy, LCD, embedded system
-
4.2 Discharge Current Steering for Battery Lifetime Optimization [p. 118]
-
L. Benini (Universita di Bologna), A. Macii, E. Macii (Politecnico di Torino),
M. Poncino (Universita di Verona)
Recent work on battery-driven power management has demonstrated
that sequential discharge is suboptimal in multibattery
systems, and lifetime can be maximized by distributing
(steering) the current load on the available batteries,
thereby discharging them in a partially concurrent fashion.
Based on these observations, we formulate multi-battery lifetime
maximization as a continuous, constrained optimization
problem, which can be efficiently solved by non-linear
optimizers. We show that great lifetime extensions can be
obtained with respect to standard sequential discharge, as
well as to previously proposed battery allocation schemes.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Engineering;
C.4 [Computer Systems Organization]: Performance
of Systems; G.1 [Numerical Analysis]: Optimization
General Terms
Design, Performance
Keywords
Energy consumption, battery lifetime optimization
-
4.3 Towards Energy-Aware Software-Based Fault Tolerance in Real-Time Systems [p. 124]
-
O.S. Unsal, I. Koren, C.M. Krishna (University of Massachusetts, Amherst)
Many real-time systems employed in defense, space, and consumer
applications have power constraints and high reliability
requirements. In this paper, we focus on the relationship
between fault tolerance techniques and energy consumption.
In particular, we establish the energy efficiency of Application
Level Fault Tolerance (ALFT) over other software-based
fault tolerance methods. We then develop sensible energy-aware
heuristics for ALFT schemes. The heuristics yield up
to 40% energy savings.
Session Chair: Lawrence Clark (Intel)
Session Organizer: Peter Kogge (University of Notre Dame)
-
5.1 Fine-Grain CAM-Tag Cache Resizing Using Miss Tags [p. 130]
-
M. Zhang, K. Asanovic (MIT Laboratory for Computer Science)
A new dynamic cache resizing scheme for low-power CAM-tag
caches is introduced. A control algorithm that is only
activated on cache misses uses a duplicate set of tags, the
miss tags, to minimize active cache size while sustaining
close to the same hit rate as a full size cache. The cache
partitioning mechanism saves both switching and leakage
energy in unused partitions with little impact on cycle time.
Simulation results show that the scheme saves 28-56% of
data cache energy and 34-49% of instruction cache energy
with minimal performance impact.
Categories and Subject Descriptors
B.3.2 [Memory Structures]: Design Styles - Associative
Memory, Cache Memory, Primary Memory
General Terms
Design
Keywords
Content-Addressable-Memory, Low-Power, Cache Resizing,
Energy Efficiency, Leakage Current
-
5.2 An Adaptive Serial-Parallel CAM Architecture for Low-Power
Cache Blocks [p. 136]
-
A. Efthymiou, J.D. Garside (University of Manchester)
There is an on-going debate about which consumes less energy:
a RAM-tagged associative cache with an intelligent order
of accessing its tags and ways (e.g. way prediction), or a
CAM-tagged high associativity cache. If a CAM search can
consume less than twice the energy of reading a tag RAM, it
would probably be the preferred option for low-power applications.
Based on memory traces, which usually cause tag
mismatch within the lower four bits, a new serial CAM organisation
is proposed which consumes just 45% more than
a single tag RAM read and is only 25% slower than the conventional,
parallel CAM. Furthermore, it can optionally be
operated as a parallel CAM, at no speed penalty, and still
reduce energy consumption.
Categories and Subject Descriptors
B.3.2 [Memory Structures]: Design Styles - Associative
memories; B.3.2 [Memory Structures]: Design Styles -
Cache memories; B.7.1 [Types and Design Styles]: VLSI
General Terms
Design, Performance
Keywords
CAM, cache design, VLSI, low power, low energy, asynchronous
circuits
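
The serial search idea in this abstract can be illustrated with a rough Python sketch: compare only the low-order tag bits first, then check the remaining bits just for the entries that survive. The function name, the 4-bit split and the sample tags are assumptions made for illustration; the paper's actual circuit organisation is not reproduced here.

```python
def serial_cam_search(tags, key, low_bits=4):
    """Two-phase tag search (illustrative model, not the paper's circuit).

    Phase 1 compares only the low `low_bits` bits of every stored tag.
    Phase 2 compares the full tag, but only for phase-1 survivors.
    Returns (indices of matching tags, number of phase-1 survivors).
    """
    mask = (1 << low_bits) - 1
    # Phase 1: cheap partial comparison across all entries.
    candidates = [i for i, tag in enumerate(tags)
                  if (tag & mask) == (key & mask)]
    # Phase 2: full comparison, restricted to the surviving entries.
    hits = [i for i in candidates if tags[i] == key]
    return hits, len(candidates)


# Hypothetical stored tags; only two share the key's low nibble.
stored_tags = [0x1A3, 0x2B3, 0x1A7, 0x0F0]
hits, survivors = serial_cam_search(stored_tags, 0x1A3)
```

If most mismatches show up in the low bits, as the abstract reports for memory traces, few entries reach the second phase, which is where the energy saving would come from.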
-
5.3 Reducing Energy Consumption of Video Memory by Bit-Width Compression [p. 142]
-
V.G. Moshnyaga, K. Inoue, M. Fukagawa (Fukuoka University)
A new architectural technique to reduce energy dissipation
of video memory is proposed. Unlike existing approaches,
the technique exploits the pixel correlation in video sequences,
dynamically adjusting the memory bit-width to the number
of bits changed per pixel. Instead of treating the data bits
independently, we group the most significant bits together,
activating the corresponding group of bit-lines adaptively to
data variation. The method is neither restricted to specific
bit patterns nor dependent on the storage phase. It works
equally well on read and write accesses, as well as during
precharging. Simulation results show that using this method
we can reduce the total energy consumption of video memory
by 20% without affecting the picture quality.
Categories and Subject Descriptors
B.3 [Hardware]: Memory Structures; B.5.1 [Hardware]:
Design - memory design
General Terms
Design
Keywords
bitwidth-compression, frame memory, low-power design
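
A minimal sketch of the grouping idea, assuming 8-bit pixels and a single group made of the four most significant bits: the group's bit-lines are counted as active only when those bits differ from the previous pixel. The group size, the function names and the sample pixel values are hypothetical; the abstract does not state them.

```python
def active_bitlines(prev, cur, width=8, msb_group=4):
    """Bit-lines activated for `cur`, given the previous pixel `prev`.

    Assumed model: the low-order bits are always accessed, while the
    group of most significant bits is accessed only if it changed.
    """
    low = width - msb_group
    msb_changed = (prev >> low) != (cur >> low)
    return width if msb_changed else low


def total_active(pixels, width=8, msb_group=4):
    """Sum of activated bit-lines over a sequence of pixel values."""
    return sum(active_bitlines(p, c, width, msb_group)
               for p, c in zip(pixels, pixels[1:]))


# Hypothetical, strongly correlated pixel run followed by one larger jump.
run = [0x52, 0x57, 0x55, 0x61]
compressed = total_active(run)
uncompressed = 8 * (len(run) - 1)
```

For correlated pixel data the upper bits rarely change, so in this model most accesses touch only the low-order bit-lines.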
-
5.4 A History-Based I-Cache for Low-Energy Multimedia Applications [p. 148]
-
K. Inoue, V.G. Moshnyaga (Fukuoka University), K. Murakami (Kyushu University)
This paper proposes a history-based tag-comparison scheme
for reducing energy consumption of direct-mapped instruction
caches. The proposed cache efficiently exploits program-execution
footprints recorded in the Branch Target Buffer
(BTB), and attempts to detect and eliminate unnecessary
tag checks at run time. Simulation results show that our
approach can eliminate up to 95% of tag checks, saving the
cache energy by 17%, while affecting the processor performance
by only 0.2%.
Categories and Subject Descriptors
B.3 [Hardware]: Memory Structures; C.1 [Computer Systems
Organization]: Processor Architectures
General Terms
Design
Session Chair: Anand Raghunathan (NEC)
Session Organizer: Joerg Henkel (NEC)
-
6.1 Battery Lifetime Prediction for Energy-Aware Computing [p. 154]
-
D. Rakhmatov, S. Vrudhula (University of Arizona),
D.A. Wallach (Hewlett-Packard Western Research Laboratory)
Predicting the time of full discharge of a finite-capacity energy
source, such as a battery, is important for the design of portable
electronic systems and applications. In this paper we present a
novel analytical model of a battery that not only can be used to
predict battery lifetime, but also can serve as a cost function for optimization
of the energy usage in battery-powered systems. The
model is physically justified, and involves only two parameters,
which are easily estimated. The paper includes the results of extensive
experimental evaluation of the model with respect to numerical
simulations of the electrochemical cell, as well as measurements
taken on a real battery. The model was tested using constant, interrupted,
periodic and non-periodic discharge profiles, which were
derived from standard applications run on a pocket computer.
Categories and Subject Descriptors
C.4.5 [Performance of Systems]: Performance Attributes
General Terms
Performance, Experimentation
Keywords
Battery, modeling, low-power design
-
6.2 Early Evaluation Techniques for Low Power Binding [p. 160]
-
E. Kursun, A. Srivastava, S.O. Memik, M. Sarrafzadeh
(University of California Los Angeles)
This paper presents effective metrics to evaluate the power
dissipation of scheduled data flow graphs (DFGs). This enables
early evaluation of schedules without performing the
computationally expensive resource-binding step. Our metrics
correlate heavily (as high as 0.95 and > 0.75 for most test cases)
with power dissipation values obtained after resource binding
and rescheduling for power optimization steps. An experimental
flow that integrates path-based scheduling, power optimal
binding and power driven iterative rescheduling stages is
constructed. The flow integrates commercial tools like Synopsys,
VSS and academic compilers like SUIF in a common
optimization framework. Experimental results on DFGs from the
MediaBench suite also demonstrate that metric evaluation
is on average 42.6 times faster than performing optimal binding
and iterative power improvement. Hence metric based evaluation
enables fast design exploration at early stages.
Categories & Subject Descriptors: [Design] High Level
Synthesis, Power Optimization, Scheduling, Resource Binding.
General Terms: Design
Keywords: Low Power Design, Scheduling, Resource Binding,
Metric Evaluation.
-
6.3 Unified Methodology for Resolving Power-Performance Tradeoffs
at the Microarchitectural and Circuit Levels [p. 166]
-
V. Zyuban, P. Strenski (IBM T.J. Watson Research Center)
Evaluation of architectural tradeoffs is complicated by implications
in the circuit domain which are typically not captured in the analysis
but substantially affect the results. We propose a metric of
hardware intensity (h), which is useful for evaluating issues that
affect both circuits and architecture. Analyzing data for actual designs
we show how to measure the introduced parameters and discuss
variations between observed results and common theoretical
assumptions. For a power-efficient design we derive relations for h
and supply voltage V under progressively more general situations,
and incorporate h into a prior art architectural energy-efficiency
criterion. Then, a more general relation is derived for the optimal
balance between the architectural complexity, hardware intensity
and power supply. Modified forms for these relations are obtained
in special cases where the supply voltage is constrained or when
clock gating is disallowed.
Categories and Subject Descriptors
B.2.4 [High-Speed Arithmetic]: Cost/performance; B.2.1 [Design
Styles]: Pipeline; B.6.1 [Design Styles]: Combinational logic, Parallel
circuits; B.6.3 [Design Aids]: Optimization; B.7.1 [Types
and Design Styles]: Microprocessors and microcomputers,VLSI;
C.5.3 [Microcomputers]: Microprocessors; C.0 [General]: Modeling
of computer architecture
General Terms
Design, Performance
Keywords
Energy, power, energy efficiency, hardware intensity, metric
Session Chair: Ingrid Verbauwhede (UCLA)
-
Is Nanoelectronics the Future of Microelectronics? [p. 172]
-
M. Lundstrom (Purdue University)
We examine current research in nanoelectronics and discuss
the role it may play in future electronic systems.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles -
advanced technologies, memory technologies, VLSI
General Terms
Design, Performance, Theory
Keywords
nanoelectronics, Moore's Law, molecular electronics
Session Chair: David Brooks (IBM T.J. Watson)
Session Organizer: Lea Hwang Lee (Motorola)
-
7.1 Saving Energy with Just In Time Instruction Delivery [p. 178]
-
T. Karkhanis, J.E. Smith (University of Wisconsin-Madison), P. Bose
(IBM T.J. Watson Research Center)
Just-In-Time instruction delivery is a general method for
saving energy in a microprocessor by dynamically limiting
the number of in-flight instructions. The goal is to save energy
by 1) fetching valid instructions no sooner than necessary,
avoiding cycles stalled in the pipeline -- especially the
issue queue, and 2) reducing the number of fetches and subsequent
processing of mis-speculated instructions. A simple
algorithm monitors performance and adjusts the maximum
number of in-flight instructions at fairly long intervals, 100K
instructions in this study. The proposed JIT instruction delivery
scheme provides the combined benefits of more targeted
schemes proposed previously. With only a 3% performance
degradation, energy savings in the fetch, decode
pipe, and issue queue are 10%, 12%, and 40%, respectively.
Categories and Subject Descriptors
C.1.3 [Processor Architectures]: Other Architecture Styles -
adaptable architectures, pipeline processors.
General Terms
Performance, Design
Keywords
Low-power, adaptive processor, instruction delivery
-
7.2 Tradeoffs in Power-Efficient Issue Queue Design [p. 184]
-
A. Buyuktosunoglu, D. H. Albonesi (University of Rochester),
P. Bose, P.W. Cook, S.E. Schuster (IBM T.J. Watson Research Center)
A major consumer of microprocessor power is the issue queue.
Several microprocessors, including the Alpha 21264 and POWER4™,
use a compacting latch-based issue queue design which has the advantage
of simplicity of design and verification. The disadvantage
of this structure, however, is its high power dissipation.
In this paper, we explore different issue queue power optimization
techniques that vary not only in their performance and power
characteristics, but in how much they deviate from the baseline implementation.
By developing and comparing techniques that build
incrementally on the baseline design, as well as those that achieve
higher power savings through a more significant redesign effort, we
quantify the extra benefit the higher design cost techniques provide
over their more straightforward counterparts.
Categories and Subject Descriptors
C.1 [Processor Architectures]: C.1.3 Other Architecture Styles -
Adaptable architectures
General Terms
Performance, Design
Keywords
Low-power, microarchitecture, issue queue, banking, adaptation,
compacting, non-compacting
-
7.3 Reducing Transitions on Memory Buses using Sector-based
Encoding Technique [p. 190]
-
Y. Aghaghiri (University of Southern California), F. Fallah (Fujitsu
Laboratories of America), M. Pedram (University of Southern California)
In this paper, we introduce a class of irredundant low power encoding
techniques for memory address buses. The basic idea is to partition the
memory space into a number of sectors. These sectors can, for
example, represent address spaces for the code, heap, and stack
segments of one or more application programs. Each address is first
dynamically mapped to the appropriate sector and then is encoded with
respect to the sector head. Each sector head is updated based on the last
accessed address in that sector. The result of this sector-based encoding
technique is a reduction in the number of bus transitions when encoding
consecutive addresses that access different sectors. Our proposed
techniques have small power and delay overhead when compared with
many of the existing methods in the literature. One of our proposed
techniques is very suitable for encoding addresses that are sent from an
on-chip cache to the main memory when multiple application programs
are executing on the processor on a time-sharing basis. For a computer
system without an on-chip cache, the proposed techniques decrease the
switching activity of data address and multiplexed address buses by an
average of 55% and 67%, respectively. For a system with on-chip
cache, up to 55% transition reduction is achieved on a multiplexed
address bus between the internal cache and the external memory.
Assuming a 10pF per line bus capacitance, we show that power
reduction of up to 52% for an external data address bus and 42% for the
multiplexed bus between cache and main memory is achieved using our
methods.
Categories and Subject Descriptors: B.4.3. [Input/output
and data communications]: Interconnections, Interfaces.
General Terms: Algorithms and Design.
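
The sector idea can be sketched in Python under simplifying assumptions: the top address bits select a fixed sector, and each address is sent as the XOR of its offset with the last offset seen in that sector (the sector head). The XOR operator, the fixed sector boundaries, the 16-bit width and the sample trace are all assumptions for illustration; the abstract says only that addresses are encoded "with respect to the sector head".

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_transitions(words):
    """Total bit flips when the words are driven onto a bus in order."""
    return sum(hamming(p, c) for p, c in zip(words, words[1:]))


def sector_encode(addresses, width=16, sector_bits=1):
    """Encode each address relative to its sector head (XOR, assumed)."""
    shift = width - sector_bits
    heads = [0] * (1 << sector_bits)
    encoded = []
    for addr in addresses:
        sector, offset = addr >> shift, addr & ((1 << shift) - 1)
        encoded.append((sector << shift) | (offset ^ heads[sector]))
        heads[sector] = offset  # head tracks the last access in the sector
    return encoded


def sector_decode(words, width=16, sector_bits=1):
    """Inverse of sector_encode; the receiver keeps the same heads."""
    shift = width - sector_bits
    heads = [0] * (1 << sector_bits)
    decoded = []
    for word in words:
        sector, diff = word >> shift, word & ((1 << shift) - 1)
        offset = diff ^ heads[sector]
        decoded.append((sector << shift) | offset)
        heads[sector] = offset
    return decoded


# Hypothetical trace: accesses alternate between two distant regions.
trace = [a for i in range(8) for a in (0x1000 + i, 0xF000 + i)]
encoded = sector_encode(trace)
```

The code uses no extra bus lines, which matches the "irredundant" property named in the abstract. Comparing `bus_transitions(trace)` with `bus_transitions(encoded)` shows the effect on this interleaved trace.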
-
7.4 Energy-Efficient Hybrid Wakeup Logic [p. 196]
-
M. Huang, J. Renau, J. Torrellas (University of Illinois at Urbana-Champaign)
The instruction window is a critical component and a major energy
consumer in out-of-order superscalar processors. An important
source of energy consumption in the instruction window is the
instruction wakeup: a completing instruction broadcasts its result
register tag and an associative comparison is performed with all the
entries in the window.
This paper shows that a very large fraction of the completing
instructions have to wake up no more than a single instruction currently
in the window. Consequently, we propose to save energy
by using indexing to only enable the comparator at the single instruction
to wake up. Only in the rare case when more than one
instruction needs to wake up, our scheme reverts to enabling all
the comparators or a subset of them. For this reason, we call our
scheme Hybrid. Overall, our scheme is very effective: for a processor
with a 96-entry window, the number of comparisons performed
by the average completing instruction with a destination register is
reduced to 0.8. The exact magnitude of the energy savings will
depend on the specific instruction window implementation. Furthermore,
the application suffers no performance penalty.
Categories & Subject Descriptors:
C.0 Computer System Organization: System Architectures.
C.1.1 Single Data Stream Architectures: RISC/CISC,VLIW Architectures
C.5.3 Microcomputers: Microprocessors.
General Terms: Design, Experimentation, Performance
Keywords: Low Power, Wakeup Logic, Issue Logic
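
A back-of-the-envelope model of the comparison counts, assuming that a completing instruction with at most one waiting consumer triggers at most one comparison and that every other case falls back to comparing against the whole window. The function name and the sample consumer counts are hypothetical, and the fallback to the full window (rather than a subset) is a simplification.

```python
def wakeup_comparisons(consumers_per_completion, window_size):
    """Return (broadcast, hybrid) tag-comparison totals.

    consumers_per_completion: for each completing instruction, how many
    instructions in the window are waiting on its result.
    """
    # Conventional wakeup: every completion is compared with every entry.
    broadcast = window_size * len(consumers_per_completion)
    # Hybrid (assumed model): index directly for 0 or 1 consumers,
    # otherwise enable all comparators in the window.
    hybrid = sum(n if n <= 1 else window_size
                 for n in consumers_per_completion)
    return broadcast, hybrid


# Hypothetical mix of completions for a 96-entry window.
broadcast, hybrid = wakeup_comparisons([0, 1, 1, 2], window_size=96)
```

The saving in this model depends entirely on how rare the multi-consumer case is, which is the property the abstract reports.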
Session Chair: Unni Narayanan (Intel)
Session Organizer: G. Stamoulis (Technical University of Crete)
-
8.1 Automated Selective Multi-Threshold Design for Ultra-Low Standby
Applications [p. 202]
-
K. Usami, N. Kawabe, M. Koizumi, K. Seta (Toshiba Corporation Semiconductor
Company), T. Furusawa (Toshiba Microelectronics Corporation)
This paper describes an automated design technique to selectively
use multi-threshold CMOS (MTCMOS) in a cell-by-cell fashion.
MT cells consisting of low-Vth transistors and high-Vth sleep
transistors are assigned to critical paths, while high-Vth cells are
assigned to non-critical paths. Compared to the conventional
MTCMOS, the gate delay is not affected by the discharge patterns
of other gates because there is no virtual ground to be shared. We
applied this technique to a test chip of a DSP core. The worst
path-delay was improved by 14% over the single high-Vth design
without increasing standby leakage at 10% area overhead.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - VLSI,
DSP.
General Terms
Performance, Design, Experimentation.
Keywords
Automated design, Multi-Threshold, standby leakage current.
-
8.2 HA2TSD: Hierarchical Time Slack Distribution for Ultra-Low Power
CMOS VLSI [p. 207]
-
K.-w. Choi, A. Chatterjee (Georgia Institute of Technology)
This paper describes an efficient hierarchical design and optimization
approach for ultra-low power CMOS logic circuits. We introduce the
Hierarchical Activity-Aware Time Slack Distribution (HA2TSD)
algorithm, which distributes the surplus time slack into the most
power-hungry modules hierarchically. HA2TSD ensures that the total
slack budget is maximal and the total power is near-minimal. Based
on these time slacks, we have optimized technology parameters
(supply voltage, threshold voltage, and device width) through a gate-level
power optimizer and have tested the algorithm on a set of
benchmark example circuits and building blocks of a synthesizable
ARM core. The experimental results show that our strategy delivers
over an order of magnitude savings in total (static and dynamic)
power and reduces the optimization run-time significantly.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids-simulation.
General Terms
Algorithms.
Keywords
Low-power design, time slack distribution, and gate-level power
optimization.
-
8.3 Runtime Mechanisms for Leakage Current Reduction in CMOS
VLSI Circuits [p. 213]
-
A. Abdollahi (University of Southern California),
F. Fallah (Fujitsu Laboratories of America), M. Pedram (University
of Southern California)
This paper describes two runtime mechanisms for
reducing the leakage current of a CMOS circuit. In both cases, it is
assumed that the system or environment produces a "sleep" signal
that can be used to indicate that the circuit is in a standby mode. In
the first method, the "sleep" signal is used to shift in a new set of
external inputs and pre-selected internal signals into the circuit
with the goal of setting the logic values of all of the internal signals
so as to minimize the total leakage current in the circuit. This
minimization is possible because the leakage current of a CMOS
gate is a strong function of the input combination applied to its
inputs. In the second method, NMOS and PMOS transistors are
added to some of the gates in the circuit to increase the
controllability of the internal signals of the circuit and decrease the
leakage current of the gates using the "stack effect". This is,
however, done carefully so that the minimum leakage is achieved
subject to a delay constraint for all input-output paths in the
circuit. In both cases, Boolean satisfiability is used to formulate the
problems, which are subsequently solved by employing a highly
efficient SAT solver. Experimental results on the circuits in the
MCNC91 benchmark suite demonstrate that it is possible to reduce
the leakage current by up to 70% in VLSI circuits at the expense of
a very small overhead.
Categories and Subject Descriptors:
B.7.1. [Integrated Circuits]: Types and Design Styles, VLSI
General Terms: Algorithms and Design
Session Chair: Christian Piguet (CSEM & EPFL, Switzerland)
-
Future Directions in Clocking Multi-GHz Systems [p. 219]
-
V.G. Oklobdzija (University of California), J. Sparso (Technical University
of Denmark)
This tutorial addresses the problems and possible solutions of
clocking digital systems operating at multi-GHz frequencies.
The first part of the tutorial will address techniques for
managing clock uncertainties and clock power in synchronous
circuits. There are two trends that are disturbing: (a) the power
taken by the clock distribution network and clocked storage
elements (flip-flops and latches) is increasing relative to the
rest of the logic, (b) clock uncertainties are taking a significant
portion of the cycle away from useful logic operations. There
are no radical solutions in sight. We present ways of
designing clocked storage elements that are capable of absorbing
a significant portion of clock uncertainties and passing delay from
one logic stage to the other. At multi-GHz frequencies of
operation it will be difficult to precisely control the timing
boundaries between the logic stages. Thus the ability to extend
the operation into the time period allocated for the next pipeline
stage is important. This is known as time borrowing. Also, the
ability to incorporate logic into the clocked storage elements is
of critical importance given that the number of logic stages in a
pipeline running at multi-GHz frequencies is decreasing to fewer
than ten.
Session Chair: Mary Jane Irwin (Penn State University)
-
Compilers for Power and Energy Management [p. 220]
-
U. Kremer (Rutgers University)
Optimizing compilers perform program analyses and transformations
at different levels of program abstraction, ranging
from source code, intermediate code such as three address
code, to assembly and machine code. Analyses and
transformations can have different scopes. They can be
performed within a single basic block (local), across basic
blocks but within a procedure (global), or across procedure
boundaries (interprocedural). Traditionally, optimizing
compilers try to reduce overall program execution time
or resource usage such as memory. The compilation process
itself can be done before program execution (static compilation),
or during program execution (dynamic compilation).
This large design space is the main challenge for compiler
writers. Many tradeoffs have to be considered in order to
justify the development and implementation of a particular
optimization pass or strategy. However, every compiler
optimization needs to address the following three issues:
1. opportunity: When can the optimization be applied?
2. safety: Does the optimization preserve program semantics?
3. profitability: When applied, how much performance
improvement can be expected?
Session Chair: Paul Hurst (UC Davis)
Session Organizer: Satyen Mukherjee (Philips)
-
9.1 Oversampled Gain-Boosting [p. 221]
-
O. Oliaei (Motorola Labs)
A dynamic gain-enhancement technique suitable for low-voltage,
low-power oversampling circuits, particularly
sigma-delta converters, is presented. This method makes use
of a discrete-time integrator to gradually improve the output
resistance of the main amplifier over successive clock cycles.
Categories and Subject Descriptors
B.7.1 [Hardware]: Integrated Circuits
General Terms
Circuit Design
Keywords
Switched-Capacitor, MOS amplifier, bootstrapping, ADC,
DAC, sigma-delta, gain boosting, gain enhancement, OTA.
-
9.2 ±0.5V ~ ±1.5V UHF CMOS LV/LP Four-Quadrant Analog Multiplier in
Modified Bridged-Triode Scheme [p. 227]
-
S.C. Li, J.C. Cha (National Yunlin Univ. of Science and Technology)
A new LV/LP CMOS four-quadrant analog multiplier designed in a
modified bridged-triode scheme (MBTS) is presented. It offers
benefits in terms of linearity, power consumption,
frequency response and total harmonic distortion (THD). The
chip, fabricated in TSMC 0.35µm n-well SPQM CMOS technology,
has a nonlinearity error of less than 0.8% over a ±0.5V input
range under a nominal supply voltage of ±1.5V, and
consumes a total power of only 2.7 mW.
Categories & Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - Algorithms implemented in hardware, Input/output circuits.
General Terms
Design, Performance, Measurement.
Keywords
Analog multiplier, Modified Bridged-Triode Scheme (MBTS).
-
9.3 A Power and Resolution Adaptive Flash Analog-to-Digital Converter [p. 233]
-
J. Yoo, D. Lee, K. Choi, J. Kim (Pennsylvania State University)
A new power and resolution adaptive flash ADC, named
PRA-ADC, is proposed. The PRA-ADC enables exponential
power reduction with linear resolution reduction. Unused
parallel voltage comparators are switched to standby
mode. The voltage comparators consume only the leakage
power during the standby mode. The PRA-ADC, capable of
operating at 5-bit, 6-bit, 7-bit, and 8-bit precision, dissipates
69 mW at 5-bit and 435 mW at 8-bit. The PRA-ADC was
designed and simulated with 0.18 um CMOS technology.
The PRA-ADC design is applicable to RF portable communication
devices, allowing tighter management of power and
efficiency.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles -
VLSI
General Terms
Design
Keywords
Analog-to-Digital Converter, Flash ADC, Threshold Inverter
Quantization, TIQ Comparator, Adaptive
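
The "exponential power reduction with linear resolution reduction" claim follows from the structure of a flash converter, which needs 2^n - 1 comparators for n bits. The short Python sketch below works this out for the two operating points quoted in the abstract. It assumes, for illustration only, that power scales with the number of active comparators; the function name is hypothetical.

```python
def active_comparators(bits):
    """Comparators needed by an n-bit flash ADC (2**n - 1)."""
    return 2 ** bits - 1


# Comparator counts for the four supported precisions.
counts = {bits: active_comparators(bits) for bits in (5, 6, 7, 8)}

# Ratio of active comparators between 8-bit and 5-bit operation.
comparator_ratio = counts[8] / counts[5]

# Ratio of the power figures quoted in the abstract (435 mW vs 69 mW).
reported_power_ratio = 435 / 69
```

Each bit of resolution removed roughly halves the number of active comparators. The reported power ratio is somewhat smaller than the comparator ratio, which would be consistent with power that does not scale with the comparator count, although the abstract does not break the figures down.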
-
9.4 Design Techniques for Low Power High Bandwidth Upconversion in CMOS [p. 237]
-
C. De Ranter, M. Steyaert (Katholieke Universiteit Leuven)
An upconvertor topology for low power, high bandwidth
applications is presented. Using specific circuit techniques
and local circuit-level optimization, the power consumption
of the total system comprising an on-chip LC-type VCO,
a polyphase network quadrature generator, a linear mixer
block and an RF-current buffer, has been minimized.
A chip has been designed and manufactured in a 0.25µm
CMOS technology. The VCO oscillates between 1.68 GHz
and 2 GHz. Driven by an external LO, the transmitter operates
from 900 MHz up to 2 GHz. At 2 GHz, the upconvertor
transmits -12 dBm into 50 Ω with a linearity of more than
-35 dBc for base band signals up to 33 MHz.
Categories and Subject Descriptors
B.7.m [Integrated Circuits]: Miscellaneous - Analog RF
CMOS Design
General Terms
Design
Keywords
Low power, Analog, Upconversion, Oscillators, RF Design,
CMOS
Session Chair: Vivek Tiwari (Intel)
Session Chair: Vojin G. Oklobdzija (UC Davis)
-
P3.1 TLB and Snoop Energy-Reduction using Virtual Caches in Low-
Power Chip Multiprocessors [p. 243]
-
M. Ekman (Chalmers University of Technology), F. Dahlgren (Ericsson Mobile
Platforms), P. Stenström (Chalmers University of Technology)
In our quest to bring down the power consumption in low-power
chip-multiprocessors, we have found that TLB and snoop accesses
account for about 40% of the energy wasted by all L1 data-cache
accesses. We have investigated the prospects of using virtual
caches to bring down the number of TLB accesses. A key observation
is that while the energy wasted in the TLBs is cut, the energy
associated with snoop accesses becomes higher. We then contribute
two techniques to reduce the number of snoop accesses
and their energy cost. Virtual caches together with the proposed
techniques are shown to reduce the energy wasted in the L1 caches
and the TLBs by about 30%.
Categories and Subject Descriptors
C.5.3 Microcomputers - Microprocessors
General Terms
Performance, Design
Keywords
low-power, CMP, snoop, virtual caches
-
P3.2 A Preactivating Mechanism for a VT-CMOS Cache using Address
Prediction [p. 247]
-
R. Fujioka, K. Katayama, R. Kobayashi, H. Ando,
T. Shimada (Nagoya University)
It has become an important requirement to achieve high
performance and low-power consumption at the same time.
The dynamic leakage cut-off (DLC) scheme, which controls
transistor threshold voltage line by line on demand, is a
technique that potentially satisfies that requirement for a
cache. Yet, conventional DLC causes access time to significantly
lengthen, and consequently processor performance is
unacceptably degraded. This paper proposes a mechanism
that suppresses the performance degradation by preactivating
cache lines using address prediction before access requests.
Our evaluation results show significant performance
improvements are achieved with little increase of power consumption.
Keywords
leakage current, L1 data cache, address prediction
-
P3.3 Dynamic Vt SRAM: A Leakage Tolerant Cache Memory for Low
Voltage Microprocessors [p. 251]
-
C.H. Kim, K. Roy (Purdue University)
This paper presents a Dynamic Vt SRAM (DTSRAM) architecture
to reduce the subthreshold leakage in cache memories. The Vt of
each cache line is controlled separately by means of body biasing.
In order to minimize the energy and delay overhead, a cache line is
switched to high Vt only when it is not likely to be accessed anymore.
Simulation results from SimpleScalar framework show that
even after considering the energy overhead, the DTSRAM can save
72% of the cache leakage with a performance loss less than 1%.
Layout of the DTSRAM shows that the area penalty is minimal.
-
P3.4 Asymmetric-Frequency Clustering: A Power-Aware Back-End for
High-Performance Processors [p. 255]
-
A. Baniasadi (Northwestern University), A. Moshovos (University of Toronto)
We introduce asymmetric frequency clustering (AFC), a
micro-architectural technique that reduces the dynamic power dissipated
by a processor's back-end while maintaining high performance.
We present a dual-cluster, dual-frequency machine
comprising a performance oriented cluster and a power-aware one.
The power-aware cluster operates at half the frequency of the performance
oriented cluster and uses a lower voltage supply. We
show that this organization significantly reduces back-end power
dissipation by executing non-performance-critical instructions in
the power-aware cluster. AFC localizes the two frequency/voltage
domains. Consequently, it mitigates many of the complexities
associated with maintaining multiple supply voltage and frequency
domains on the same chip. Key to the success of this technique are
methods that assign as many instructions as possible to the slower/
lower power cluster without impacting overall performance. We
evaluate our techniques using a subset of SPEC2000 and SPEC95.
AFC provides a 16% back-end power reduction with 1.5% performance
loss compared to a conventional, dual-clustered processor
where each cluster has schedulers of the same width and length.
Categories and Subject Descriptors
C.1.1 [Single Data Stream Architectures] Pipeline processors.
General Terms
Design
Keywords
Power-Aware Architectures, Processor Back-End, Instruction Criticality,
Asymmetric Frequency Clustering, High-Performance Processors.
Session Chair: Ed Cheng (Synopsys)
-
P4.1 Power Analysis Techniques for SoC with Improved Wiring Models [p. 259]
-
T. Sakamoto, T. Yamada, M. Mukuno, Y. Matsushita, Y. Harada (Sanyo Electric),
H. Yasuura (Kyushu University)
This paper proposes two techniques for improving the accuracy of
gate-level power analysis for system-on-a-chip (SoC).
(1) The creation of custom wire load models for clock nets
(2) The use of layout information (actual net capacitance and
input signal transition time)
The analysis time is reduced to less than one three-hundredth of
the transistor-level power analysis time. The error is within 5% of
that of a real chip (the same level as in transistor-level power analysis)
if technique (2) is used. The analytical error between techniques
(1) and (2) is within 1%.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Simulation, Verification,
Placement and routing, Layout.
General Terms
Verification, Experimentation, Design
Keywords
SoC, power analysis, gate-level, custom wire load model
-
P4.2 A Microarchitectural-Level Step-Power Analysis Tool [p. 263]
-
W. El-Essawy, D.H. Albonesi (University of Rochester), B. Sinharoy (IBM
Corporation)
Clock gating is an effective means for reducing average power consumption.
However, clock gating can exacerbate maximum cycle-to-cycle current
swings, or the step-power (Ldi/dt) problem. We
present a microarchitecture-level step-power simulator and demonstrate
its use in exploring how design alternatives impact relative
step-power levels. We show how the tool can be used to identify
major sources of high microprocessor step-power events. Our experiments
indicate that branch mispredictions are a major cause of
high step-power occurrences. We also show that high step-power
events are infrequent, which suggests that architectural techniques
may limit step-power at potentially low performance cost.
Categories and Subject Descriptors
C.5.3 [Computer Systems Organization]: COMPUTER SYSTEM
IMPLEMENTATION - Microcomputers; I.6.5 [Computing Methodology]:
SIMULATION AND MODELING - Model Development
General Terms
Reliability, Design
Keywords
step-power, Ldi/dt, inductive noise, microprocessors, clock-gating,
architectural simulation
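
As a rough illustration of the step-power metric described in this abstract, the sketch below finds the largest cycle-to-cycle current swing in a per-cycle current trace. The trace values and the function name max_step are hypothetical and are not taken from the authors' simulator.

```python
def max_step(currents):
    """Return (largest absolute cycle-to-cycle change, cycle index where it ends).

    `currents` is a list of per-cycle current estimates in arbitrary units.
    Returns (0.0, None) when the trace is too short or completely flat.
    """
    best, best_idx = 0.0, None
    for i in range(1, len(currents)):
        step = abs(currents[i] - currents[i - 1])
        if step > best:
            best, best_idx = step, i
    return best, best_idx


# Hypothetical per-cycle current estimates (arbitrary units).
trace = [10.0, 11.0, 4.0, 4.5, 12.5, 12.0]
worst, cycle = max_step(trace)
print(worst, cycle)
```

A microarchitectural simulator would produce such a trace from per-unit activity; here the numbers are made up only to exercise the function.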
-
P4.3 Power Estimation of Sequential Circuits using Hierarchical Colored
Hardware Petri Net Modeling [p. 267]
-
A.K. Murugavel, N. Ranganathan (University of South Florida)
A Hierarchical Colored Hardware Petri net (HCHPN) based model
was proposed in [8] for estimating switching activity in combinational
circuits. In this paper, we model sequential circuits as
HCHPNs incorporating real delays for both gates and interconnects.
Thus, the given sequential circuit is first modeled as an HCHPN
and simulated for switching activity estimation in the Petri net domain,
which leads to better accuracy and faster simulation. Experimental
results for ISCAS'89 benchmark circuits show that the proposed
HCHPN model yields accuracy on average within 4.4%
of that of PowerMill. The per-pattern simulation time for HCHPNs
is about 2.4 times shorter than that of PowerMill.
Categories and Subject Descriptors
B.7 [Hardware]: Integrated Circuits; B.7.2 [Integrated Circuits]:
Design Aids - simulation
-
P4.4 High-Level Area Estimation [p. 271]
-
K.M. Büyüksahin (University of Illinois at Urbana-Champaign),
F. N. Najm (University of Toronto)
Early power estimation requires one to estimate the area
(gate count) of a design from a high-level description. We
propose a method to do this that makes use of the concept
of Boolean networks (BN) and introduces an invariant
area complexity measure which captures the gate-count requirement
of a design. The method can be adapted to be
used at different points on the area/delay tradeoff curve,
with different synthesizer/mapper tools, and different target
gate libraries. The area model is experimentally verified
and tested using a number of ISCAS and MCNC benchmark
circuits and two different target cell libraries, on two
different synthesis systems.
Categories and Subject Descriptors
B.5.2 [RTL Implementation]: Design aids
General Terms
Design
Keywords
Area estimation, Boolean networks
-
P4.5 Retiming-Based Logic Synthesis for Low-Power [p. 275]
-
Y.-L. Hsu, S.-J. Wang (National Chung-Hsing University)
Power management has become a great concern in VLSI design in
recent years. In this paper, we consider the logic level design
technique for low power applications. We present a retiming based
optimization method, in which part of the circuit is selected
and moved so that it produces logic signals one clock cycle before
they are actually applied. If these values can solely determine the
output logic level, then the other part of the circuit can be turned off
to save power. We explore acceptable retimed circuit
structures, in which circuit function is not changed. An algorithm
is proposed to select the optimal logic block to be retimed. We
evaluate the low-power circuit structure on several MCNC
benchmark circuits, and the results indicate an improvement over
previous methods. Our method achieves a significant reduction in
switching activity, and the reduction can be more than 70% in
some cases. The required area overhead is very small.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - automatic synthesis,
optimization, switching theory.
General Terms
Algorithms, Design.
Keywords
Low-power, retiming, logic design, switching activity.
-
P4.6 Activity-Sensitive Clock Tree Construction for Low Power [p. 279]
-
C. Chen, C. Kang (University of Windsor), M. Sarrafzadeh (University of
California at Los Angeles)
This paper presents an activity-sensitive clock tree construction
technique for low power design of VLSI clock networks. We
introduce the notion of node difference based on module activity
information, and show its relationship with the power
consumption. A binary clock tree is built using the node
difference between different modules to optimize the power
consumption due to the interconnections (i.e., clock gating signals
and clock edges). We also develop a method to determine gating
signals with a minimum number of transitions. After the clock tree
is constructed, the gating signals are optimized for further power
savings.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - VLSI.
General Terms
Algorithms.
Keywords
Clock tree, low power, clock gating, activity pattern.
Session Chair: Vivek De (Intel)
Session Chair: Wanda Gass (Texas Instruments)
Session Organizer: Sanjive Agarwala (Texas Instruments)
-
11.1 Low-Power VLSI Decoder Architectures for LDPC Codes [p. 284]
-
M.M. Mansour, N.R. Shanbhag (University of Illinois at Urbana-Champaign)
Iterative decoding of low-density parity check (LDPC) codes
using the message-passing algorithm has proved to be extraordinarily
effective compared to conventional maximum-likelihood decoding.
However, the lack of any structural regularity in these essentially
random codes is a major challenge
for building a practical low-power LDPC decoder. In this
paper, we jointly design the code and the decoder to induce
the structural regularity needed for a reduced-complexity
parallel decoder architecture. This interconnect-driven code
design approach eliminates the need for a complex interconnection
network while still retaining the algorithmic performance
promised by random codes. Moreover, we propose a
new approach for computing reliability metrics based on the
BCJR algorithm that reduces the message switching activity
in the decoder compared to existing approaches. Simulations
show that the proposed approach results in power
savings of up to 85.64% over conventional implementations.
Categories and Subject Descriptors
B.7.1 [Types and Design Styles]: VLSI; E.4 [Coding
and Information Theory]: Error control codes
General Terms
Design
Keywords
LDPC codes, lower power architectures, BCJR algorithm.
-
11.2 A Low Power Normalized-LMS Decision Feedback Equalizer for a
Wireless Packet Modem [p. 290]
-
D. Garrett, C. Nicol (Lucent Technologies), A. Blanksby, C. Howland (Agere Systems)
This paper presents a decision feedback equalizer (DFE) for a
high-speed packet modem utilizing the normalized least mean
squared (NLMS) tap update algorithm. The equalizer supports up
to 43.2 Mbps uncoded data over a wireless channel with a 10%
training preamble (48 Mbps with no training). In this work the
rapid convergence of the NLMS algorithm is combined with a
technique for early termination of the tap training process to yield
a low power DFE implementation. The low power techniques
result in a 43% power reduction over a baseline design.
Furthermore, low power synthesis techniques result in an
additional 30% power savings on top of the algorithmic power
savings.
Categories and Subject Descriptors
B.2.4 [Arithmetic and Logic Structures]: High-speed arithmetic
- algorithms, cost/performance.
General Terms
Algorithms, Performance, Design.
Keywords
Low power, NLMS, equalization, early termination.
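
The combination of an NLMS tap update with early termination of training can be sketched as follows. This is a minimal illustration for a plain real-valued FIR filter, not the paper's DFE: the stopping rule (err_tol, patience), the two-tap channel, and the training pattern are all assumptions made for the example.

```python
def nlms_train(inputs, desired, num_taps, mu=0.5, eps=1e-8,
               err_tol=1e-3, patience=8):
    """Adapt FIR taps with NLMS and stop once the error stays small.

    err_tol and patience define a hypothetical early-termination rule:
    training ends after `patience` consecutive symbols with |error| < err_tol.
    Returns (taps, number of tap updates performed).
    """
    taps = [0.0] * num_taps
    window = [0.0] * num_taps      # most recent input sample first
    quiet = 0                      # consecutive low-error symbols seen so far
    updates = 0
    for x, d in zip(inputs, desired):
        window = [x] + window[:-1]
        y = sum(w * s for w, s in zip(taps, window))
        err = d - y
        quiet = quiet + 1 if abs(err) < err_tol else 0
        if quiet >= patience:
            break                  # early termination: freeze the taps
        # NLMS update: the step is normalized by the input energy in the window.
        energy = sum(s * s for s in window) + eps
        step = mu * err / energy
        taps = [w + step * s for w, s in zip(taps, window)]
        updates += 1
    return taps, updates


# Hypothetical noise-free two-tap channel and a repeating training pattern.
channel = [0.5, -0.25]
pattern = [1.0, 1.0, -1.0, -1.0]
inputs = [pattern[n % 4] for n in range(200)]
desired = [channel[0] * inputs[n] + (channel[1] * inputs[n - 1] if n > 0 else 0.0)
           for n in range(200)]
taps, updates = nlms_train(inputs, desired, num_taps=2)
print(taps, updates)
```

Because the update loop exits as soon as the error has stayed small for a few symbols, the remaining training symbols cost no tap-update work, which is the source of the saving in this simplified setting.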
-
11.3 High Performance and Low Power FIR Filter Design based on
Sharing Multiplication [p. 295]
-
J. Park, W. Jeong, H. Choo, H. Mahmoodi-Meim, Y. Wang, K. Roy
(Purdue University)
We present a high performance and low power FIR filter design,
which is based on computation sharing multiplier (CSHM). CSHM
specifically targets computation re-use in vector-scalar products
and is effectively used in our FIR filter design. Efficient
circuit-level techniques, namely a new carry select adder and a conditional
capture flip-flop (CCFF), are also used to further improve power
and performance. The proposed FIR filter architecture was
implemented in 0.25 µm technology. Experimental results on a
10 tap low pass CSHM FIR filter show speed and power improvement
of 19% and 17%, respectively, with respect to an FIR filter
based on Wallace tree multiplier.
Keywords
Computation sharing, FIR filter design, high performance
and low power carry select adder, conditional capture flip-flop
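
The general idea of computation re-use in a vector-scalar product can be sketched as below: small odd multiples of the input sample are computed once, and each coefficient product is then assembled from those shared values with shifts and adds only. This is one reading of the concept under stated assumptions (non-negative integer coefficients, 4-bit groups, odd multiples 1 through 15); the function names are hypothetical and the code does not model the paper's circuit.

```python
def precompute_odd_multiples(x, max_odd=15):
    """Compute the shared bank {1*x, 3*x, ..., max_odd*x} once per input sample."""
    return {a: a * x for a in range(1, max_odd + 1, 2)}


def shared_multiply(coeff, bank):
    """Form coeff * x from bank lookups, shifts, and adds only.

    coeff must be a non-negative integer. It is processed in 4-bit groups;
    each nonzero group equals (odd value) << (trailing zeros), so the odd
    value selects a bank entry and the rest is a shift.
    """
    result = 0
    shift = 0
    while coeff:
        nibble = coeff & 0xF
        if nibble:
            tz = (nibble & -nibble).bit_length() - 1   # trailing zero count
            odd = nibble >> tz
            result += bank[odd] << (shift + tz)
        coeff >>= 4
        shift += 4
    return result


# One input sample multiplied by every filter coefficient (hypothetical values).
x = 7
coeffs = [3, 20, 255, 0, 1000]
bank = precompute_odd_multiples(x)
products = [shared_multiply(c, bank) for c in coeffs]
print(products)
```

In a transposed-form FIR filter the same input sample multiplies every coefficient, so the bank is built once per sample and reused across all taps.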
-
11.4 A Low-Power Digital Matched Filter for Spread-Spectrum Systems [p. 301]
-
S. Goto, T. Yamada, N. Takayama, Y. Matsushita, Y. Harada (Sanyo Electric,
Co., Ltd.), H. Yasuura (Kyushu University)
A Digital Matched Filter (DMF) is an essential device for Direct-Sequence
Spread-Spectrum (DS-SS) communication systems.
Reducing the power consumption of a DMF is especially critical
for battery-powered terminals. The reception registers and the
correlation-calculating unit dissipate the majority of the power in a
DMF. In this paper we discuss this problem and propose a low-power
architectural approach to a DMF. The total switching
activity factor and the switched capacitance are reduced. As a
result of power analysis at the gate level, the implementation of
the proposed architecture in a standard 0.18-µm CMOS
technology achieved a reduction in the power consumption of
more than 70%.
Categories and Subject Descriptors
B.5.1 [Register-Transfer-Level Implementation]: Design -
arithmetic and logic units, control design, styles.
General Terms
Algorithms, Management, Design, Experimentation.
Keywords
matched filter, spread-spectrum, CDMA, VLSI, low power.
Session Chair: Pai Chou (UCI)
Session Organizer: Wolfgang Nebel (OFFIS, Oldenburg University)
-
12.1 Parametric Timing and Power Macromodels for High Level
Simulation of Low Swing Interconnects [p. 307]
-
D. Bertozzi, L. Benini, B. Ricco (University of Bologna)
The impact of global on-chip interconnections on power
consumption and speed of integrated circuits is becoming
a serious concern. Designers therefore need to quickly
estimate how performance and power are affected by a given
choice of the interconnection parameters (length, voltage
swing, driver and receiver schematics and sizing). This
work focuses on the entire communication channel (driver,
interconnect, receiver), and provides high level parametric
VHDL simulation models for low-swing signaling schemes.
These SPICE-derived power and timing macromodels
transfer electrical-level information to the RTL simulation in
an event-driven fashion, as transitions occur at the input of
the interconnect driver. The accuracy reached by this back-annotation
technique is within 5% with respect to SPICE
results, with only a 4% simulation speed penalty in the worst
case.
-
12.2 Compact Models for Estimating Microprocessor Frequency and Power [p. 313]
-
W. Athas, L. Youngs (Apple Computer), A. Reinhart (Motorola Labs)
This paper describes compact mathematical models for estimating
the frequency performance and power dissipation of a microprocessor
as a function of the supply voltage. The objective is to
estimate the frequency and/or power performance across a wide
range of supply voltages and operating frequencies using only a
small number of configurable parameters and equations. These
compact equations are amenable to hand calculations and spreadsheet
manipulation. The configurable parameters are derived from
actual measurements of microprocessor chips and are calculated
using the least-squares curve-fitting method.
Categories and Subject Descriptors
C.4 [Performance of Systems], B.7 [Integrated Circuits], I.6
[Simulation and Modeling], G.4 [Mathematical Software], J.6
[Computer-Aided Engineering]
General Terms
Algorithms, Design, Experimentation, Performance
Keywords
Low-power, microprocessors, VLSI, ASIC, curve-fitting, delay
modeling, power estimation
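
To illustrate the kind of least-squares fit the abstract mentions, the sketch below fits a straight line f(V) = a + b*V to frequency measurements. The linear model form and the data points are assumptions chosen for the example; the abstract does not state the paper's actual equations or measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical measurements: supply voltage (V) and maximum frequency (MHz).
volts = [1.0, 1.2, 1.4, 1.6]
freqs = [200.0, 280.0, 360.0, 440.0]
a, b = fit_line(volts, freqs)

# Predict the frequency at an unmeasured supply voltage.
predicted = a + b * 1.3
print(a, b, predicted)
```

With only two fitted parameters, the model can be evaluated by hand or in a spreadsheet, in the spirit of the compact models the abstract describes.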
-
12.3 Efficient Estimation of Signal Transition Activity in MAC
Architectures [p. 319]
-
A. Garcia, L.D. Kabulepa, M. Glesner (Darmstadt University of Technology)
Because of the increasing demand for portable digital systems, it is
of great interest to extend the existing high-level power estimation
techniques to handle architectures with nonlinear components, as
they appear in relevant practical applications. In this paper we focus
on the estimation of the transition activity in MAC structures
implementing FIR filters. Based on a divide and conquer approach,
an accurate yet efficient estimation procedure is developed. The
technique has been evaluated for different synthetic and real data
sets. In all cases, our results show only very slight discrepancies
with respect to precise bit level simulations.
Categories and Subject Descriptors
B.8.2 [Hardware]: Performance Analysis & Design Aids
General Terms
Design, Performance
Keywords
Low power, power estimation, transition activity, MAC
-
12.4 Novel Modeling Techniques for RTL Power Estimation [p. 323]
-
M. Eiermann, W. Stechele (Technical University of Munich)
In this work, we propose efficient macromodeling techniques for
RTL power estimation, based only on word and bit level switching
information of the module inputs. We present practicable combinations
of these two properties for the construction of power macromodels.
It is demonstrated that our models reduce the
estimation error by at least 64% compared to the Hamming-distance model.
The total average errors (compared to PowerMill)
achieved over a wide range of test modules and input stimuli are
less than 4.6%. This is comparable to complex models, which, however,
have to make use of several more signal properties.
Categories and Subject Descriptors
I.6.5 [Simulation and Modeling]: Model Development - modeling
methodologies.
General Terms
Design, Experimentation, Verification.
Keywords
Power estimation, power modeling, RTL macromodels, low power.
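
For context, a Hamming-distance macromodel in its simplest form charges a fixed energy for every input bit that toggles between consecutive cycles. The sketch below shows that baseline only, not the models proposed in the paper; the per-toggle energy and the input vectors are hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def estimate_energy(vectors, energy_per_toggle):
    """Estimate energy as (per-toggle cost) * (total input bit toggles).

    Returns (total energy, list of per-cycle toggle counts).
    """
    toggles = [hamming_distance(p, c) for p, c in zip(vectors, vectors[1:])]
    return energy_per_toggle * sum(toggles), toggles


# Hypothetical 4-bit input stream applied to a module, one vector per cycle.
stream = [0b0000, 0b1111, 0b1010, 0b1010]
energy, per_cycle = estimate_energy(stream, energy_per_toggle=2.0)
print(energy, per_cycle)
```

A model of this kind ignores which bits toggle, which is one reason richer word-level and bit-level statistics can lower the estimation error.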