ISLPED 2001 Abstracts
Session Chair: Vivek De (Intel)
-
Wireless Beyond the Third Generation: Facing the Energy Challenge [p. 1]
-
Jan Rabaey (University of California, Berkeley)
After stellar growth over the last decade driven by voice as the
killer app, wireless communications is now rapidly moving into a
new era propelled by data networking. For a wide host of devices,
wireless will serve as the "last interconnection hop" to the
high-data-rate wired networks. The basic trends in these devices can be best
summarized under the following two headers: "ubiquity" and "more
bits/sec". Both of these have some important ramifications on
energy dissipation. In this paper and accompanying presentation, we
will outline the predominant trends in wireless, analyze the energy
challenge of those, and examine a number of emerging solutions.
GENERAL TERMS - Design
KEYWORDS - Wireless, communications, energy.
Session Chair: Pradip Bose (IBM)
Session Organizer: Steve Kosonocky (IBM)
-
Micro-Operation Cache: A Power Aware Frontend for Variable
Instruction Length ISA [p. 4]
-
Baruch Solomon, Avi Mendelson, Doron Orenstein, Yoav Almog, Ronny Ronen (Intel Corporation)
We introduce the Micro-Operation Cache (Uop Cache - UC)
designed to reduce the processor's frontend power and energy
consumption without performance degradation. The UC caches
basic blocks of instructions - pre-decoded into micro-operations
(uops). The UC fetches a single basic-block worth of uops per
cycle. Fetching complete pre-decoded basic-blocks eliminates the
need to repeatedly decode variable length instructions and
simplifies the process of predicting, fetching, rotating and
aligning fetched instructions. The UC design enables even a small
structure to be quite effective.
Results: a moderate-sized UC eliminates about 75% of instruction
decodes across a broad range of benchmarks and over 90% in
multimedia applications and high-power tests. For existing Intel
P6 family processors, the eliminated work may save about 10% of
the full-chip power consumption with no performance
degradation.
General Terms: Performance, Design
Keywords: instruction fetch, instruction cache, micro-operation
cache, power reduction.
-
L1 Data Cache Decomposition for Energy Efficiency [p. 10]
-
Michael Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas (University of Illinois at Urbana-Champaign)
The L1 data cache is a time-critical module and, at the same time,
a major consumer of energy. To reduce its energy-delay product,
we apply two principles of low-power design: specialize part of the
cache structure and break the cache down into smaller caches. To
this end, we propose a new L1 data cache structure that combines
a Specialized Stack Cache (SSC) and a Pseudo Set-Associative
Cache (PSAC). Individually, our SSC and PSAC designs have a
lower energy-delay product than previously-proposed related designs.
In addition, their combined operation is very effective. Relative
to a conventional 2-way 32 KB data cache, a design containing
a 4-way 32 KB PSAC and a 512 B SSC reduces the energy-delay
product of several applications by an average of 44%.
-
Instruction Flow-Based Front-end Throttling for Power-Aware
High-Performance Processors [p. 16]
-
Amirali Baniasadi (Northwestern University), Andreas Moshovos (University of Toronto)
We present a number of power-aware instruction front-end
(fetch/decode) throttling methods for high-performance
dynamically-scheduled superscalar processors. Our methods reduce
power dissipation by selectively turning on and off instruction
fetch and decode. Moreover, they have a negligible impact on
performance as they deliver instructions just in time for exploiting
the available parallelism. Previously proposed front-end
throttling methods rely on branch prediction confidence estimation.
We introduce a new class of methods that exploit information
about instruction flow (rate of instructions passing through
stages). We show that our methods can boost power savings over
previously proposed methods. In particular, for an 8-way processor,
a combined method reduces traffic by 14%, 20%, 6% and
6% for the fetch, decode, issue and complete stages respectively
while performance remains mostly unaffected. The best previously
proposed method reduces traffic by 10%, 15%, 4% and
4% respectively.
-
Energy Reduction in Queues and Stacks by Adaptive Bitwidth Compression [p. 22]
-
Vasily G. Moshnyaga (Fukuoka University)
A new micro-architectural technique to reduce energy dissipated
by queues and stacks is proposed. Like related
research that targets the transition activity in bit-lines, the
technique is based on bitwidth compression. Unlike that
research, however, it utilizes the fixed accessing order embodied
in queues and stacks to exploit input data correlation. The technique
dynamically adjusts the required bitwidth to the number of
bits that changed in comparison to the last access. It is
not restricted to specific bit patterns such as zero bytes
or precharge values, and it works efficiently on reads and writes
without large area, timing, or power overhead. Simulations
show that this technique can reduce the energy of an
instruction queue by up to 30% and the energy of a video data
queue by 20%.
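As a rough illustration of the idea in the abstract above, the following Python sketch computes, for each access to a queue, how many low-order bits would be needed to cover the bits that differ from the previous access. The function names, the 32-bit word size, the all-zero initial state, and the reading of "required bitwidth" as the position of the highest changed bit are all assumptions made for this example; the abstract does not specify the actual hardware mechanism.

```python
def required_bitwidth(previous: int, current: int) -> int:
    """Low-order bits needed to cover every bit that differs (assumed metric)."""
    return (previous ^ current).bit_length()


def compressed_widths(values, word_bits=32):
    """Per-access bitwidth for a sequence of values written in queue order."""
    mask = (1 << word_bits) - 1   # hypothetical word size
    widths = []
    previous = 0                  # assumed initial state of the entry
    for value in values:
        value &= mask
        widths.append(required_bitwidth(previous, value))
        previous = value
    return widths


if __name__ == "__main__":
    # Correlated values (e.g. nearby addresses) change only a few low bits.
    print(compressed_widths([0x100, 0x101, 0x103, 0x103]))
```

Under these assumptions, strongly correlated consecutive values need only a few active bits per access, which is the correlation the abstract says the fixed access order of queues and stacks makes exploitable.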
Session Chair: TBD
Session Organizer: Luca Benini (University of Bologna)
-
Energy Priority Scheduling for Variable Voltage Processors [p. 28]
-
Johan Pouwelse, Koen Langendoen, Henk Sips (Delft University of Technology)
Clock (and voltage) scheduling is an important technique to
reduce energy consumption of variable-voltage processors.
It is difficult, however, to achieve good results at the OS
and hardware level when applications show bursty behavior.
We take the approach that such applications must be
made power aware and specify their future demands to a
central scheduler controlling the clock speed and processor
voltage. This paper describes our energy priority scheduling
(EPS) heuristic that orders tasks according to how tight
their deadlines are and how often tasks overlap. We schedule
low-priority tasks first, since they can be easily preempted to
accommodate high-priority tasks later. The EPS heuristic
does not always yield the optimal schedule, but has low
complexity and can be used as an incremental on-line algorithm.
We implemented EPS on a StrongARM-based
variable-voltage platform. Measurements show that EPS
reduces energy consumption by 50% for a bursty video
decoding application without missing any frame deadlines.
-
Dynamic Voltage Scheduling Technique for Low-Power Multimedia
Applications Using Buffers [p. 34]
-
Chaeseok Im, Huiseok Kim, Soonhoi Ha (Seoul National University)
As multimedia applications are used increasingly in many
embedded systems, power efficient design for the applications
becomes more important than ever. This paper proposes a simple
dynamic voltage scheduling technique, which suits the multimedia
applications well. The proposed technique fully utilizes the idle
intervals with buffers in a variable speed processor. The main
theme of this paper is to determine the minimum buffer size to
achieve the maximum energy saving in three cases: single-task,
multiple subtasks, and multi-task. Experimental results show that
the proposed technique is expected to obtain significant power
reduction for several real-world multimedia applications.
-
Power-Aware Modulo Scheduling for High-Performance VLIW Processors [p. 40]
-
Han-Saem Yun, Jihong Kim (Seoul National University)
For high-performance processors, the step power and peak power,
which are closely related to the chip reliability, are important design
constraints, often more than the average power. In VLIW processors
where a single instruction may contain a variable number of
operations, the step power and peak power vary significantly depending
on the parallel schedule generated by a parallelizing compiler.
In this paper, we propose a power-aware modulo scheduling
algorithm for high-performance VLIW processors. The proposed
algorithm reduces both the step power and peak power by producing
a more balanced parallel schedule while not compromising performance.
Experimental results show that the proposed scheduling
technique significantly improves the power characteristics of
high-performance processors over an existing power-unaware modulo
scheduling technique.
-
Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS
Processors [p. 46]
-
Flavius Gruian (Lund University)
The work presented in this paper addresses scheduling for reduced
energy of hard real-time tasks with fixed priorities assigned in a rate
monotonic or deadline monotonic manner. The approach we
describe can be exclusively implemented in the RTOS. It targets
energy consumption reduction by using both on-line and off-line
decisions, taken both at task level and at task-set level. We consider
sets of independent tasks running on processors with dynamic voltage
supplies (DVS). Taking into account the real behavior of a real-time
system, which is often better than the worst case, our methods
employ stochastic data to derive energy efficient schedules. The
experimental results show that our approach achieves greater
energy reductions than other policies from the same class.
Keywords
Low-energy, hard real-time, RTOS, scheduling
Session Chair: TBD
-
Analysis and Design of Low-Energy Flip-Flops [p. 52]
-
Dejan Marković, Borivoje Nikolić, Robert W. Brodersen (University of California, Berkeley)
This paper develops a methodology for selecting and optimizing
flip-flops for low-energy systems with constant throughput.
Characterization metrics relevant to low-energy systems are
discussed, providing insight into timing and energy parameters at
both the circuit and system levels. Transistor sizes are optimized
for minimal delay under constrained energy consumption. This
methodology is applied to the characterization of various flip-flop
styles and their comparison in 0.25µm CMOS technology under
scaled supply voltages. A transmission-gate master-slave latch pair
has the largest internal race margin, the lowest energy
consumption, and an energy-delay product comparable to much
faster pulse-triggered latches.
Keywords
VLSI, Digital CMOS, flip-flops, low-power design, low-voltage.
-
Analysis of Clocked Timing Elements for Dynamic Voltage Scaling Effects
over Process Parameter Variation [p. 56]
-
Hoang Q. Dao (University of California, Davis), Kevin Nowka (IBM Austin Research Lab),
Vojin G. Oklobdzija (University of California, Davis)
In power-constrained systems, the power efficiency of latches and
flip-flops is pivotal. Characteristics of three selected latches and
FFs were analyzed for their behavior under voltage scaling and
different process corners in a 0.18µm CMOS technology. The
relative performance amongst the latches/FFs was consistent
across the different supply voltages. At low voltage, the power-delay
product was degraded by about 25%. The energy-delay product
approximately doubled at low voltage for all latches/FFs over
all process corners. This result was smaller in comparison to the
ideal voltage scaling characteristics mainly because the effects of
velocity saturation were less severe at low voltage. All three
designs suffered more due to process variation under low-voltage
conditions.
Categories and Subject Descriptors
Digital circuit: clocked-timing elements
General Terms
Measurement, Performance, Reliability
Keywords
Clocked timing elements, voltage scaling, process variation
-
A Low-Power Motion Estimation Block for Low Bit-Rate Wireless Video [p. 60]
-
R. Steven Richmond, Dong Sam Ha (Virginia Tech)
This paper presents a low-power design of a motion estimation
block targeting the low-bit-rate video codec H.263. The block is
based on the Four-Step Search (4SS) algorithm. The proposed design
offers up to 38% power reduction for logic blocks alone over a
"baseline" implementation of the 4SS algorithm
and up to 58% power reduction over a baseline model of the
Three-Step Search (TSS) algorithm. In addition, our design reduces power
dissipation of an on-chip memory by up to 32% over the 4SS and
27% over the TSS.
-
Power-aware Partitioned Cache Architectures [p. 64]
-
S. Kim, N. Vijaykrishnan, M. Kandemir, A. Sivasubramaniam, M. J. Irwin, E. Geethanjali
(Pennsylvania State University)
This paper focuses on partitioning the cache resources architecturally
for energy and energy-delay optimizations. Specifically,
we investigate ways of splitting the cache into several
smaller units, each of which is a cache by itself (called
subcache). Subcache architectures not only reduce the per-access energy
costs but can potentially improve the locality
behavior as well. We present a unified framework for designing,
implementing and evaluating different subcache architectures.
Different techniques for data placement, subcache
prediction, and selective probing are proposed and evaluated
using a diverse set of applications. The results show
that intelligent subcache mechanisms proposed in this paper
are effective.
-
A Low-Leakage Dynamic Multi-Ported Register File in 0.13µm CMOS [p. 68]
-
Atila Alvandpour, Ram Krishnamurthy, K. Soumyanath, Shekhar Borkar (Intel Corporation)
Increasing leakage currents combined with reduced noise
margins are seriously degrading the robustness of dynamic
circuits. This paper describes a dynamic implementation of a
256×32b 4-read/write-port Register-File for ~6GHz operation at
1.2V in a 0.13µm technology. The pre-charged local bit-lines
utilize an efficient conditional keeper-technique, where a large
fraction of the keeper is turned ON only if the dynamic output
remains High in the evaluation phase. Using this technique, we
are able to improve upon all-low-Vt performance by 4%, while
maintaining Dual-Vt usage. Thus, the robustness is improved by
96% and the active leakage power is reduced by 5X.
-
Energy-Efficient Load and Store Reuse [p. 72]
-
Jun Yang, Rajiv Gupta (The University of Arizona)
A load and store reuse mechanism can be used for filtering
memory references to reduce memory activity including on-chip
cache activity. The challenging aspect of this task is
to ensure that energy savings achieved in memory are not
offset by energy used by the reuse hardware. In this paper
we present the design of a reuse mechanism which has been
carefully tuned to achieve net energy savings. In contrast
to traditional filter cache designs, which trade off energy reductions
against higher execution times, our approach reduces
both energy and execution time.
Session Chair: TBD
-
Compiler Support for Block Buffering [p. 76]
-
Mahmut Kandemir (The Pennsylvania State University), J. Ramanujam (Louisiana State University),
Ugur Sezer (University of Wisconsin-Madison)
On-chip caches consume a significant fraction of energy in current microprocessors.
Hence, hardware techniques such as block buffering have been
developed and shown to be effective in reducing on-chip cache energy consumption.
We are not aware of any software solutions to exploit block
buffering. This paper presents a compiler-based approach that modifies both
code and variable layout to effectively exploit block buffering, and is aimed
at the class of embedded codes that make heavy use of scalar variables. Unlike
previous work that uses only storage pattern optimization, our solution
integrates both code restructuring and storage pattern optimization. Experimental
results on a set of complete programs demonstrate that our solution
leads to significant energy savings.
-
Automatic Source Code Specialization for Energy Reduction [p. 80]
-
Eui-Young Chung (Stanford University), Luca Benini (University of Bologna),
Giovanni De Micheli (Stanford University)
This paper presents a framework to reduce the computational
effort of software programs, using value profiling and
partial evaluation. Our tool reduces computational effort by
specializing a program for highly expected situations and
such a reduction translates into both energy and performance
improvement. Procedure calls executed frequently
with the same parameter values are defined as highly expected
situations (common cases). The choice of the best transformation
of common cases is achieved by solving three search
problems. The first identifies effective common cases to be
specialized, the second searches for an optimal solution for
each effective common case, and the third examines the interplay
among the specialized cases. Our technique improves both
the energy consumption and the performance of the source code
by up to more than a factor of two, and on average by about 25%,
over the original program. Also, our pruning techniques reduce the
search time by 80% compared to an exhaustive approach.
-
FV Encoding for Low-Power Data I/O [p. 84]
-
Jun Yang, Rajiv Gupta (The University of Arizona)
The power consumed by I/O pins of a CPU is significant due
to high capacitances associated with the pins. While highly
effective techniques for reducing address bus switching exist
[1], similarly effective techniques for data bus have not been
developed. We have discovered a characteristic of values
transmitted over the data bus according to which a small
number of distinct values, called frequent values, account
for 58-68% of transmissions over the external data bus. To
exploit this characteristic we have developed a method for
dynamic identification of frequent values and their use in
encoding data values using the FV (frequent value) encoding
scheme. Our experiments show that FV encoding of 32 frequent
values yields an average reduction of 42.7% (with on-chip
data cache) and 67.63% (without on-chip data cache)
in data bus switching activity for SPEC95 benchmarks.
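To make the notion of bus switching activity concrete, here is a minimal Python sketch that counts bit transitions between consecutive bus words, with and without a simple frequent-value code. Several details are assumptions for illustration only and are not taken from the abstract: the frequent-value table is fixed up front (the abstract describes dynamic identification), a frequent value at table index i is sent as the one-hot word `1 << i`, and one extra control line marks whether a word is encoded.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two bus words differ."""
    return bin(a ^ b).count("1")


def switching(words) -> int:
    """Total bit transitions between consecutive words on a bus."""
    return sum(hamming(a, b) for a, b in zip(words, words[1:]))


def fv_encode(values, frequent):
    """Return (bus_word, control_bit) pairs; control_bit 1 means 'encoded'."""
    index = {v: i for i, v in enumerate(frequent)}   # hypothetical static table
    encoded = []
    for v in values:
        if v in index:
            encoded.append((1 << index[v], 1))       # assumed one-hot code
        else:
            encoded.append((v, 0))                   # sent unencoded
    return encoded


def encoded_switching(pairs) -> int:
    """Transitions on the data lines plus the assumed control line."""
    data = switching([w for w, _ in pairs])
    control = switching([c for _, c in pairs])
    return data + control


if __name__ == "__main__":
    trace = [0x00000000, 0xFFFFFFFF, 0x00000000, 0xFFFFFFFF]
    print(switching(trace))                                    # unencoded bus
    print(encoded_switching(fv_encode(trace, [0, 0xFFFFFFFF])))  # with FV code
```

In this toy trace the two alternating values toggle all 32 lines when sent raw, while their one-hot codes differ in only two lines, which is the kind of saving a frequent-value code aims for.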
-
Time-to-Failure Estimation for Batteries in Portable Electronic Systems [p. 88]
-
Daler Rakhmatov, Sarma B. K. Vrudhula (University of Arizona)
Nonlinearity of the energy source behavior in portable systems
needs to be modeled in order for the system software
to make energy-conscious decisions. We describe an analytical
battery model for predicting the battery time-to-failure
under variable discharge conditions. Our model can be used
to estimate the impact of various system load profiles on the
energy source lifetime. The quality of our model is evaluated
based on the simulation of a lithium-ion battery.
-
Architecture Strategies for Energy-Efficient Packet Forwarding
in Wireless Sensor Networks [p. 92]
-
Vlasios Tsiatsis, Scott A. Zimbeck, Mani B. Srivastava (University of California, Los Angeles)
Energy-efficient communication among wireless sensor nodes
determines the lifetime of a sensor network and exhibits patterns
highly dependent on the sensor application and networking
software. This software is responsible for processing the sensor
data and disseminating the data to other nodes or a central
repository. In this paper we propose a node architecture that takes
advantage of both the intelligence of the radio hardware and the
needs of applications to efficiently handle the packet forwarding.
It exploits principles widely used in modern firewall network
architectures and as our analysis shows achieves considerable
energy savings.
Keywords
Energy-efficient packet forwarding, sensor networks
-
Modulation Scaling for Energy Aware Communication Systems [p. 96]
-
Curt Schurgers, Olivier Aberthorne, Mani B. Srivastava (University of California, Los Angeles)
In systems that require low energy consumption, voltage scaling is
an invaluable circuit technique. It also offers energy awareness,
trading off energy and performance. In wireless handheld devices,
the communication portion of the system is a major power hog.
We introduce a new technique, called modulation scaling, which
exhibits benefits similar to those of voltage scaling. It allows us to
trade off energy against transmission delay and as such introduces
the notion of energy awareness in communications. Throughout
our discussion, we emphasize the analogy with voltage scaling. As
an example application, we present an energy aware wireless
packet scheduling system.
Keywords
energy awareness, adaptive modulation, scaling
Session Chair: Murli Tirumala (Intel)
-
Cooling and Power Considerations for Semiconductors into the Next
Century [p. 100]
-
Christian Belady (Hewlett Packard)
With the insatiable desire for higher computer or switch
performance comes the undesirable side effect of higher
power especially with the pervasiveness of CMOS
technology. As a result, cooling and power delivery have
become integral in the design of electronics. Figure 1 shows
the National/International Technology Roadmap for
Semiconductors' projection for processor chip power.
Note that between 2000 and 2005 the total
power of the chip is expected to increase 60%, which will put
additional emphasis on the power and cooling systems of our
electronics. Further inspection of this figure also shows that
the heat flux will more than double during this period. The
increases in power and heat flux are driven by two factors,
higher frequency and reduced feature sizes.
Session Chair: Frank Chang (UCLA)
Session Organizer: Satyen Mukherjee (Philips)
-
Energy Efficient Modulation and MAC for Asymmetric RF Microsensor Systems [p. 106]
-
Andrew Y. Wang, SeongHwan Cho, Charles G. Sodini, Anantha P. Chandrakasan
(Massachusetts Institute of Technology)
Wireless microsensor systems are used in a variety of civil
and military applications. Such microsensors are required to
operate for years from a small energy source. To minimize
the energy dissipation of the sensor node, RF front-end circuitry
must be designed based on system level optimization
of the entire network. This paper presents several energy
minimization techniques derived from the unique properties
of a practical short range asymmetric microsensor system.
These include energy efficient modulation schemes, appropriate
multiple access protocols, and a fast turn-on transmitter
architecture.
-
A 1 V, 1.9 GHz Mixer Using a Lateral Bipolar Transistor in CMOS [p. 112]
-
Song Ye (University of Toronto), Koji Yano (Yamanashi University),
C. Andre T. Salama (University of Toronto)
This paper describes a low power mixer implemented in a
standard 0.25 µm CMOS process. The mixer uses lateral bipolar
transistors in CMOS to form the core of the circuit. No additional
processing steps are needed to obtain the BJT when the MOSFET
is properly designed. The mixer exhibits 6.5 dB gain, operating at
1.9 GHz from a 1 V supply and a power dissipation of 1.3 mW.
Such a mixer is a likely candidate for low power portable wireless
applications.
Categories and Subject Descriptors
1.3 [Analog, MEMS and Mixed Signal Electronics]: RF
circuits, Wireless systems, MEMS circuits, AD/DA Converters,
Mixed-signal circuits, DC-DC conversion.
General Terms
Measurement, Design, Experimentation.
Keywords
RF, CMOS, mixer, lateral bipolar transistor, low power.
-
A 60dB, 246MHz CMOS Variable Gain Amplifier for Subsampling
GSM Receivers [p. 117]
-
Mohamed A. I. Mostafa (Texas A&M University), Sherif H. K. Embabi (Texas Instruments Inc.),
Mostafa A. I. Elmala (Texas A&M University)
This VGA is designed for a GSM subsampling receiver. It
operates at an IF frequency of 246MHz. The VGA provides a
60dB digitally controlled gain range in 2dB steps. The VGA is
implemented in a 0.35µm CMOS process. The current is
9mA@3V. The overall gain accuracy is less than 0.3dB. The
noise figure at maximum gain is 8.7dB. The IIP3 is -4dBm at
minimum gain.
Categories and Subject Descriptors
1.3 [Analog, MEMS and Mixed Signal Electronics]: RF
circuits, Wireless systems, and mixed-signal circuits.
General Terms
Performance, Design, Experimentation, Standardization.
Keywords
VGA, CMOS, subsampling, GSM, IF, receiver.
Session Chair: Wolfgang Nebel (Univ. Oldenburg)
Session Organizer: Radu Marculescu (Carnegie Mellon University)
-
VTCMOS Characteristics and Its Optimum Conditions Predicted
by a Compact Analytical Model [p. 123]
-
Hyunsik Im, T. Inukai, H. Gomyo, T. Hiramoto, T. Sakurai (University of Tokyo)
A very compact analytical model of variable threshold voltage
CMOS (VTCMOS) is proposed to study the active on-current,
linking it with the stand-by off-current characteristics.
Modeled results show excellent agreement with numerical simulations
and experimental data. It is
clearly demonstrated using the model that speed degradation due
to low supply voltage can be compensated by the VTCMOS
scheme with even smaller power. Influence of the short channel
effect (SCE) on the performance of VTCMOS is investigated in
terms of a new parameter, dS/dγ, both qualitatively and
quantitatively. It is found that the SCE degrades the VTCMOS
performance. Issues on the optimum conditions of VTCMOS are
discussed.
Keywords
Body Effect, Variable threshold voltage CMOS (VTCMOS),
Substrate bias, Low power, and Analytical model
-
Memory Controller Policies for DRAM Power Management [p. 129]
-
Xiaobo Fan, Carla S. Ellis, Alvin R. Lebeck (Duke University)
The increasing importance of energy efficiency has produced
a multitude of hardware devices with various power management
features. This paper investigates memory controller
policies for manipulating DRAM power states in cache-based
systems. We develop an analytic model that approximates
the idle time of DRAM chips using an exponential distribution,
and validate our model against trace-driven simulations. Our
results show that, for our benchmarks, the simple
policy of immediately transitioning a DRAM chip to a lower
power state when it becomes idle is superior to more sophisticated
policies that try to predict DRAM chip idle time.
-
Run-Time Power Estimation in High Performance Microprocessors [p. 135]
-
Russ Joseph, Margaret Martonosi (Princeton University)
Power concerns are becoming increasingly pressing in high-performance
processors. Building power-aware and even
power-adaptive computer architectures requires being able
to track power consumption and attribute energy consumption
to the portions of the chip that are responsible for it.
This paper presents the Castle project which aims to deduce
the actual runtime power dissipated by different processor
units on the CPU chip by leveraging existing hardware.
Namely, we examine the use of hardware performance
counters as proxies for power meters. We discuss which performance
counters count power-relevant events, and how to
estimate event counts for power-relevant events not well supported
by current, commonly available performance counters. We also discuss
sampling-based approaches for estimating signal transition activity
within the processor. Overall, we find that these performance counters
can be quite useful in providing good power apportionment estimates for
programs as they run.
-
Fast, Flexible, Cycle-Accurate Energy Estimation [p. 141]
-
Phillip Stanley-Marbell, Michael S. Hsiao (Rutgers University)
Designing energy efficient hardware and software systems
demands different tools at various levels in the design hierarchy.
There is, however, a dearth of tools to enable investigation and
implementation of energy efficient software and
hardware architectures. Presented is a fast, flexible, cycle-accurate
architectural simulator, Myrmigki, that models a
commercial microcontroller and microprocessor family, and
enables cycle-accurate power dissipation analyses through a
combination of instruction level power analysis and circuit
activity estimation.
Myrmigki is intended to be used to study the effect of
microarchitectural features on the energy efficiency of hardware
and software systems. It provides facilities for dynamic
voltage scaling, clock speed setting and per-cycle architecture
reconfiguration, and is easily extended to add new microarchitectural
features and model new instruction set architectures. The simulator
provides over an order of magnitude speedup over a contemporary
state-of-the-art power estimating simulator, while providing estimates
within 10% of measurements from prototype hardware that it models.
Session Chair: Borivoje Nikolic (University of California, Berkeley)
Session Organizer: Tadahiro Kuroda (Keio University)
-
Comparative Delay and Energy of Single Edge-Triggered & Dual
Edge-Triggered Pulsed Flip-Flops for High-Performance Microprocessors [p. 147]
-
James Tschanz, Siva Narendra, Zhanping Chen, Shekhar Borkar, Vivek De (Intel Corporation),
Manoj Sachdev (University of Waterloo)
Flip-flops and latches are crucial elements of a design from both a
delay and energy standpoint. We compare several styles of single
edge-triggered flip-flops, including semidynamic and static with
both implicit and explicit pulse generation. We present an
implicit-pulsed, semidynamic flip-flop (ip-DCO) which has the
fastest delay of any flip-flop considered, along with a large
amount of negative setup time. However, an explicit-pulsed static
flip-flop (ep-SFF) is the most energy-efficient and is ideal for the
majority of critical paths in the design. In order to further reduce
the power consumption, dual edge-triggered flip-flops are
evaluated. It is shown that classic dual edge-triggered designs
suffer from a large area penalty and reduced performance,
prohibiting their use in critical paths. A new explicit-pulsed dual
edge-triggered flip-flop is presented which provides the same
performance as the single edge-triggered version with
significantly less energy consumption in the flip-flop as well as in
the clock distribution network.
Keywords
Flip-flops, latches, clocking, dual edge-triggered, low power.
-
Theory and Practical Implementation of Harmonic Resonant Rail Driver [p. 153]
-
Joong-Seok Moon, Peter A. Beerel (University of Southern California),
William C. Athas (Apple Computer)
This paper presents a new algorithm for designing efficient
harmonic resonant rail drivers. The circuit solution is coupled to a
standard pulse source and uses only discrete passive components.
It can thus be externally tuned to minimize the consumed power
in the target IC. A new efficient algorithm based on current-fed
pulse-forming network theory is proposed to find the value of
each discrete component for a target frequency and a given load
capacitance. The proposed driver topology can be used to
generate any desired periodic 50% duty-cycle waveform by
superimposing multiple harmonics of the desired waveform;
however, this paper focuses on the generation of square-wave
clock signals. We have tested the driver with a capacitive load
between 38.3pF and 97.8pF. The overall dissipation for our
second-order harmonic rail driver is 19% of fCV² at 15MHz and
97.8pF load.
Keywords
Harmonic-resonant rail driver, energy-recovery circuit, pulse-forming
network, clock generation.
-
A Resonant Clock Generator for Single-Phase Adiabatic Systems [p. 159]
-
Conrad H. Ziesler, Marios C. Papaefthymiou (University of Michigan),
Suhwan Kim (IBM T.J. Watson Research Center)
Recently discovered high-speed single-phase adiabatic logic families
require efficient sinusoidal power-clock generators. In this paper
we propose a low-power resonant clock-generator built around
a zero-voltage switching push-pull power conversion topology. We
describe a novel energy-efficient control circuit for this power
converter, based on an asynchronous CMOS state machine. We also
describe an integrated sub-micron CMOS implementation of our
power converter and control circuits. Simulation results show
efficiencies in excess of 90%, even under suboptimal tuning conditions,
for frequencies over 200MHz. We have fabricated our clock
generator in a 0.5µm standard CMOS process. Using an external
surface-mount inductor as the resonant element, we have verified
the correct operation of the clock generator when driving a
single-phase adiabatic 8-bit multiplier.
Categories and Subject Descriptors
B.0 [Hardware]: General
Keywords
Adiabatic logic, Clock generator, CMOS, Low energy, Resonant,
Single phase, VLSI, Dynamic circuitry, SCAL, SCAL-D, TSEL.
-
Enhanced Multi-Threshold (MTCMOS) Circuits Using Variable Well Bias [p. 165]
-
Stephen V. Kosonocky, Mike Immediato (IBM T.J. Watson Research Center), Peter Cottrell,
Terence Hook, Randy Mann, Jeff Brown (IBM Microelectronics)
Advanced CMOS technology can enable high levels of performance
with reduced active power at the expense of increased
standby leakage. MTCMOS has previously been described as a
method of reducing leakage in standby modes, by addition of a
power supply interrupt switch. Enhancements using variable well
bias and layout techniques are described and demonstrate
increased performance and reduced leakage over conventional
MTCMOS circuits.
Keywords
MTCMOS, multi-threshold, variable well bias, leakage control,
low power digital circuit design.
Session Chair: Masahiro Asada (Univ. of Tokyo)
Session Organizer: Renu Mehra (Synopsys)
-
Encodings for High-Performance Energy-Efficient Signaling [p. 170]
-
Alessandro Bogliolo (University of Ferrara)
Energy efficiency, performance and signal integrity are conflicting
critical requirements for on-chip signaling. We propose
a code-based solution that improves bit rate while reducing
communication energy and preserving noise margins.
Our technique is based on the observation that RC lines
can be used at twice their limiting bit rate to transmit bit
streams with no isolated bits. We propose new encodings
(called minimum run-length guaranteed codes, MRLG) that
eliminate isolated bits, thus enabling double-bit-rate signaling.
We show that our encodings can be combined with
any low-power code to achieve both energy reduction and
performance improvement.
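As an illustration only, the property this abstract relies on (a bit
stream with no isolated bits) can be checked in a few lines of Python.
The function name and the treatment of the first and last bits of the
stream are assumptions made for this sketch; the MRLG code construction
itself is not reproduced here.

```python
def has_isolated_bit(bits):
    """Return True if any run of equal bits in `bits` has length 1.

    Assumption for this sketch: the runs at the very start and end of
    the stream must also be at least two bits long.
    """
    run = 0       # length of the current run of equal bits
    prev = None   # value of the bit that started the current run
    for b in bits:
        if b == prev:
            run += 1
        else:
            # A run just ended; if it was a single bit, it was isolated.
            if prev is not None and run == 1:
                return True
            prev, run = b, 1
    # Check the final run (run is 0 for an empty stream).
    return run == 1
```

For example, the stream 0 0 1 0 0 contains an isolated 1, while
0 0 1 1 1 0 0 does not.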
-
Low-Energy Encoding for Deep-Submicron Address Buses [p. 176]
-
Luca Macchiarulo, Enrico Macii, Massimo Poncino (Politecnico di Torino)
In this paper, we introduce a new encoding scheme that explicitly
targets the minimization of the bus energy due to the
crosstalk capacitances between adjacent bus lines. The key transformation
operated by the code consists of a permutation of
the bus lines, implemented directly during physical design; as
a desirable consequence, no additional encoding/decoding logic
is required at the bus boundaries, thus implying that no latency
penalty is introduced on the processor-memory path. An additional
feature of the permutation-based code is that the encoding
function can be determined without any knowledge of the binary
stream being transmitted. Therefore, the code can be effectively
exploited in general-purpose computing systems. The proposed
code works best on address buses; savings obtained for different
address traces generated by two different processors are in the
order of 26% with respect to the unencoded streams.
-
Irredundant Address Bus Encoding for Low Power [p. 182]
-
Yazdan Aghaghiri, Massoud Pedram (University of Southern California),
Farzan Fallah (Fujitsu Laboratories of America)
This paper proposes efficient encoding techniques for
decreasing power dissipation on global buses. The best target for
these techniques is a wide and highly capacitive memory bus.
Building on T0 and Offset-Xor encoding techniques, we present
three irredundant bus-encoding techniques. Our methods
decrease switching activity up to 83% without the need for
redundant bus lines. The power dissipation of encoder and
decoder circuitry has also been calculated and shown to be small
in comparison with the power savings on the memory address
bus itself.
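Switching activity on a bus is typically measured as the number of
bit toggles between consecutive words. The following minimal sketch
shows that metric only; the function name is hypothetical, and the
encodings proposed in the paper are not reproduced.

```python
def bus_transitions(words):
    """Count bit toggles between consecutive words driven onto a bus."""
    # XOR of two consecutive words has a 1 wherever a line toggles.
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# Sequential addresses sent unencoded in plain binary.
sequential = [0, 1, 2, 3, 4]
toggles = bus_transitions(sequential)  # 1 + 2 + 1 + 3 toggles
```

An encoder would be evaluated by comparing this count for the encoded
stream against the count for the unencoded one.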
-
Low Power Address Encoding using Self-Organizing Lists [p. 188]
-
Mahesh Mamidipaka, Dan Hirschberg, Nikil Dutt (University of California, Irvine)
Off-chip bus transitions are a major source of power dissipation
for embedded systems. In this paper, new adaptive encoding
schemes are proposed that significantly reduce
transition activity on data and multiplexed address buses,
add no redundancy in space or time, and have
minimal delay overhead. These adaptive techniques are
based on self-organizing lists to achieve reduction in transition
activity by exploiting the spatial and temporal locality
of the addresses. Unlike previous approaches that focus on
instruction address buses, experiments demonstrate significant
reduction in transition activity of up to 54% in data
address buses and up to 59% in multiplexed address buses.
The average reductions are twice those obtained using current
schemes on a data address bus and more than twice
those obtained on a multiplexed address bus.
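A classic self-organizing list heuristic is move-to-front, in which a
recently used item migrates to the head of the list and is then sent
as a small index. The sketch below shows generic move-to-front coding
over a byte alphabet; it is an assumption chosen for illustration and
is not claimed to be the scheme used in the paper. The function names
and the alphabet size are hypothetical.

```python
def mtf_encode(symbols, alphabet_size=256):
    """Replace each symbol by its current position in a move-to-front list."""
    table = list(range(alphabet_size))
    out = []
    for s in symbols:
        i = table.index(s)             # position of the symbol in the list
        out.append(i)
        table.insert(0, table.pop(i))  # move the symbol to the front
    return out


def mtf_decode(indices, alphabet_size=256):
    """Invert mtf_encode by replaying the same list updates."""
    table = list(range(alphabet_size))
    out = []
    for i in indices:
        s = table.pop(i)
        out.append(s)
        table.insert(0, s)
    return out
```

With locality in the input, most emitted indices are small: the
sequence 7 7 7 9 7 9 encodes to 7 0 0 9 1 1, so repeated values toggle
few bus lines.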
Session Chair: Ingrid Verbauwhede (UCLA)
-
Wireless Sensor Networks: Application Driver for Low Power
Distributed Systems [p. 194]
-
Deborah Estrin (University of California, Los Angeles)
Wireless sensor networks allow deployment of sensing elements
close to the phenomena of interest. Sensing close to the signal
generation point should lead to improved SNR in general, and
enable detection in otherwise obstructed environments. This
fundamental benefit of local sensing, combined with the
decreasing cost and increasing availability of low cost
microsensors/actuators and processors, suggests that effective
systems will exploit densely distributed elements. However, dense
sensing capability is only scalable if the elements are networked
to support collaborative processing near the sensory inputs. [1]
Therefore, in many contexts low-power wireless communication
is a critical enabler of these systems because it overcomes the
logistical infeasibility of deploying wires in remote, dynamic, and
mobile-node contexts.
Session Chair: Fari Assaderaghi (SiliconWave)
Session Organizer: Rajiv Joshi (IBM)
-
Scaling of Stack Effect and its Application for Leakage Reduction [p. 195]
-
Siva Narendra (Massachusetts Institute of Technology & Intel), Shekhar Borkar, Vivek De,
Dimitri Antoniadis (Intel), Anantha Chandrakasan (Massachusetts Institute of Technology)
Technology scaling demands a decrease in both Vdd and Vt
to sustain historical delay reduction, while restraining active power
dissipation. Scaling of Vt however leads to substantial increase in
the sub-threshold leakage power and is expected to become a
considerable constituent of the total dissipated power. It has been
observed that the stacking of two off devices has smaller leakage
current than one off device. In this paper we present a model that
predicts the scaling nature of this leakage reduction effect. Device
measurements are presented to prove the model's accuracy. Use
of stack effect for leakage reduction and other implications of this
effect are discussed.
-
Variable Threshold Voltage CMOS (VTCMOS) in Series Connected Circuits [p. 201]
-
Takashi Inukai, Toshiro Hiramoto, Takayasu Sakurai (University of Tokyo)
Characteristics of variable threshold voltage CMOS (VTCMOS)
in the series connected circuits are investigated by means of device
simulation. It is newly found that the performance degradation due
to the body effect in series connected circuit is suppressed by
utilizing VTCMOS. Lowering the threshold voltage (Vth)
enhances the drive current and alleviates the degradation due to
the series connected configuration. Therefore, larger body effect
factor (γ) results in lower Vth and higher on-current even in the
series connected circuits. These characteristics are attributed to
the velocity saturation phenomenon which reduces the drain
saturation voltage (Vdsat).
Keywords
variable threshold voltage CMOS, series connected circuits,
degradation factor, body effect factor, substrate bias, velocity
saturation
-
Effectiveness of Reverse Body Bias for Leakage Control
in Scaled Dual Vt CMOS ICs [p. 207]
-
A. Keshavarzi, S. Ma, S. Narendra, B. Bloechel, K. Mistry, T. Ghani, S. Borkar, V. De (Intel Corporation)
We examine the effectiveness of opportunistic use of reverse body
bias (RBB) to reduce leakage power during active operation, burn-in,
and standby in 0.18µm single-Vt and 0.13µm dual-Vt logic process
technologies. We investigate its dependencies on channel length,
target Vt, temperature and technology generation. We show that
RBB becomes less effective for leakage reduction at shorter channel
lengths and lower Vt at both high and room temperatures, especially
when target intrinsic leakage currents are high. RBB effectiveness
also diminishes with technology scaling primarily because of
worsening short-channel effects (SCE), particularly when target Vt
values are low. We present a model that relates different transistor
leakage components to full-chip leakage current, and validate the
model through testchip measurements across a range of RBB values.
-
Double-Gate Fully-Depleted SOI Transistors for Low-Power High-Performance
Nano-Scale Circuit Design [p. 213]
-
Rongtian Zhang, Kaushik Roy, David B. Janes (Purdue University)
Double-gate fully-depleted (DGFD) SOI circuits are regarded as
the next generation VLSI circuits. This paper investigates the impact
of scaling on the demand and challenges of DGFD SOI circuit
design for low power and high performance. We study how the
added back-gate capacitance affects the circuit power and performance;
how to trade off the enhanced short-channel effect immunity
with the added back-channel leakage; and how the coupling
between the front- and back-gates affects circuit reliability. Our
analyses over different technology generations using MEDICI device
simulator show that DGFD SOI circuits have significant advantages
in driving high output load. DGFD SOI circuits also show
excellent ability in controlling leakage current. However, for low
output load, no gain is obtained for DGFD SOI circuits. Also, it is
necessary to optimize the back-gate oxide thickness for best leakage
control. Moreover, threshold variation may cause reliability
problem for thin back-gate oxide DGFD SOI circuits operated at
low power supply voltage.
Session Chair: Sumit Roy (Cadence)
Session Organizer: M. Poncino (Politecnico di Torino)
-
A Self-Optimizing Embedded Microprocessor using a Loop Table
for Low Power [p. 219]
-
Frank Vahid, Ann Gordon-Ross (University of California, Riverside)
We describe an approach for a microprocessor to tune itself to its
fixed application to reduce power in an embedded system. We
define a basic architecture and methodology supporting a
microprocessor self-optimizing mode. We also introduce a loop
table as a tunable component, although self-optimization can be
done for other tunable components too. We highlight
experimental results illustrating good power reductions with no
performance penalty.
Keywords
System-on-a-chip, self-optimizing architecture, embedded
systems, parameterized architectures, cores, low-power, tuning,
platforms.
-
Low Power Pipelining of Linear Systems: A Common Operand
Centric Approach [p. 225]
-
Daehong Kim, Kiyoung Choi (Seoul National University),
Dongwan Shin (University of California, Irvine)
In this paper, we propose a systematic pipelining method for a linear
system to minimize power and maximize throughput, given a
constraint on the number of pipeline stages and a set of resource
constraints. The method first retimes operations such that as many
operations as possible take common operands as their inputs, and
then performs the operand sharing based on the list scheduling.
Experimental results show that the proposed approach reduces the
power consumption of the functional units by up to more than 20%,
compared to the state-of-the-art pipelining and operand sharing
techniques.
Keywords
Low power, pipelining, operand sharing, common operand
-
A System-level Energy Minimization Approach using Datapath
Width Optimization [p. 231]
-
Yun Cao, Hiroto Yasuura (Kyushu University)
This paper presents a novel system-level approach that minimizes
the energy consumption of embedded core-based systems
through datapath width optimization. It is based on
the idea of minimizing the energy consumed by redundant bits,
which are unused during program execution, by
optimizing the datapath width of processors. To minimize
the redundant bits of variables in a given application
program, the effective size of each variable is determined
by variable size analysis, and the Valen-C language is used to
preserve the precision of computation. Analysis of the
variables shows that, on average, 39% of the bits
in the C source program of an MPEG-2 video decoder are redundant. In our
experiments on several embedded applications, energy savings
without performance penalty range from
about 10.8% to 48.3%.
Keywords
System-level energy minimization, variable size analysis, datapath
optimization
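To make the notion of redundant bits concrete, the sketch below
computes the smallest bit width that can hold a variable's value range
and the fraction of a declared width that is then unused. It is a
simplified illustration under stated assumptions: the function names
are hypothetical, the value range is assumed to be already known, and
the paper's actual variable size analysis is not reproduced.

```python
def effective_bits(lo, hi):
    """Smallest width (in bits) that can represent every value in [lo, hi].

    Non-negative ranges are treated as unsigned; ranges that include
    negative values are treated as two's complement.
    """
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())
    n = 1
    # Grow the width until [lo, hi] fits in [-2^(n-1), 2^(n-1) - 1].
    while lo < -(1 << (n - 1)) or hi > (1 << (n - 1)) - 1:
        n += 1
    return n


def redundant_fraction(lo, hi, declared_bits):
    """Fraction of the declared width that the value range never uses."""
    return 1 - effective_bits(lo, hi) / declared_bits
```

For instance, a variable that only takes values 0 to 255 but is
declared as a 32-bit integer has an effective size of 8 bits, so 75% of
its bits are redundant in this sense.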
-
Energy-Efficient Instruction Dispatch Buffer Design for Superscalar Processors
[p. 237]
-
Gurhan Kucuk, Kanad Ghose, Dmitry V. Ponomarev (State University of New York),
Peter M. Kogge (University of Notre Dame)
The instruction dispatch buffer (DB, also known as an issue queue)
used in modern superscalar processors is a considerable source of
energy dissipation. We consider design alternatives that result in
significant reductions in the power dissipation of the DB (by as much
as 60%) through the use of: (a) fast comparators that dissipate energy
mainly on a tag match, (b) zero byte encoding of operands to imply
the presence of bytes with all zeros and, (c) bitline segmentation. Our
results are validated by the execution of SPEC 95 benchmarks on a true
hardware-level, cycle-by-cycle simulator for a superscalar processor
and SPICE measurements for actual layouts of the DB and its variants
in a 0.5 micron CMOS process.
Keywords: Low-power superscalar datapath, low power
comparator, low power instruction scheduling, bitline segmentation
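The idea behind zero byte encoding, item (b) above, can be sketched in
software as one flag bit per byte that marks an all-zero byte, so that
only the non-zero bytes need to be stored or driven. The layout below
(function names, a 4-byte operand, least significant byte first) is an
assumption for illustration and does not describe the circuit used in
the paper.

```python
def zero_byte_encode(value, width_bytes=4):
    """Split an operand into a zero-byte mask and its non-zero bytes."""
    mask = 0
    payload = []
    for i in range(width_bytes):
        byte = (value >> (8 * i)) & 0xFF
        if byte == 0:
            mask |= 1 << i        # flag bit i: byte i is all zeros
        else:
            payload.append(byte)  # only non-zero bytes are kept
    return mask, payload


def zero_byte_decode(mask, payload, width_bytes=4):
    """Rebuild the operand from the mask and the non-zero bytes."""
    value = 0
    remaining = iter(payload)
    for i in range(width_bytes):
        if not (mask >> i) & 1:
            value |= next(remaining) << (8 * i)
    return value
```

A small operand such as 0x0000002A is represented by the mask 0b1110
and a single payload byte, so three of its four bytes need not be read
or written.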
Session Chair: Sudhir Gowda (IBM)
-
High Density Capacitance Structures in Submicron CMOS for Low Power
RF Applications [p. 243]
-
Tirdad Sowlati, Vickram Vathulya, Domine Leenaerts (Philips Research)
This paper presents four novel interconnect based capacitors with
2 to 3 times the capacitance density of a conventional metal
sandwich capacitor and with self-resonant frequencies above 20
GHz, suitable for low power RF applications. Unlike the
conventional capacitor, the capacitance density of these structures
increases with the scaling of the technology. The structures have
been fabricated in both 0.25 µm and 0.18 µm CMOS
technologies and measured, and an equivalent circuit is presented.
Categories and Subject Descriptors
1.3. Analog, MEMS and Mixed Signal Electronics
General Terms
Measurement, Documentation, Experimentation.
Keywords
CMOS, interconnect, RF passives, Bluetooth, HiPerLAN.
-
A CMOS VCO Architecture Suitable for Sub-1 Volt High-Frequency
(8.7-10 GHz) RF Applications [p. 247]
-
Ahmed H. Mostafa, Mourad N. El-Gamal (McGill University)
This paper proposes an LC-based oscillator structure which
enables operation from a supply voltage as low as 0.85V, while
being suitable for high-frequency RF applications. Two VCO
prototypes were fabricated in a standard 0.18 µm CMOS process.
The 8.7 GHz VCO operates from a supply voltage of 0.85 V,
consumes 6 mW, and exhibits -100 dBc/Hz phase noise at
600 kHz offset. The 10 GHz prototype operates from a supply
voltage of 1 V, consumes 9 mW, and has -98 dBc/Hz phase noise
at 600 kHz offset. A tuning range of 400-450 MHz is achieved
without using varactors.
-
Low-Power Direct Sequence Spread-Spectrum Modem Architecture [p. 251]
-
Charles Chien, Igor Elgorria (Rockwell Research), Charles McConaghy (Livermore National Lab)
Emerging CMOS and MEMS technologies enable the
implementation of a large number of wireless distributed
microsensors that can be easily and rapidly deployed to form highly
redundant, self-configuring, and ad hoc sensor networks. To
facilitate ease of deployment, these sensors should operate on
battery for extended periods of time. A particular challenge in
maintaining extended battery lifetime lies in achieving
communications with low power. This paper presents a direct-sequence
spread-spectrum modem architecture that provides robust
communications for wireless sensor networks while dissipating very
low power. The modem architecture has been verified in an FPGA
implementation that dissipates only 33 mW for both transmission
and reception. The implementation can be easily mapped to an
ASIC technology with an estimated power performance of less than
1 mW.
Keywords
Low power, modem, spread spectrum, direct sequence, sensor.
-
Effects of Elevated Temperature on Tunable Near-Zero Threshold CMOS [p. 255]
-
Vjekoslav Svilan, G. Leonard Tyler (Stanford University), James B. Burr (Sun Microsystems)
This paper explores functionality, performance, and energy
efficiency of an 80,000 transistor, 0.35 µm, back-bias tunable,
near-zero Vth, 32 x 32-bit multiplier operating at 100 deg C.
Compared to operation at 28 deg C, performance at Vdd=2.0 V
degrades 14 percent from 188MHz to 162MHz. At lower
supply voltages, back bias is adjusted to minimize power
dissipation as a function of operating frequency similarly to
what we reported last year at 28 deg C. Comparing the operating
points, the same performance at 100 deg C requires about
1.5 times the power measured at 28 deg C. It also requires about
1.2 V additional back bias and about a 20 percent increase
in Vdd. The fraction of total power dissipated as leakage
increases by about 1.5 times.
-
A Sub-1V Dual-Threshold Domino Circuit Using Product-of-Sum Logic [p. 259]
-
Koji Fujii, Takakuni Douseki, Yuichi Kado (NTT Telecommunications Energy Laboratories)
A sub-1 V dual-threshold Domino circuit is proposed to
accelerate the operation of CMOS digital circuits at below
1 V. The circuit combines a low and high threshold-voltage
(Vt) MOSFET with standby control to make it possible
to achieve high-speed evaluation and low standby leakage
current. A low-Vt foot nMOSFET is used to shorten
precharge time and increase throughput. A product-of-sum
logic form is used for implementation of a pull-down logic
to increase the noise margin. An experimental 64-bit
carry-look-ahead (CLA) adder demonstrated 0.6-V operation
with a standby power of 0.4 µW and a delay time of 4.8 ns.
-
Mixed Multi-Threshold Differential Cascode Voltage Switch (MT-DCVS)
Circuit Styles and Strategies for Low Power VLSI Design [p. 263]
-
W. Chen, W. Hwang, P. Kudva, G. D. Gristede, S. Kosonocky,
R. V. Joshi (IBM T. J. Watson Research Center)
This paper presents mixed multi-threshold differential cascode
voltage switch (MT-DCVS) circuits for low-power, high-performance,
deep-submicron VLSI design. These logic
circuits incorporate two different sets of CMOS devices:
low-Vt and regular high-Vt CMOS devices. By appropriately
selecting the low-Vt and high-Vt devices and configurations
in a circuit, we can improve circuit performance while keeping
the leakage current and power low. The key approaches
are using low-Vt devices to gain performance, using
high-Vt devices to cut off the leakage path, and using
reverse-biased low-Vt devices in their standby state. The
methodology and algorithm are developed and simulated.
The applications of such multi-Vt circuit techniques to the
static, domino NORA DCVS and delayed reset circuits are
described. The use of footer / header devices, gated-Vdd
and a mixture of low-Vt and high-Vt devices to reduce power
dissipation and subthreshold leakage current during standby
and active modes, and the global design issues are also discussed.
-
Selectively Clocked Skewed Logic (SCSL): A Robust Low-Power Logic
Style for High-Performance Applications [p. 267]
-
Naran Sirisantana, Aiqun Cao, Shawn Davidson, Cheng-Kok Koh, Kaushik Roy (Purdue University)
In very high performance designs, dynamic circuits, such as
Domino Logic, are used because of their high speed. Skewed
logic circuits can be used to achieve designs having performance
comparable to that of Domino but with better scalability.
Moreover, a selective clocking scheme may be applied to enhance
the power savings for skewed logic circuits. This paper proposes
Selectively Clocked Skewed Logic (SCSL), a new circuit style
based on skewed logic aiming for low clock power consumption.
The results on ISCAS benchmark circuits implemented with this
circuit design style show that the total power consumption can be
reduced by 52.05% when compared to that of Domino circuits
with comparable performance.
Categories and Subject Descriptors
1.3 [Logic and Microarchitecture Design]: Logic and RTL
design.
Session Chair: Giovanni DeMicheli (Stanford)
-
A Profile-Based Energy-Efficient Intra-Task Voltage Scheduling Algorithm
for Hard Real-Time Applications [p. 271]
-
Dongkun Shin, Jihong Kim (Seoul National University)
Intra-task voltage scheduling (IntraVS), which adjusts the supply
voltage within an individual task boundary, is an effective technique
for developing low-power applications. In this paper, we
propose a novel intra-task voltage scheduling algorithm for hard
real-time applications based on average-case execution information.
Unlike the original IntraVS algorithm where voltage scaling
decisions are based on the worst-case execution cycles, the proposed
algorithm improves the energy efficiency by controlling the
execution speed based on average-case execution cycles while still
meeting the real-time constraints. The experimental results using
an MPEG-4 decoder program show that the proposed algorithm
reduces the energy consumption by up to 34% over the original
IntraVS algorithm.
-
Compiler-Directed Dynamic Voltage/Frequency Scheduling for Energy
Reduction in Microprocessors [p. 275]
-
Chung-Hsing Hsu, Ulrich Kremer, Michael Hsiao (Rutgers University)
Dynamic voltage and frequency scaling of the CPU has been
identified as one of the most effective ways to reduce energy
consumption of a program. This paper discusses a compilation
strategy that identifies scaling opportunities without
significant overall performance penalty. Simulation results
show CPU energy savings of 3.97%-23.75% for the
SPECfp95 benchmark suite with a performance penalty of
at most 2.53%.
-
Variable Voltage Task Scheduling Algorithms for Minimizing Energy [p. 279]
-
Ali Manzak, Chaitali Chakrabarti (Arizona State University)
In this paper we propose variable voltage task scheduling
algorithms (periodic as well as aperiodic) that minimize energy.
We first apply the existing task scheduling algorithms
to obtain a feasible schedule and then distribute the available
slack using an iterative algorithm that satisfies the theoretically
obtained relation for minimum energy. We show
experimentally that the voltage assignment obtained by our
algorithm is very close (0.1% error) to that of the optimal
assignment.
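Why distributing slack saves energy can be seen from a deliberately
simplified model, sketched below. The assumptions are not taken from
the paper: supply voltage is taken to scale in proportion to clock
frequency, and energy is taken to be proportional to the cycle count
times the square of the voltage. The function name and the units are
hypothetical.

```python
def task_energy(cycles, time_allotted):
    """Relative energy of a task under the simplified model.

    Assumptions: the task runs at the constant speed that exactly fills
    its time slot, voltage is proportional to that speed, and energy is
    cycles * voltage**2 (all quantities in arbitrary units).
    """
    speed = cycles / time_allotted  # required clock frequency
    voltage = speed                 # assumed: V proportional to f
    return cycles * voltage ** 2


# Giving the same task twice the time halves the speed and voltage.
tight = task_energy(100, 1.0)
relaxed = task_energy(100, 2.0)
ratio = relaxed / tight
```

Under these assumptions, doubling the time allotted to a task cuts its
energy to one quarter, which is why slack is worth distributing
carefully among tasks rather than leaving the processor idle.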
-
Design Methodology and Optimization Strategy for Dual-VTH Scheme Using
Commercially Available Tools [p. 283]
-
Masayuki Hirabayashi, Koichi Nose, Takayasu Sakurai (University of Tokyo)
Design methodology for dual-VTH scheme using commercially
available tools is presented and optimization strategy for the
dual-VTH scheme is discussed. In order to suppress the power
consumption, it is shown that using library cells that have various
combinations of VTH's is not needed. The cell library, which
contains logic gates with all high VTH transistors and all
low VTH transistors, is sufficient to reduce leakage power.
0.1V is shown to be the optimum value for VTH difference between
VTH,HIGH and VTH,LOW in terms of power
reduction.
-
Synthesis of Low-Leakage PD-SOI Circuits with Body-Biasing [p. 287]
-
Mario R. Casu, Gianluca Piccinini, Guido Masera, Maurizio Zamboni (Politecnico di Torino)
In this work we propose a methodology for the reduction
of leakage power dissipation through the use of smart body
contacts in a partially depleted Silicon-on-Insulator (PD-SOI)
technology. Reverse body biasing is used to increase
threshold voltage in standby while in active mode PD-SOI
gates switch with nominal Vth. As opposed to standard
dual-Vth techniques used in CMOS bulk circuits, PD-SOI
enables the application of body-bias to all gates, including
those in critical paths, without delay penalties. Results are
reported for the ISCAS85 combinational benchmarks.
-
Low-Power Technology Mapping for Mixed-Swing Logic [p. 291]
-
Nicola Dragone (Carnegie Mellon University & PDF Solutions), Rob A. Rutenbar, L. Richard Carley
(Carnegie Mellon University), Roberto Zafalon (Carnegie Mellon University & STMicroelectronics)
Mixed-swing logic employs multiple power supply rails and
device threshold voltages and allows us to create richer cell libraries with
a wider range of power/speed tradeoffs. However, mapping onto such a
library with a conventional technology mapper will not exploit the full
potential of a mixed-swing methodology. To remedy this, we have
developed a new technology mapping tool that specifically targets mixed-swing
logic. Our approach combines (1) efficient clustering and cluster-level
delay budgeting for the uncommitted logic, with (2) an exhaustive
search for the optimal cover that is rendered practical by the clustering
process. Power savings up to 3X have been demonstrated with our mixed-swing
solutions versus single power supply implementations.
-
Frequency-Domain Supply Current Macro-Model [p. 295]
-
Srinivas Bodapati (University of Illinois at Urbana-Champaign), Farid N. Najm (University of Toronto)
In order to perform block level analysis of the on-chip power
distribution network, a high-level model is required that captures
the dependence of the current waveform drawn by a
logic block, per cycle, on its input vector pair. We present a
frequency domain macro-modeling technique for capturing
this dependence. The macro-model is based on estimating
the Discrete Cosine Transform (DCT) of the current waveform
and then taking the inverse transform to estimate the
time domain current waveform.
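The transform pair at the heart of this approach can be sketched as
follows. The code assumes the DCT-II and its matching inverse, which
is one common choice; the abstract does not say which DCT variant or
scaling is used, and the step that estimates the coefficients from the
input vector pair is not reproduced. The function names are
hypothetical.

```python
import math


def dct(x):
    """Unnormalized DCT-II of a sampled waveform."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct(): recover the time-domain samples."""
    n = len(coeffs)
    return [
        coeffs[0] / n
        + (2.0 / n)
        * sum(
            coeffs[k] * math.cos(math.pi / n * (t + 0.5) * k)
            for k in range(1, n)
        )
        for t in range(n)
    ]


# A made-up current pulse sampled at 8 points within one clock cycle.
waveform = [0.0, 1.0, 4.0, 2.0, 0.5, 0.0, 0.0, 0.0]
recovered = idct(dct(waveform))
```

A macro-model of this kind would predict the coefficients and then
apply the inverse transform, as idct() does here, to obtain the
estimated time-domain waveform.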
Session Chair: Sudhir Gowda (IBM)
Session Organizer: Ken Yang (University of California, Los Angeles)
-
A Low-Power, 5-70MHz, 7th-Order Filter with Programmable Boost,
Group Delay, and Gain Using Instantaneous Companding [p. 299]
-
Rola A. Baki, Mourad N. El-Gamal (McGill University)
A seventh-order 0.05° equiripple linear-phase
continuous-time filter employing, for the first time, instantaneous
companding, was designed and integrated in a mature bipolar process.
The amount of boost (up to 13dB) and group-delay adjustment (30%) are
digitally programmable. The DC gain is controllable up to 10dB, and
the -3dB frequency (fc) is tunable from 5 to 70MHz. The
output swing for 1% THD is higher than 100mVpp, with a 1.5V
supply. The filter consumes very low power (5-13mW for fc=
70MHz) compared to conventional implementations (e.g. 120mW for
fc= 100MHz [1]).
-
Optimizing Bias-circuit Design of Cascode Operational Amplifier
for Wide Dynamic Range Operations [p. 305]
-
Takeshi Fukumoto, Hiroyuki Okada, Kazuyuki Nakamura (NEC Corporation)
Proposed here is a bias circuit for use in a cascode operational
amplifier to provide a wide output dynamic range. The bias circuit
has been designed so that the drain-source voltage of each MOS
transistor used in the gain stage is minimized to Vdsat
automatically, making it possible to widen the output dynamic
range.
Keywords
Amplifier, CMOS, Analog, Low voltage, Dynamic range,
Cascode, Bias-circuit.
-
Leakage Current Cancellation Technique for Low Power Switched-Capacitor
Circuits [p. 310]
-
Louis S. Y. Wong, Shohan Hossain, Andre Walker (St. Jude Medical)
In this paper, we describe a circuit technique to implement low
power switched-capacitor circuits for low frequency operation.
Low power consumption is crucial for medical implant devices.
Reducing supply voltage is well known to minimize power
dissipation. To facilitate low voltage operations, transistor
threshold voltages (Vth) are becoming lower and lower. Low Vth transistors have high
leakage currents which impact the performance of switched-capacitor
circuits, sample-and-hold amplifiers and many more. A
new circuit technique is presented here to greatly reduce the
effective leakage current when the CMOS switch is turned off. It
employs an active feedback loop to automatically cancel both
junction and sub-threshold channel leakage. By reducing the
effective leakage current, the capacitors used in the circuit can be
significantly reduced, hence lowering the overall power
consumption. This is a general technique and can be used in
various circuit applications.
Keywords
Low power, analog, leakage current, switched-capacitor circuit,
sample and hold, amplifier.
-
A 3-Pin 1.5 V 550 µW 176 x 144 Self-Clocked CMOS Active
Pixel Image Sensor [p. 316]
-
Kwang-Bo Cho, Alexander Krymski, Eric R. Fossum (Photobit Technology Corporation)
This paper addresses the development of a micropower 176 x 144
self-clocked CMOS active pixel image sensor that dissipates one-to-two
orders of magnitude less power than current state of the art
CMOS image sensors. The chip operates from a 1.5 V voltage
source and the power consumption measured for the chip running
from an internal 25.2 MHz clock yielding 30 frames per second is
about 550 µW. This amount enables the sensor to be run from a
watch battery. It is believed that this chip is the world's lowest
power image sensor and the first image sensor designed for a
watch battery operation. The camera-on-a-chip operates as a self-clocked
3-pin sensor (GND, VDD (1.2 - 1.7 V), and DATAOUT).
The die occupies 4 mm^2 of silicon.
Keywords
Active Pixel Sensor, Image Sensor, CMOS, Low-Power, Low-Voltage,
Self-Clocked.
Session Chair: T.N. Vijaykumar (Purdue)
Session Organizer: Babak Falsafi (Carnegie Mellon University)
-
Cached-Code Compression for Energy Minimization in Embedded Processors [p. 322]
-
Luca Benini (Universita di Bologna), Alberto Macii (Politecnico di Torino),
Alberto Nannarelli (Universita di Roma)
This paper contributes a novel approach for reducing static
code size and instruction fetch energy for cache-based core
processors running embedded applications. Our implementation
of the decompression unit guarantees fast and low-energy,
on-the-fly instruction decompression at each cache lookup. The
decompressor is placed outside the core boundaries; therefore,
processor architecture does not need any
modification, making the proposed compression approach
suitable to IP-based designs. Viability of our solution is
assessed through extensive benchmarking performed on a
number of typical embedded programs.
-
Energy Efficient Turbo Decoding for 3G Mobile [p. 328]
-
David Garrett, Bing Xu, Chris Nicol (Lucent Technologies)
The requirement of turbo decoding in 3G wireless standards has
forced handset designers to consider power consumption issues in
their implementations. The phenomenal performance of turbo
codes comes at the expense of computation. Primarily this paper
looks at methods of substantially reducing the power consumption
for the decoding operation, making it feasible to integrate turbo
decoders into a low power handset. The techniques presented
include early termination of the turbo process, encoding of
extrinsic information to reduce the memory size, and disabling
portions of the MAP algorithm when the results will not affect the
decoded output. The net result of these techniques is almost a 70%
reduction in power over a fixed 6 iteration, 8-state baseline turbo
decoder at 2 dB of signal to noise ratio (SNR).
Keywords
Turbo coding, low power, early termination, extrinsics.
-
Low-Power AEC-Based MIMO Signal Processing for Gigabit
Ethernet 1000Base-T Transceivers [p. 334]
-
Lei Wang, Naresh R. Shanbhag (University of Illinois at Urbana-Champaign)
Presented in this paper is a low-power technique, denoted
as MIMO-AEC, to reduce energy dissipation in multi-input-multi-output
(MIMO) signal processing systems. The proposed
technique extends a previously proposed adaptive error
cancellation (AEC) technique to MIMO systems by employing
an algorithm transformation denoted as MIMO-DECOR.
The purpose of MIMO-DECOR is to reduce complexity by
exploiting correlations inherent in MIMO systems, thereby
improving the effectiveness of AEC. We employ the MIMO-AEC
in the design of a low-power Gigabit Ethernet 1000Base-T
device. Simulation results demonstrate 44.3% - 25.2%
overhead reduction due to MIMO-DECOR and 69.1% -
64.2% energy savings over conventional implementation with no
loss in algorithmic performance.
-
Power Reduction through Work Reuse [p. 340]
-
Emil Talpes, Diana Marculescu (Carnegie Mellon University)
Power consumption has become one of the big challenges in
designing high performance processors. The rapid increase in
complexity and speed that comes with each new CPU generation
causes greater problems with power consumption and heat
dissipation. Traditionally, these concerns are addressed through
semiconductor technology improvements such as voltage reduction
and technology scaling. This work proposes an alternative solution
to this problem, by dealing with the power consumption in the very
early stage of the microarchitecture design. More precisely, we show
that by modifying the well-established out-of-order, superscalar
processor architecture, significant gains can be achieved in terms of
power requirements without performance penalty. Our proposed
approach relies on reusing as much as possible from the work done
by the front-end of a typical pipelined, superscalar out-of-order processor via
the use of a cache nested deeply into the processor structure.
Experimental results show up to 52% (20% on average) savings in
average energy per committed instruction for two different pipeline
structures.
Session Chair: David Garrett (Lucent Technologies)
Session Organizer: Donald Steiss (Mindspring)
-
Clocking Strategies and Scannable Latches for Low Power Applications [p. 346]
-
V. Zyuban, D. Meltzer (IBM T. J. Watson Research Center)
This paper covers a range of issues in the design of clocking
schemes for low-power applications. First we revisit, extend
and improve the power-performance optimization methodology for
latches, attempting to make it more formal and comprehensive.
The data switching factor and glitching activity are taken into
consideration using a formal analytical approach. The notion of an
energy-efficient family of configurations is then introduced to make
the comparison of different latch styles in the power-performance
space fairer, and the power of the clock distribution is also taken into
account. Practical issues of building a low-overhead scan mechanism
are considered, and the power overhead of the scannable design
is analyzed. A low-power LSSD extension to single-phase
latches is proposed, and results of a comparative study of
LSSD-scannable latches are shown, supported by experimental
data measured on a 0.18-um test chip.
-
Ultra-Low Power DLMS Adaptive Filter for Hearing Aid Applications [p. 352]
-
Hyung-il Kim, Kaushik Roy (Purdue University)
We present an ultra-low power DLMS (delayed least mean square)
adaptive filter working in the sub-threshold region for hearing aid
applications. Sub-threshold operation was accomplished by using
a parallel architecture with pseudo NMOS logic style. The parallel
architecture enabled us to run the system at a lower clock rate with
a reduced supply voltage, while maintaining the same throughput.
Pseudo NMOS logic operating in the sub-threshold region (Sub-Pseudo
NMOS) provided a better power-delay product than sub-threshold
CMOS (Sub-CMOS) logic. Simulation results show that
the system can process voice signals at a throughput of 22 kHz
with a supply voltage of 400 mV and achieve a 91% improvement in
energy compared to the non-parallel architecture using standard
CMOS logic.
Keywords
DLMS adaptive filter, sub-threshold operation, parallel
architecture, Sub-Pseudo NMOS, Sub-CMOS
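The trade-off described in the abstract above (more parallel units, a slower clock per unit, a lower supply voltage) can be sketched with the usual first-order relation that switching energy per operation scales with the square of the supply voltage. The 4-way parallelism and the 0.8 V reference supply below are hypothetical values chosen for illustration; only the 22 kHz throughput and the 400 mV supply come from the abstract. Leakage and pseudo NMOS static current are ignored, so this does not reproduce the reported 91% figure.

```python
def per_unit_clock_hz(throughput_hz: float, parallel_units: int) -> float:
    """Clock rate each unit needs so that all units together meet the throughput."""
    if parallel_units < 1:
        raise ValueError("parallel_units must be at least 1")
    return throughput_hz / parallel_units


def switching_energy_ratio(v_scaled: float, v_reference: float) -> float:
    """First-order ratio of switching energy per operation, E ~ C * V^2.

    Assumes the switched capacitance per operation is the same in both cases.
    """
    if v_scaled <= 0 or v_reference <= 0:
        raise ValueError("supply voltages must be positive")
    return (v_scaled / v_reference) ** 2


# Hypothetical 4-way parallel design at the 22 kHz throughput from the abstract.
clock = per_unit_clock_hz(22_000.0, 4)
# Hypothetical 0.8 V reference supply compared with the 400 mV from the abstract.
ratio = switching_energy_ratio(0.4, 0.8)
```

Under these assumptions each unit runs at a quarter of the sample rate, and halving the supply cuts the switching energy per operation to a quarter.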
-
A Dynamic-SDRAM-Mode-Control Scheme for Low-Power Systems
with a 32-bit RISC CPU [p. 358]
-
Seiji Miura, Kazushige Ayukawa, Takao Watanabe (Hitachi, Ltd.)
We have developed a dynamic-SDRAM-mode-control scheme for
low-power systems with a 32-bit RISC CPU. The scheme is based
on two dynamic changes of SDRAM modes: from active standby
to standby and from standby to active standby. It reduces both the
operating current and the latency of an SDRAM. An analysis
using benchmark programs shows that the developed scheme
reduces the SDRAM operating current by 40% and latency by
38% compared to those of standby mode. An SDRAM controller
was developed based on this scheme and 0.18-um CMOS
technology. The area of the controller is 0.28 mm^2 and its
operating current is 2.5 mA at 1.8 V and 100 MHz.
Keywords:
SDRAM controller, standby mode, active-standby mode
-
Analysis and Implementation of Charge Recycling for Deep Sub-micron Buses [p. 364]
-
Paul P. Sotiriadis, Theodoros Konstantakopoulos, Anantha Chandrakasan
(Massachusetts Institute of Technology)
Charge recycling has been proposed as a strategy to
reduce the power dissipation in data buses. Previous work in this area
was based on simplified bus models that ignored the coupling
between the lines. Here we propose a new Charge Recycling Technique
(CRT) appropriate for sub-micron technologies. CRT is analyzed
mathematically using a bus energy model that captures the
energy loss due to strong line-to-line capacitive coupling. In theory,
CRT can reduce energy by a factor of 2. It becomes even
more energy efficient when combined with Bus Invert coding (Stan '97,
[6]). A circuit has been designed and simulated with all parasitic
elements extracted from the layout. Taking into account the circuit
energy overhead, the net energy saving can be up to 32%.
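For readers unfamiliar with the Bus Invert coding mentioned in the abstract above, a minimal sketch of the basic scheme follows: when more than half of the bus lines would toggle, the inverted word is sent together with an extra invert bit. This illustrates only the coding step, not the charge recycling circuit, and it assumes an all-zero initial bus state and ignores transitions on the invert line itself when making the decision.

```python
from typing import List, Tuple


def bus_invert_encode(words: List[int], width: int) -> List[Tuple[int, int]]:
    """Encode words as (bus_value, invert_bit) pairs using basic bus-invert."""
    mask = (1 << width) - 1
    previous = 0  # assumed initial state of the bus lines
    encoded = []
    for word in words:
        word &= mask
        toggles = bin(word ^ previous).count("1")
        if 2 * toggles > width:
            # More than half the lines would switch: send the complement.
            bus, invert = (~word) & mask, 1
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
        previous = bus
    return encoded


def bus_invert_decode(encoded: List[Tuple[int, int]], width: int) -> List[int]:
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(bus ^ mask) if invert else bus for bus, invert in encoded]
```

With this rule, at most half of the data lines toggle on any transfer, which is why the coding combines naturally with techniques that reduce the energy spent per transition.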
Session Chair: Farid Najm (University of Toronto)
Session Organizer: Ed Huijbregts (Magma)
-
Estimation of Power Distribution in VLSI Interconnects [p. 370]
-
Youngsoo Shin, Takayasu Sakurai (University of Tokyo)
The analysis and simulation of effects induced by VLSI interconnects
become increasingly important as the scale of process technologies
steadily shrinks. While most analyses focus on the timing
aspects of interconnects, power consumption is also important.
In this paper, the power distribution estimation of interconnects is
studied using a reduced-order model. The relation between power
consumption and the poles and residues of a transfer function is
derived, and an appropriate driver model is developed, allowing
power consumption to be computed efficiently. Application of the
proposed method to RC networks is demonstrated using a prototype
tool.
-
Maximum Voltage Variation in the Power Distribution Network of VLSI
Circuits with RLC Models [p. 376]
-
Sudhakar Bobba (Sun Microsystems Inc.), Ibrahim N. Hajj (American University of Beirut)
In this paper, we present a frequency-domain technique to
estimate the worst-case time-domain voltage variation using
RLC models for the power distribution network. The proposed
method, unlike existing simulation-based techniques,
can handle frequency-dependent RLC parameters and generate
an upper bound on the maximum voltage drop over all
possible input excitations. Pattern-independent maximum
envelope currents are used to estimate the upper bound on
the maximum magnitude of the frequency components for
the current waveform. These values are used to formulate
a nonlinear optimization problem for the maximum voltage
drop at nodes in the power distribution network. We then
present a method to solve the nonlinear optimization problem
using Lagrange multipliers. Comparisons with SPICE
simulations are presented to validate the techniques presented
in the paper.
-
Battery Capacity Measurement and Analysis using Lithium Coin Cell Battery [p. 382]
-
Sung Park, Andreas Savvides, Mani B. Srivastava (University of California, Los Angeles)
In this paper, we look at different battery capacity models that
have been introduced in the literature. These models describe the
battery capacity utilization based on how the battery is discharged
by the circuits that consume power. In an attempt to validate these
models, we characterize a commercially available lithium coin cell
battery through careful measurements of the current and the
voltage output of the battery under different load profiles applied
by a micro sensor node. In the results, we show how the capacity
of the battery is affected by the different load profiles and provide
an analysis of whether the conventional battery models are
applicable in the real world. One of the most significant findings
of our work is that the DC/DC converter plays a significant
role in determining the battery capacity, and that the true capacity
of the battery may only be found by careful measurements.
Keywords
Embedded System, Battery, Power Estimation, Energy
Estimation, DC/DC Converter, Coin Cell, Data Acquisition
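One simple way to turn logged current and voltage samples, such as those described in the abstract above, into a delivered-capacity figure is to integrate the current over time until the battery voltage falls below a cutoff. The sketch below does this with the trapezoidal rule; the function name, the units, and the cutoff-voltage criterion are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def delivered_capacity_mah(
    times_s: Sequence[float],
    currents_ma: Sequence[float],
    voltages_v: Sequence[float],
    cutoff_v: float,
) -> float:
    """Integrate current (trapezoidal rule) until voltage drops below cutoff_v.

    times_s must be increasing; all three sequences must have equal length.
    Returns the delivered charge in mAh.
    """
    if not (len(times_s) == len(currents_ma) == len(voltages_v)):
        raise ValueError("sample sequences must have the same length")

    charge_ma_s = 0.0
    for i in range(1, len(times_s)):
        if voltages_v[i] < cutoff_v:
            break  # battery considered exhausted from this sample on
        dt = times_s[i] - times_s[i - 1]
        charge_ma_s += 0.5 * (currents_ma[i] + currents_ma[i - 1]) * dt
    return charge_ma_s / 3600.0  # mA*s -> mAh
```

Running the same integration over traces recorded under different load profiles gives directly comparable capacity figures, which is the kind of comparison the abstract reports.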
-
On the Interaction of Power Distribution Network with Substrate [p. 388]
-
Rajendran Panda, Savithri Sundareswaran, David Blaauw (Motorola, Inc.)
In this paper, we investigate the interaction between a chip's
power distribution network and its substrate to understand its impact
on power supply noise and substrate-coupled noise. The study is set
in the context of low-voltage, low-power, mixed signal chip designs
based on low resistance, epitaxial process, substrate technology. We
believe the findings of this study are significant to both the chip
integration engineer and the analog circuit designer. We attempt here to
answer two important questions: (1) To what extent can substrate
modify the power supply noise, and what parameters of substrate
design, if any, are salient? (2) What is the extent of coupling from
the noisy digital power supply to the analog circuits through the
substrate? We propose a method to simulate the power grid along
with the substrate and present findings of case studies conducted on
three low-power processor designs.
Keywords: substrate analysis, power grid analysis, substrate noise,
substrate coupled noise
|