|
CODES 2002 ABSTRACTS
Sessions:
[1]
[2]
[3]
[4]
[5]
[6]
[7]
-
Codesign-Extended Applications [p. 1]
-
Brian Grattan, Greg Stitt, and Frank Vahid
We challenge the widespread assumption that an embedded system's
functionality can be captured in a single specification and then
partitioned among software and custom hardware processors. The
specification of some functions in software is very different
from the specification of the same function in hardware - too
different to conceive of automatically deriving one from the
other. We illustrate this concept using a digital camera example.
We introduce the idea of codesign-extended applications to deal
with the situation, wherein critical functions are written in
multiple versions, and integrated such that simple compiler/synthesis
flags instantiate a particular version along with the necessary control
and communication behavior. By capturing a specification as a
codesign-extended application, a designer enables smooth migration
among platforms with increasing amounts of on-chip configurable logic.
Keywords
Hardware/software partitioning, hardware/software cospecification,
configurable logic, system-on-a-chip, platform-based design.
-
Algorithmic Transformation Techniques for Efficient Exploration of Alternative
Application Instances [p. 7]
-
Todor Stefanov, Bart Kienhuis, and Ed Deprettere
Following the Y-chart paradigm for designing a system, an application
and an architecture are modeled separately and mapped onto
each other in an explicit design step. Next, a performance analysis
for alternative application instances, architecture instances and
mappings has to be done, thereby exploring the design space of
the target system. Deriving alternative application instances is not
trivially done. Nevertheless, many instances of a single application
exist that are worth to be derived for exploration. In this paper,
we present algorithmic transformation techniques for systematic
and fast generation of alternative application instances that express
task-level concurrency hidden in an application in some degree of
explicitness. These techniques help a system designer to speedup
significantly the design space exploration process.
Keywords
system-level design, design space exploration, application instances,
algorithmic transformations
-
Concurrent Execution Semantics and Sequential Simulation Algorithms for
Metropolis Meta-Model [p. 13]
-
Felice Balarin, Luciano Lavagno, Claudio Passerone,
Alberto Sangiovanni-Vincentelli, Yosinori Watanabe, and Guang Yang
This paper presents the simulation techniques that are available in
Metropolis, an inter-disciplinary research project that develops a
design methodology, supported by a comprehensive design environment
and tool set, for embedded systems. System behavior is
non-deterministic in general, especially in the beginning of the design
process, when several key decision, such as the mapping on
an implementation platform, have not yet been made, and thus the
traces obtainable by simulation are not unique even under the same
input sequence. One may want to visit as many traces as possible
for regression tests at the final stage of designs, or may just
need one valid trace for a quick validation of the design at an early
stage. Our techniques can adapt to these different objectives easily.
They are also platform-independent in that simulation using different
languages, such as SystemC 2.0, Java, and C++ with a thread
library, are possible. This feature is important for co-simulation
between designs captured in Metropolis and those that have been
already designed in other languages.
-
The Design Context of Concurrent Computation Systems [p. 19]
-
JoAnn M. Paul, Christopher M. Eatedali, and Donald E. Thomas
Design for performance-optimization of programmable, semicustom
SoCs requires the ability to model and optimize the
behavior of the system as a whole. Neither the hardware-testbench
style nor the software-benchmark style is adequate to capture
completely the design interactions required in concurrent software-on-hardware
systems. We use a formal relationship between a
computer system design content and its external context to motivate
the need to consider a more effective modeling framework to which
concurrent software-on-hardware computer systems are designed.
Keywords
Hardware/Software Codesign, Modeling, Simulation, Concurrent
Computation, Digital System Design
-
A Language for Multiple Models of Computation [p. 25]
-
Dag Björklund and Johan Lilius
We introduce a new kernel language for modeling hardware/software
systems, adopting multiple heterogeneous models of computation.
The language has formal operational semantics, and is well suited
for model checking, code synthesis etc. For different blocks of
code, different scheduling policies can be applied, to reflect the
different interpretations of e.g. parallelism in different models of
computation. The user can add his own scheduling policies, to use
or explore different models of computation.
-
FPGA Resource and Timing Estimation from Matlab Execution Traces [p. 31]
-
Per Bjuréus, Mikael Millberg, and Axel Jantsch
We present a simulation-based technique to estimate area and
latency of an FPGA implementation of a Matlab specification.
During simulation of the Matlab model, a trace is generated that
can be used for multiple estimations. For estimation the user provides
some design constraints such as the rate and bit width of
data streams. In our experience the runtime of the estimator is
approximately only 1/10 of the simulation time, which is typically
fast enough to generate dozens of estimates within a few hours
and to build cost-performance trade-off curves for a particular
algorithm and input data. In addition, the estimator reports on the
scheduling and resource binding used for estimation. This information
can be utilized not only to assess the estimation quality,
but also as first starting point for the final implementation.
Keywords
Design exploration, Matlab, FPGA, Estimation
-
Worst-Case Performance Analysis of Parallel, Communicating Software Processes
[p. 37]
-
A. Siebenborn, O. Bringmann, and W. Rosenstiel
In this paper we present a method to perform static timing
analysis of SystemC models, that describe parallel, communicating
software processes.The paper combines a worst-case
execution time (WCET) analysis with an analysis of
the communication behavior. The communication analysis
allows the detection of points, where the program ow of
two or more concurrent processes are synchronized. This
knowledge allows the determination of the worst-case response
time (WCRT). The method does not rely on restrictions
on the system design to prevent deadlocks or data loss.
Furthermore possible deadlocks and data loss can be detected
during the analysis.
-
Symbolic Model Checking of Dual Transition Petri Nets [p. 43]
-
Mauricio Varea, Bashir M. Al-Hashimi, Luis A. Cortés, Petru Eles,
and Zebo Peng
This paper describes the formal verification of the recently introduced
Dual Transition Petri Net (DTPN) models [12], using model
checking techniques. The methodology presented addresses the
symbolic model checking of embedded systems behavioural properties,
expressed in either computation tree logics (CTL) or linear
temporal logics (LTL). The embedded system specification is
given in terms of DTPN models, where elements of the model are
captured in a four-module library which implements the behaviour
of the model. Key issues in the development of the methodology
are the heterogeneity and the nondeterministic nature of the model.
This is handled by introducing some modifications in both structure
and behaviour of the model, thus reducing the points of nondeterminism.
Several features of the methodology are discussed and two
examples are given in order to show the validity of the model.
-
Simulation Bridge: A Framework for Multi-Processor Simulation [p. 49]
-
G.D. Nagendra, V.G. Prem Kumar, and B.S. Sheshadri Chakravarthy
Multi-processor solutions in the embedded world are being designed to meet
the ever increasing computational demands of the emerging applications.
Such architectures comprise two or more processors (often a mix of general
purpose and digital signal processors) together with a rich peripheral
mix to provide a high performance computational platform. While there are
many simulation solutions in the industry available to address the system
partitioning issues and also the verification of HW-SW interactions in these
complex systems, there are very few solutions targeted towards the SW
application developers' needs.
The primary concern of the SW application developers is to debug and
optimize their code. Hence, cycle accuracy and performance of the simulation
solution becomes the key enablers. Desired observability and controllability
of the models are additional careabouts. Secondly, application developers are
more comfortable at instruction level simulations than they are with RTL or
gate level simulation. These specific requirements have a bearing on the
choices in the simulation solutions.
This paper describes the design of a generic, C based multi-processor
instruction set simulator framework in the context of the above parameters.
This framework, termed the "simulation bridge", facilitates highly accurate,
yet efficient simulation. The SimBridge performs clock correct lock-step
simulation of the models in the architecture using a global simulation engine
that handles both intra-processor and inter-processor communication in a
homogeneous fashion. It addresses the multiple key issues of execution control,
synchronization, connectivity and communication.
The paper concludes with the performance analysis of the SimBridge in an
experimental test setup as well as in the Texas Instruments (TI)
TMS320C54x-based simulators.
Keywords:
Multiprocessor simulation, Instruction Set Simulator, Simulation Framework.
-
Metrics for Design Space Exploration of Heterogeneous Multiprocessor Embedded
Systems [p. 55]
-
Donatella Sciuto, Fabio Salice, Luigi Pomante, and William Fornaciari
This paper considers the problem of designing heterogeneous
multiprocessor embedded systems. The focus is on a step of the
design flow: the definition of innovative metrics for the analysis
of the system specification to statically identify the most suitable
processing elements class for each system functionality.
Experimental results are also included, to show the applicability
and effectiveness of the proposed methodology.
Keywords
Heterogeneous Multiprocessor Embedded Systems, Metrics for
Hw/Sw Partitioning, System-Level Design.
-
Fast Processor Core Selection for WLAN Modem Using Mappability Estimation
[p. 61]
-
Juha-Pekka Soininen, Jari Kreku, Yang Qu, and Martti Forsel
Mappability metric and a novel method for evaluating the
goodness of processor core and algorithm combinations are
introduced. The new mappability concept is an addition to
performance and cost metrics used in existing codesign and
system synthesis approaches. The mappability estimation is based
on the analysis of the correlation or similarity of algorithm and
core architecture characteristics. It allows fast design space
exploration of core architectures and mappings with little
modeling effort. The method is demonstrated by analyzing
suitable processor core architectures for baseband algorithms of
the WLAN modem. 140400 architecture-algorithm pairs were
analyzed in total and the estimated results were similar to the
results of more detailed evaluations. The method is not, however,
limited to the WLAN modem, but is applicable for digital signal
processing in general.
Keywords
mappability estimation, processor architecture evaluation,
codesign, cost function
-
Multi-Objective Design Space Exploration Using Genetic Algorithms [p. 67]
-
Tony Givargis and Maurizio Palesi
In this work, we provide a technique for efficiently exploring a parameterized
system-on-a-chip (SoC) architecture to find all Paretooptimal
configurations in a multi-objective design space. Globally,
our approach uses a parameter dependency model of our target parameterized
SoC architecture to extensively prune non-optimal subspaces.
Locally, our approach applies Genetic Algorithms (GAs) to
discover Pareto-optimal configurations within the remaining design
points. The computed Pareto-optimal configurations will represent
the range of performance (e.g., timing and power) tradeoffs that
are obtainable by adjusting parameter values for a fixed application
that is mapped on the parameterized SoC architecture. We have
successfully applied our technique to explore Pareto-optimal configurations
for a number of applications mapped on a parameterized
SoC architecture.
Keywords
Design space exploration, genetic algorithms, low power design,
Pareto-optimal configurations, and system-on-a-chip architectures
-
Scratchpad Memory : A Design Alternative for Cache On-Chip Memory in Embedded
Systems [p. 73]
-
Rajeshwari Banakar, Stefan Steinke, Bo-Sik Lee, M. Balakrishnan, and Peter
Marwedel
In this paper we address the problem of on-chip memory selection for
computationally intensive applications, by proposing scratch pad memory as
an alternative to cache. Area and energy for different scratch pad and
cache sizes are computed using the CACTI tool while performance was
evaluated using the trace results of the simulator. The target processor
chosen for evaluation was AT91M40400. The results clearly establish scratchpad
memory as a low power alternative in most situations with an average
energy reduction of 40%. Further the average area-time reduction for
the scratchpad memory was 46% of the cache memory.
-
Hardware Support for Real-Time Embedded Multiprocessor System-on-a-Chip Memory
Management [p. 79]
-
Mohamed Shalan and Vincent J. Mooney III
The aggressive evolution of the semiconductor industry -
smaller process geometries, higher densities, and greater chip
complexity Ö has provided design engineers the means to create
complex, high-performance Systems-on-a-Chip (SoC) designs.
Such SoC designs typically have more than one processor and
huge memory, all on the same chip. Dealing with the global on-chip
memory allocation/de-allocation in a dynamic yet
deterministic way is an important issue for the upcoming billion
transistor multiprocessor SoC designs. To achieve this, we
propose a memory management hierarchy we call Two-Level
Memory Management. To implement this memory management
scheme - which presents a paradigm shift in the way designers
look at on-chip dynamic memory allocation - we present a
System-on-a-Chip Dynamic Memory Management Unit
(SoCDMMU) for allocation of the global on-chip memory, which
we refer to as Level Two memory management (Level One is the
operating system management of memory allocated to a particular
on-chip Processing Element). In this way, processing elements
(heterogeneous or non-heterogeneous hardware or software) in an
SoC can request and be granted portions of the global memory in
a fast and deterministic time (for an example of a four processing
element SoC, the dynamic memory allocation of the global onchip
memory takes sixteen cycles per allocation/deallocation in
the worst case). In this paper, we show how to modify an existing
Real-Time Operating System (RTOS) to support the new
proposed SoCDMMU. Our example shows a multiprocessor SoC
that utilizes the SoCDMMU has 440% overall speedup of the
application transition time over fully shared memory that does not
utilize the SoCDMMU.
Keywords
System-on-a-Chip, dynamic memory management, real-time
systems, embedded systems, SoCDMMU, two-level memory
management, Atalanta, real-time operating systems.
-
Large Exploration for HW/SW Partitioning of Multirate and Aperiodic Real-Time
Systems [p. 85]
-
Abdenour Azzedine, Jean-Philippe Diguet, and Jean-Luc Philippe
This paper addresses the domain of fine and coarse grain HW /
SW codesign for Real-Time System On-Chip. We propose a new
method for the real-time scheduling and the HW / SW partitioning
of multi-rate or aperiodic tasks. The large design space exploration
is based on parallelism/delay trade-off curves.
Keywords
HW / SW Codesign, system design exploration, RT scheduling
-
Program Slicing for Codesign [p. 91]
-
Jeffry T. Russell
Program slicing is a software analysis technique that computes the
set of operations in a program that may affect the computation at a
particular operation. Interprocedural slicing techniques have
separately addressed concurrent programs and hardware
description languages. However, application of slicing to
codesign of embedded systems requires dependence analysis
across the hardware-software interface.
We extend program slicing for a codesign environment.
Hardware-software interactions common in component-based
systems are mapped to previously introduced dependences,
including the interference and signal dependences. We introduce a
novel access dependence that models a memory access side effect
that results in activation of a process. A slicing algorithm that
incorporates this variety of dependences is described.
-
Compiler-Directed Customization of ASIP Cores [p. 97]
-
T. Vinod Kumar Gupta, Roberto E. Ko, and Rajeev Barua
This paper presents an automatic method to customize embedded
application-specific instruction processors (ASIPs) based on compiler
analysis. ASIPs, also known as embedded soft cores, allow
certain hardware parameters in the processor to be customized
for a specific application domain. They offer low design cost as
they use pre-designed and verified components. Our design goal is
choosing parameter values for fastest runtime within a given silicon
area budget for a particular application set. Present-day technologies
for choosing parameter values rely on exhaustive simulation
of the application set on all possible combinations of parameter
values š a time-consuming and non-scalable procedure.
We propose a compiler-based method that automatically derives
the optimal values of parameters without simulating any configuration.
Further, we expand the space of parameters that can be
changed from the limited set today, and evaluate the importance of
each. Results show that for our benchmarks, the runtimes for different
configurations are predicted with an average error of 2.5%.
In the two area constrained customization problem we evaluate,
our method is able to recommend the same configuration that is
recommended by brute force exhaustive simulation 1
Keywords: customization, embedded, soft cores, ASIP
-
A Study of CodePack: Optimizing Embedded Code Space [p. 103]
-
Avishay Orpaz and Shlomo Weiss
CodePack is a code compression system used by IBM in its PowerPC family
of embedded processors. CodePac combines high compression capability along
with fast and simple decoding hardware. IBM did not release much information
about the design of the system and the influence of various design parameters
on its performance. In our work we will present the system and its design
parameters and investigate how each affects its performance on the compression
rate and decoder complexity. We also present a novel efficient algorithm
to optimize the class structure of the system.
Keywords: Embedded Systems, CodePack, Code Compression, Optimization,
Embedded Software
-
A Novel Codesign Approach Based on Distributed Virtual Machines [p. 109]
-
Christian Kreiner, Christian Steger, Egon Teiniker, and Reinhold Weiss
This paper describes a hardware/software codesign approach for
the design of embedded systems based on digital signal processors
and FPGAs. Our approach is based on distributed virtual machines
for simulation and verification of the application on a Linux
cluster and for running the application on different target architectures
(DSPs, FPGAs) as well. The main focus is the description
of the virtual machine, which was designed to make DSP applications
portable across different platforms while maintaining optimal
code.
-
Optimization and Synthesis for Complex Reactive Embedded Systems by
Incremental Collapsing [p. 115]
-
Massimiliano L. Chiodo
We propose a software synthesis procedure for reactive real-time
embedded systems. In our approach, control parts of
the system are represented in a decomposed form enabling
more complex control structures to be represented. We propose
a synthesis procedure for this representation that incrementally
aggregates elements of the representation while
keeping the resulting code size under tight control. This
method combined with heuristic strategies works very well
on real-life designs and demonstrates the potential to produce
results that challenge or beat hand-written implementations.
Keywords
Software Synthesis, Finite-state Machines, Embedded Systems,
Real-time Systems
-
Transformation of SDL Specifications for System-Level Timing Analysis [p. 121]
-
Marek Jersak, Kai Richter, Rafik Henia, Rolf Ernst, and Frank Slomka
Complex embedded systems are typically specified using multiple
domain-specific languages. After code-generation, the implementation
is simulated and tested. Validation of non-functional properties,
in particular timing, remains a problem because full test coverage
cannot be achieved for realistic designs. The alternative, formal
timing analysis, requires a system representation based on key application
and architecture properties. These properties must first
be extracted from a system specification to enable analysis. In this
paper we present a suitable transformation of SDL specifications
for system-level timing analysis. We show ways to vary modeling
accuracy in order to apply available formal techniques. A practical
approach utilizing a recently developed system model is presented.
-
A Strongly Polynomial-Time Algorithm for Over-Constraint Resolution [p. 127]
-
Ali Dasdan
A system of binary linear constraints or difference constraints (SDC)
contains a set of variables that are constrained by a set of unary or
binary linear inequalities. In such diverse applications as scheduling,
interface timing verification, real-time systems, multimedia
systems, layout compaction, and constraint satisfaction, SDCs have
successfully been used to model systems of both temporal and spatial
constraints. Formally, SDCs are modeled by weighted, directed
(constraint) graphs. The consistency of an SDC means that there is
at least one instantiation of its variables that satisfies all its constraints.
It is well known that the absence of positive cycles in a
graph implies the consistency of the corresponding SDC, so the
consistency can be decided in strongly polynomial time. If a SDC
is found to be inconsistent, it has to be repaired to make it consistent.
This task is equivalent to removing positive cycles from
the corresponding graph. All the previous algorithms for this task
take time proportional to the number of positive cycles in the graph,
which can grow exponentially. In this paper, we propose a strongly
polynomial-time algorithm, i.e., an algorithm whose time complexity
is polynomial in the size of the graph. Our algorithm takes in a
graph and returns a list of edges and the changes in their weights to
remove all the positive cycles from the graph. We experimentally
quantify the length of the edge list and the running time of the algorithm
on large benchmark graphs. We show that both are very
small, so our algorithm is practical.
Keywords
Behavioral synthesis, high-level synthesis, scheduling, timing constraints,
rate analysis, constraint satisfaction.
-
Hardware-Software Cosynthesis of Multi-Mode Multi-Task Embedded Systems with
Real-Time Constraints [p. 133]
-
Hyunok Oh and Soonhoi Ha
An embedded system is called multi-mode when it supports
multiple applications by dynamically reconfiguring the system
functionality. This paper proposes a hardware-software
cosynthesis technique for multi-mode multi-task embedded
systems with real-time constraints. The cosynthesis problem
involves three subproblems: selection of appropriate processing
elements, mapping and scheduling of function modules to the
selected processing elements, and schedule analysis. The
proposed cosynthesis framework defines an iteration loop of three
steps that solve the subproblems separately. One of the key
benefits of such a modular approach is extensibility and
adaptability. Moreover, unlike the previous approaches, the
proposed technique considers task sharing between modes and
hardware sharing between tasks at the same time. We demonstrate
the usefulness of the proposed technique with a realistic multimode
embedded system that supports three modes of operation
with 5 different tasks.
Keywords
Hardware-software cosynthesis, multi-mode, multi-task
-
Design of Multi-Tasking Coprocessor Control for Eclipse [p. 139]
-
Martijn J. Rutten, Jos T.J. van Eijndhoven, and Evert-Jan D. Pol
Eclipse defines a heterogeneous multiprocessor architecture template
for data-dependent stream processing. Intended as a scalable
and flexible subsystem of forthcoming media-processing systems-on-a-chip,
Eclipse combines application configuration flexibility
with the efficiency of function-specific hardware, or coprocessors.
To facilitate reuse, Eclipse separates coprocessor functionality from
generic support that addresses multi-tasking, inter-task synchronization
, and data transport. Five interface primitives accomplish this separation.
The interface facilitates the design of coprocessors that require complex
control to handle data-dependent
I/O, saving/restoring task state upon task switches, and pipelined
processing. This paper presents how this interface enables the
design of such reusable yet cost-effective coprocessors.
-
Hardware-Software Bipartitioning for Dynamically Reconfigurable Systems [p. 145]
-
Daler N. Rakhmatov and Sarma B.K. Vrudhula
The main unique feature of dynamically reconfigurable systems
is the ability to time-share the same reconfigurable hardware resources.
However, the energy-delay cost associated with reconfiguration
must be accounted for during hardware-software partitioning.
We propose a method for mapping nodes of an application
control flow graph either to software or reconfigurable hardware,
explicitly targeting minimization of the energy-delay cost due to
both computation and configuration. The addressed problems are
energy-delay product minimization, delay-constrained energy minimization,
and energy-constrained delay minimization. We show
how these problems can be tackled by using network flow techniques,
after transforming the original control flow graph into an
equivalent network. If there are no constraints, as in the case of
the energy-delay product minimization, we are able to generate an
optimal solution in polynomial time.
Keywords: hardware-software partitioning, reconfigurable systems,
network flows.
-
HW/SW Partitioning and Code Generation of Embedded Control Applications on a
Reconfigurable Architecture Platform [p. 151]
-
Massimo Baleani, Frank Gennari, Yunjian Jiang, Yatish Patel,
Robert K. Brayton, and Alberto Sangiovanni-Vincentelli
This paper studies the use of a reconfigurable architecture platform
for embedded control applications aimed at improving real time
performance. The hw/sw codesign methodology from POLIS is
used. It starts from high-level specifications, optimizes an intermediate
model of computation (Extended Finite State Machines) and
derives both hardware and software, based on performance constraints.
We study a particular architecture platform, which consists
of a general purpose processor core, augmented with a reconfigurable
function unit and data-path to improve run time performance.
A new mapping flow and algorithms to partition hardware
and software are proposed to generate implementations that best
utilize this architecture. Encouraging preliminary results are shown
for automotive electronic control examples.
Keywords
CSoC, hw/sw co-design, code generation
-
Fast System-Level Power Profiling for Battery-Efficient System Design [p. 157]
-
Kanishka Lahiri, Anand Raghunathan, and Sujit Dey
An increasing disparity between the energy requirements of portable electronic
devices and available battery capacities is driving the development of
new design methodologies for battery-efficient systems. A crucial requirement
for battery efficient system design is to be able to efficiently and accurately
estimate battery life for candidate system architectures. Recently,
efficient techniques have been developed to estimate battery life under given
profiles of system power consumption over time. However, techniques for
generating the power profiles themselves are either too cumbersome for system
level exploration, or too inaccurate for battery life estimation.
In this paper, we present a new methodology for efficiently and accurately
generating power profiles for different system-level architectures.
The designer can specify the manner in which (i) system tasks are mapped
to a set of available implementations, and (ii) system communications are
mapped to a specified communication architecture. For a given architecture,
a power profile is automatically generated by analyzing an abstract representation
of the system execution traces, while taking into account the selected
implementations of the system‰s computations and communications.
Experiments conducted on the design of an IEEE 802.11 MAC processor
indicate that the power profiling approach offers run times that are several
orders of magnitude lower than a simulation based power profiling technique,
while sustaining negligible loss of accuracy (average profiling error
was observed to be less than 3.4%).
-
Energy Savings Through Compression in Embedded Java Environments [p. 163]
-
G. Chen, M. Kandemir, N. Vijaykrishnan, M.J. Irwin, and W. Wolf
Limited energy and memory resources are important constraints in
the design of an embedded system. Compression is an useful and
widely employed mechanism to reduce the memory requirements of
the system. As the leakage energy of a memory system increases
with its size and because of the increasing contribution of leakage to
overall system energy, compression also has a significant effect on
reducing energy consumption. However, storing compressed data /
instructions has a performance and energy overhead associated with
decompression at runtime. The underlying compression algorithm,
the corresponding implementation of the decompression and the ability
to reuse decompressed information critically impact this overhead.
In this paper, we explore the influence of compression on overall
memory energy using a commercial embedded Java virtual machine
(JVM) and a customized compression algorithm. Our results show
that compression is effective in reducing energy even when considering
the runtime decompression overheads for most applications.
Keywords
Compression, Leakage Energy, Embedded Java
-
Communication Speed Selection for Embedded Systems with Networked
Voltage-Scalable Processors [p. 169]
-
Jinfeng Liu, Pai H. Chou, and Nader Bagherzadeh
High-speed serial network interfaces are gaining wide use in connecting
multiple processors and peripherals in modern embedded
systems, thanks to their size advantage and power efficiency. Many
such interfaces also support multiple data rates, and this ability
is opening a new dimension in the power/performance trade-offs
between communication and computation on voltage scalable embedded
processors. To minimize energy consumption in these networked
architectures, designers must not only perform functional
partitioning but also carefully balance the speeds between communication
and computation, which compete for time and energy.
Minimizing communication power without considering computation
may actually lead to higher energy consumption at the system
level due to elongated on-time as well as lost opportunities for dynamic
voltage scaling on the processors. We propose a speed selection
methodology for globally optimizing the energy consumption
in embedded networked architectures. We formulate a multidimensional
optimization problem by modeling communication dependencies
between processors and their timing budgets. This enables
engineers to systematically solve the problem of optimal speed
selection for global energy reduction. We demonstrate the effectiveness
of our speed selection approach with an image processing
application mapped onto a multi-processor architecture with a
multi-speed Ethernet.
-
Pruning-Based Energy-Optimal Device Scheduling for Hard Real-Time Systems
[p. 175]
-
Vishnu Swaminathan and Krishnendu Chakrabarty
Dynamic Power Management (DPM) provides a simple, elegant and flexible
method for reducing energy consumption in embedded real-time systems. However,
I/O-centric DPM techniques have been studied largely for non-real-time
environments. We present an offline device scheduling technique for real-time
systems that generates an energy-optimal device schedule for a given task
set while guaranteeing that all real-time deadlines are met. Our method takes
as inputs a task set and a device-usage list for each task, and it schedules
the tasks such that the energy consumed by the set of I/O devices is
minimized. We compare our algorithm to an exhaustive enumeration method and
show that the proposed algorithm is very efficient in terms of memory usage
and computation time. We also present case studies to show that I/O-centric
DPM methods can result in significant energy savings.
-
Energy Frugal Tags in Reprogrammable I-Caches for Application-Specific Embedded
Processors [p. 181]
-
Peter Petrov and Alex Orailoglu
In this paper we present a software-directed customization methodology
for minimizing the energy dissipation in the instruction cache,
one of the most power consuming microarchitectural components
of high-end embedded processors. We target particularly the instruction
cache tag operations and show how an exceedingly small
number of tag bits, if any, are needed to compute the miss/hit behavior
for the most frequently executed application loops, thus minimizing
the energy needed to perform the tag reads and comparisons.
The proposed methodology exploits the fact that the code
layout structure of the program loops can be identified after compile
and link, and that it typically resides in a very confined memory
location, for which very few bits from the effective address can
be utilized as a tag. Subsequently, we present an efficient, programmable
implementation to apply the suggested energy minimization
technique. The experimental results show a significant
decrease in energy dissipation for a set of real-life applications.
-
Holistic Scheduling and Analysis of Mixed Time/Event-Triggered Distributed
Embedded Systems [p. 187]
-
Traian Pop, Petru Eles, and Zebo Peng
This paper deals with specific issues related to the design of
distributed embedded systems implemented with mixed, event-triggered
and time-triggered task sets, which communicate over bus
protocols consisting of both static and dynamic phases. Such
systems are emerging as the new standard for automotive
applications. We have developed a holistic timing analysis and
scheduling approach for this category of systems. We have also
identified several new design problems characteristic to such hybrid
systems. An example related to bus access optimization in the context
of a mixed static/dynamic bus protocol is presented. Experimental
results prove the efficiency of such an optimization approach.
-
Locality-Conscious Process Scheduling in Embedded Systems [p. 193]
-
I. Kadayif, M. Kandemir, I. Kolcu and G. Chen
In many embedded systems, existence of a data cache might influence
the effectiveness of process scheduling policy significantly.
Consequently, a scheduling policy that takes inter-process data reuse
into account might result in large performance benefits. In this paper,
we focus on array-intensive embedded applications and present
a locality-conscious scheduling strategy where we first evaluate the
potential data reuse between processes, and then, using the results
of this evaluation, select an order for process executions. We also
show how process codes can be transformed by an optimizing compiler
for increasing inter-process data reuse, thereby making locality-conscious
scheduling more effective. Our experimental results obtained
using two large, multi-process application codes indicate significant
runtime benefits.
-
Reconfigurable SoC Design with Hierarchical FSM and Synchronous Dataflow Model
[p. 199]
-
Sunghyun Lee, Sungjoo Yoo, and Kiyoung Choi
We present a method of runtime configuration scheduling in reconfigurable
SoC design. As a model of computation in system
representation, we use a popular formal model of computation, hierarchical
FSM (HFSM) with synchronous dataflow (SDF) model,
in short, HFSM-SDF model. In reconfigurable SoC design with
the HFSM-SDF model, the problem of configuration scheduling is
challenging due to the dynamic behavior of the system such as concurrent
execution of state transitions (by AND relation), complex
control flow (in the HFSM), and complex schedules of SDF actor
firing. Thus, compile-time static configuration scheduling may not
efficiently hide configuration latency.
To resolve the problem, it is necessary to know the exact order
of required configurations during runtime and to perform runtime
configuration scheduling. To obtain the exact order of configurations,
we exploit the inherent property of HFSM-SDF that the execution
order of SDF actors can be determined before the execution
of state transition of top FSM. After obtaining the order information
in a queue called ready configuration queue, we execute the
state transition. During the execution, whenever there is new available
FPGA resource, a new configuration is selected from the queue
and fetched by the runtime configuration scheduler. We applied the
method to an MPEG4 decoder design and obtained up to 21.8% improvement
in system runtime with a negligible overhead of runtime
(1.4%) and memory usage (0.94%).
-
Dynamic Run-Time HW/SW Scheduling Techniques for Reconfigurable Architectures [p. 205]
-
Juanjo Noguera and Rosa M. Badia
Dynamic run-time scheduling in System-on-Chip platforms has
become recently an active area of research because of the
performance and power requirements of new applications.
Moreover, dynamically reconfigurable logic (DRL) architectures
are an exciting alternative for embedded systems design.
However, all previous approaches to DRL multi-context
scheduling and HW/SW scheduling for DRL architectures are
based on static scheduling techniques. In this paper, we address
this problem and present: (1) a dynamic scheduler hardware
architecture, and (2) four dynamic run-time scheduling
algorithms for DRL-based multi-context platforms. The
scheduling algorithms have been integrated in our codesign
environment, where a large number of experiments have been
carried out. Results demonstrate the benefits of our approach.
Keywords
Dynamic run-time scheduling, reconfigurable architectures.
-
Extended Quasi-Static Scheduling for Formal Synthesis and Code Generation of
Embedded Software [p. 211]
-
Feng-Shi Su and Pao-Ann Hsiung
With the computerization of most daily-life amenities such as home
appliances, the software in a real-time embedded system now accounts
for as much as 70% of a system design. On one hand, this
increase in software has made embedded systems more accessible
and easy to use, while on the other hand, it has also necessitated
further research on how complex embedded software can be
designed automatically and correctly. Enhancing recent advances
in this research, we propose an Extended Quasi-Static Scheduling
(EQSS) method for formally synthesizing and automatically generating
code for embedded software, using the Complex-Choice Petri
Nets (CCPN) model. Our method improves on previous work in
three ways: (1) by removing model restrictions to cover a much
wider range of applications, (2) by proposing an extended algorithm
to schedule the more unrestricted model, and (3) by implementing
a code generator that can produce multi-threaded embedded
software programs. The requirements of an embedded software
are specified by a set of CCPN, which is scheduled using
EQSS such that the schedules satisfy limited embedded memory
requirements and task precedence constraints. Finally, a POSIXbased
multi-threaded embedded software program is generated in
the C programming language. Through an example, we illustrate
the feasibility and advantages of the proposed EQSS method.
|