Title | Finding Optimal L1 Cache Configuration for Embedded Systems |
Author | Andhi Janapsatya, Aleksandar Ignjatovic, *Sri Parameswaran (The University of New South Wales, Australia) |
Page | pp. 796 - 801 |
Keyword | Design Space Exploration, Embedded System, Cache Memory |
Abstract | Modern embedded systems execute a single application or a class of applications repeatedly. An emerging methodology for designing embedded systems uses configurable processors whose cache size, associativity, and line size can be chosen by the designer. In this paper, a method is given to rapidly find the L1 cache miss rate of an application. An energy model and an execution time model are developed to find the best cache configuration for a given embedded application. Using benchmarks from Mediabench, we find that our method explores the design space on average 45 times faster than Dinero IV, while retaining 100% accuracy. |
PDF file |
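The abstract above pairs per-configuration miss rates with an energy model to pick the best cache. A minimal sketch of that selection step, assuming a purely illustrative energy model and made-up miss rates and energy costs (none of these numbers or names come from the paper):

```python
# Illustrative sketch: selecting an L1 cache configuration from
# per-configuration miss rates via a simple energy model.
# All miss rates and energy costs below are hypothetical.

def cache_energy(accesses, miss_rate, e_hit, e_miss):
    """Total energy = per-access hit energy + per-miss penalty."""
    misses = accesses * miss_rate
    return accesses * e_hit + misses * e_miss

def best_configuration(accesses, configs):
    """configs: {name: (miss_rate, e_hit, e_miss)}; returns the
    configuration name with the lowest total energy."""
    return min(configs, key=lambda c: cache_energy(accesses, *configs[c]))

configs = {
    "1KB-direct": (0.10, 0.5, 20.0),   # small cache: cheap hits, many misses
    "8KB-2way":   (0.02, 0.8, 20.0),
    "32KB-4way":  (0.01, 1.5, 20.0),   # large cache: costly hits, few misses
}
print(best_configuration(1_000_000, configs))
```

The paper's contribution is producing the miss rates quickly; once they are known, the configuration choice reduces to a minimization like the one above.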
Title | Memory Size Computation for Multimedia Processing Applications |
Author | Hongwei Zhu, Ilie I. Luican, *Florin Balasa (University of Illinois at Chicago, United States) |
Page | pp. 802 - 807 |
Keyword | memory, multimedia, signal processing, multidimensional signals, polytopes |
Abstract | In real-time multimedia processing systems, a large part of the power consumption is due to data storage and data transfer. Moreover, the area cost is often largely dominated by the memory modules. Computing the memory size is an important step in designing a memory architecture optimized for area and/or power in multimedia processing systems. This paper presents a novel non-scalar approach for computing the exact memory size of real-time multimedia algorithms. The methodology uses both algebraic techniques specific to the data-flow analysis used in modern compilers and recent advances in the theory of integral polyhedra. In contrast to previous works, which are only estimation methods, this approach performs exact memory computations even for applications with a large number of scalar signals. |
PDF file |
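The abstract above counts scalar signals via integral polyhedra. The quantity being computed is the number of lattice points in a polytope of array indices; a toy brute-force sketch of that quantity (the paper computes it analytically, and the triangular region here is just an assumed example):

```python
# Illustrative sketch only: the paper counts lattice points of integral
# polyhedra analytically; here we brute-force a tiny example to show
# the quantity involved -- the number of scalar signals in an array
# region described by linear constraints.

def count_lattice_points(n, constraints):
    """Count integer points (i, j) in [0, n]^2 satisfying all constraints."""
    return sum(
        1
        for i in range(n + 1)
        for j in range(n + 1)
        if all(c(i, j) for c in constraints)
    )

# Triangular region 0 <= i <= j <= n, e.g. the index space touched by
# an upper-triangular access A[i][j]:
triangle = [lambda i, j: i <= j]
print(count_lattice_points(4, triangle))   # (n+1)(n+2)/2 = 15 points
```

Enumeration like this blows up for the multidimensional signals the paper targets, which is exactly why it relies on closed-form polyhedral counting instead.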
Title | Compiler-Guided Data Compression for Reducing Memory Consumption of Embedded Applications |
Author | Ozcan Ozturk, Guangyu Chen, *Mahmut Kandemir (Pennsylvania State University, United States), Ibrahim Kolcu (University of Manchester, Great Britain) |
Page | pp. 814 - 819 |
Keyword | Scratchpad Memory, Memory Compression, Compiler |
Abstract | The memory system presents one of the critical challenges in embedded system design and optimization. This is mainly due to the ever-increasing code complexity of embedded applications and the exponential increase in the amount of data they manipulate. As a result, reducing the memory space occupancy of embedded applications is very important and will be even more so in the next decade. Motivated by this observation, this paper presents and evaluates a compiler-driven approach to data compression for reducing memory space occupancy. Our goal is to study how automated compiler support can help decide which data elements to compress/decompress and at which points during execution these compressions/decompressions should be performed. The proposed compiler support achieves this by analyzing the source code of the application to be optimized and identifying the order in which the different data blocks are accessed. Based on this analysis, the compiler automatically inserts compression/decompression calls into the application code. The compression calls target data blocks that are not expected to be used in the near future, whereas the decompression calls target data blocks that are expected to be reused but are currently in compressed form. |
PDF file |
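The abstract above describes compiler-inserted compression and decompression calls. A hypothetical sketch of the runtime side of such calls (the block store, the `zlib` codec, and the block names are assumptions for illustration; the paper's compiler analysis is what decides where the calls go):

```python
# Hypothetical sketch of the runtime side of compiler-inserted
# compression/decompression calls: a block the compiler predicts is
# idle for a while gets compressed; a block about to be reused gets
# decompressed. The access-order analysis happens in the compiler.
import zlib

class BlockStore:
    def __init__(self):
        self.blocks = {}                 # name -> (bytes, is_compressed)

    def put(self, name, data):
        self.blocks[name] = (data, False)

    def compress(self, name):            # inserted before a long idle period
        data, packed = self.blocks[name]
        if not packed:
            self.blocks[name] = (zlib.compress(data), True)

    def decompress(self, name):          # inserted before the next reuse
        data, packed = self.blocks[name]
        if packed:
            self.blocks[name] = (zlib.decompress(data), False)

    def footprint(self):
        """Current memory occupancy of all blocks, in bytes."""
        return sum(len(data) for data, _ in self.blocks.values())

store = BlockStore()
store.put("A", b"x" * 4096)    # a highly compressible data block
before = store.footprint()
store.compress("A")            # compiler decided "A" is idle for a while
after = store.footprint()
store.decompress("A")          # compiler decided "A" is about to be reused
print(before, after, store.footprint())
```

The footprint reduction between `before` and `after` is the memory-space saving the approach trades against the time cost of the inserted calls.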
Title | Analysis of Scratch-Pad and Data-Cache Performance Using Statistical Methods |
Author | *Javed Absar (IMEC, Katholieke Universiteit Leuven, Belgium), Francky Catthoor (IMEC, Belgium) |
Page | pp. 820 - 825 |
Keyword | scratch-pad, performance, measure, probability, hit-rate |
Abstract | An effectively designed and efficiently used memory hierarchy, composed of scratch-pads or caches, is seen today as the key to obtaining energy and performance gains in data-dominated embedded applications. However, an unsolved problem is how to make the right choice between the scratch-pad and the data-cache for different classes of applications. Recent studies show that applications with regular and manifest data access patterns (e.g., matrix multiplication) perform better on the scratch-pad than on the cache. In the case of dynamic applications with irregular and non-manifest access patterns, however, it is commonly and intuitively believed that the cache would perform better. In this paper, we show by theoretical analysis and experimental results that this intuition can sometimes be misleading. When access probabilities remain fixed, we prove that the scratch-pad, with an optimal mapping, will always outperform the cache. We also demonstrate how to map dynamic applications efficiently to the scratch-pad or cache and, additionally, how to accurately predict the performance. |
PDF file |
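The abstract above claims that, for fixed access probabilities, an optimally mapped scratch-pad always outperforms the cache. A toy sketch of that comparison, assuming an independent-reference access model and a direct-mapped cache with illustrative probabilities (this is not the paper's proof, only a numerical illustration of the claim):

```python
# Sketch of the fixed-access-probability comparison. Assumptions:
# independent accesses, a direct-mapped cache, and made-up per-block
# access probabilities. Scratch-pad with an optimal mapping locks in
# the hottest blocks; the cache loses hits to conflicts within a set.

def scratchpad_hit_rate(probs, capacity):
    """Optimal mapping: keep the `capacity` most probable blocks resident."""
    return sum(sorted(probs, reverse=True)[:capacity])

def direct_mapped_hit_rate(probs, capacity):
    """Steady-state hit rate under independent accesses: a block occupies
    its set in proportion to its share of the set's access probability."""
    sets = [[] for _ in range(capacity)]
    for block, p in enumerate(probs):
        sets[block % capacity].append(p)   # modulo set-index mapping
    return sum(p * p / sum(s) for s in sets for p in s)

probs = [0.4, 0.3, 0.2, 0.1]   # fixed access probabilities, 4 blocks
spm = scratchpad_hit_rate(probs, 2)
cache = direct_mapped_hit_rate(probs, 2)
print(spm, cache)
```

With these numbers the scratch-pad keeps the two hottest blocks (hit rate 0.7), while the conflict terms pull the cache below that, consistent with the abstract's theorem.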