A Tightly Coupled General Purpose Reconfigurable Accelerator LAPP and Its Power States for HotSpot-Based Energy Reduction

Jun YAO  Yasuhiko NAKASHIMA  Naveen DEVISETTI  Kazuhiro YOSHIMURA  Takashi NAKADA  

IEICE TRANSACTIONS on Information and Systems   Vol.E97-D   No.12   pp.3092-3100
Publication Date: 2014/12/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2014PAP0025
Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)
Category: Architecture
reconfigurable architectures,  multi-core processing,  energy efficiency,  

Full Text: PDF>>
Buy this Article

General purpose many-core architecture (MCA) such as GPGPU has recently been used widely to continue the performance scaling when the continuous increase in the working frequency has approached the manufacturing limitation. However, both the general purpose MCA and its building block general purpose processor (GPP) lack a tuning capability to boost energy efficiency for individual applications, especially computation intensive applications. As an alternative to the above MCA platforms, we propose in this paper our LAPP (Linear Array Pipeline) architecture, which takes a special-purpose reconfigurable structure for an optimal MIPS/W. However, we also keep the backward binary compatibility, which is not featured in most special hardware. More specifically, we used a general purpose VLIW processor, interpreting a commercial VLIW ISA, as the baseline frontend part to provide the backward binary compatibility. We also extended the functional unit (FU) stage into an FU array to form the reconfigurable backend for efficient execution of program hotspots to exploit parallelism. The hardware modules in this general purpose reconfigurable architecture have been locally zoned into several groups to apply preferable low-power techniques according to the module hardware features. Our results show that under a comparable performance, the tightly coupled general/special purpose hardware, which is based on a 180nm cell library, can achieve 10.8 times the MIPS/W of MCA architecture of the same technology features. When a 65 technology node is assumed, a similar 9.4x MIPS/W can be achieved by using the LAPP without changing program binaries.