Date Flow Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications

Xinning LIU  Chen MEI  Peng CAO  Min ZHU  Longxing SHI  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E95-D   No.2   pp.374-382
Publication Date: 2012/02/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E95.D.374
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Reconfigurable Systems)
Category: Design Methodology
Keyword: 
REMUS-II,  coarse grain reconfigurable architecture,  multimedia application,  data flow optimization,  H.264 HiP,  

Full Text: PDF>>
Buy this Article




Summary: 
This paper proposes a novel sub-architecture to optimize the data flow of REMUS-II (REconfigurable MUltimedia System 2), a dynamically coarse grain reconfigurable architecture. REMUS-II consists of a µPU (Micro-Processor Unit) and two RPUs (Reconfigurable Processor Unit), which are used to speeds up control-intensive tasks and data-intensive tasks respectively. The parallel computing capability and flexibility of REMUS-II makes itself an excellent candidate to process multimedia applications, which require a large amount of memory accesses. In this paper, we specifically optimize the data flow to deal with those performance-hazard and energy-hungry memory accessing in order to meet the bandwidth requirement of parallel computing. The RPU internal memory could work in multiple modes, like 2D-access mode and transformation mode, according to different multimedia access patterns. This novel design can improve the performance up to 26% compared to traditional on-chip memory. Meanwhile, the block buffer is implemented to optimize the off-chip data flow through reducing off-chip memory accesses, which reducing up to 43% compared to direct DDR access. Based on RTL simulation, REMUS-II can achieve 1080p@30 fps of H.264 High Profile@ Level 4 and High Level MPEG2 at 200 MHz clock frequency. The REMUS-II is implemented into 23.7 mm2 silicon on TSMC 65 nm logic process with a 400 MHz maximum working frequency.