A Design of High Performance Parallel Architecture and Communication for Multi-ASIP Based Image Processing Engine

Hsuan-Chun LIAO  Mochamad ASRI  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E96-A   No.6   pp.1222-1235
Publication Date: 2013/06/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E96.A.1222
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Circuit, System, and Computer Technologies)
ASIP,  image processing,  

Full Text: PDF(2.5MB)
>>Buy this Article

Image processing engine is crucial for generating high quality images in video system. As Application Specific Integrated Circuit (ASIC) is dedicated for specific standards, Application Specific Instruction-set Processor (ASIP) which provides high flexibility and high performance seems to have more advantages in supporting the nonstandard pre/post image processing in video system. In our previous work, we have designed some ASIPs that can perform several image processing algorithms with a reconfigurable datapath. ASIP is as efficient as DSP, but its area is considerably smaller than DSP. As the resolution of image and the complexity of processing increase, the performance requirement also increases accordingly. In this paper, we presents a novel multi ASIP based image processing unit (IPU) which can provide sufficient performance for the emerging very-high-resolution applications. In order to provide a high performance image processing engine, we propose several new techniques and architecture such as multi block-pipes architecture, pixel direct transmission and boundary pixel write-through. Multi block-pipes architecture has flexible scalability in supporting a various ranges of resolution, which ranges from low resolution to high resolution. The boundary pixel write-through technique provides high efficient parallel processing, and pixel direct transmission technique is implemented in each ASIP to further reduce the data transmission time. Cycle-accurate SystemC simulations are performed, and the experimental results show that the maximum bandwidth of the proposed communication approach can achieve up to 1580 Mbyte/s at 400 MHz. Moreover, communication overhead can be reduced about a maximum of 88% compared to our previous works.