Parallelization of Computing-Intensive Tasks of the H.264 High Profile Decoding Algorithm on a Reconfigurable Multimedia System

Tongsheng GENG  Leibo LIU  Shouyi YIN  Min ZHU  Shaojun WEI  

IEICE TRANSACTIONS on Information and Systems   Vol.E93-D   No.12   pp.3223-3231
Publication Date: 2010/12/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E93.D.3223
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)
H.264,  reconfigurable multimedia system,  parallelization computation,  hardware/software partition,  

Full Text: PDF>>
Buy this Article

This paper proposes approaches to perform HW/SW (Hardware/Software) partition and parallelization of computing-intensive tasks of the H.264 HiP (High Profile) decoding algorithm on an embedded coarse-grained reconfigurable multimedia system, called REMUS (REconfigurable MUltimedia System). Several techniques, such as MB (Macro-Block) based parallelization, unfixed sub-block operation etc., are utilized to speed up the decoding process, satisfying the requirements of real-time and high quality H.264 applications. Tests show that the execution performance of MC (Motion Compensation), deblocking, and IDCT-IQ (Inverse Discrete Cosine Transform-Inverse Quantization) on REMUS is improved by 60%, 73%, 88.5% in the typical case and 60%, 69%, 88.5% in the worst case, respectively compared with that on XPP PACT (a commercial reconfigurable processor). Compared with ASIC solutions, the performance of MC is improved by 70%, 74% in the typical and in the worst case, respectively, while those of Deblocking remain the same. As for IDCT_IQ, the performance is improved by 17% no matter in the typical or worst case. Relying on the proposed techniques, 1080p@30 fps of H.264 HiP@ Level 4 decoding could be achieved on REMUS when utilizing a 200 MHz working frequency.