Acceleration of DCT Processing with Massive-Parallel Memory-Embedded SIMD Matrix Processor

Takeshi KUMAKI  Masakatsu ISHIZAKI  Tetsushi KOIDE  Hans Jurgen MATTAUSCH  Yasuto KURODA  Hideyuki NODA  Katsumi DOSAKA  Kazutami ARIMOTO  Kazunori SAITO  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E90-D   No.8   pp.1312-1315
Publication Date: 2007/08/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e90-d.8.1312
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Image Processing and Video Processing
Keyword: DCT, fast DCT, matrix-processing engine, SIMD, bit-serial and word-parallel

Summary: 
This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the arithmetic order of the conventional DCT algorithm has been improved, and the vertical/horizontal-space one-dimensional (1D) DCT processing has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comparison to a conventional DSP architecture. The resulting performance in MOPS and MOPS/mm² is better than that of a conventional DSP by factors of 8 and 5.6, respectively.
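
For readers unfamiliar with the vertical/horizontal decomposition mentioned in the summary, the following is a minimal sketch of the standard separable approach: a 2D DCT computed as a horizontal 1D DCT pass over the rows followed by a vertical 1D DCT pass over the columns. It is a plain reference computation, not the paper's reordered algorithm or its matrix-engine mapping. The 8x8 block size, the orthonormal DCT-II scaling, and the function names `dct_matrix`, `dct_1d`, `transpose`, and `dct_2d` are all assumptions made for illustration.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: c[k][i] = a(k) * cos((2i + 1) * k * pi / (2n))."""
    rows = []
    for k in range(n):
        # a(0) = sqrt(1/n), a(k > 0) = sqrt(2/n) (assumed orthonormal scaling)
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
        )
    return rows


def dct_1d(vec, c):
    """1D DCT of one row or column as a matrix-vector product."""
    return [sum(c[k][i] * vec[i] for i in range(len(vec))) for k in range(len(c))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct_2d(block):
    """2D DCT of a square block via two 1D passes (rows, then columns)."""
    c = dct_matrix(len(block))
    horizontal = [dct_1d(row, c) for row in block]               # horizontal pass
    vertical = [dct_1d(col, c) for col in transpose(horizontal)]  # vertical pass
    return transpose(vertical)  # restore [vertical freq][horizontal freq] order


if __name__ == "__main__":
    flat = [[10.0] * 8 for _ in range(8)]  # hypothetical constant 8x8 block
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))  # DC term; all other coefficients are ~0
```

Every row in the horizontal pass, and every column in the vertical pass, is independent of the others, which is why this decomposition lends itself to wide SIMD execution in general. How the paper distributes these passes across the 2,048 processing elements is not described in the summary.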