Acceleration of DCT Processing with Massive-Parallel Memory-Embedded SIMD Matrix Processor

Takeshi KUMAKI  Masakatsu ISHIZAKI  Tetsushi KOIDE  Hans Jurgen MATTAUSCH  Yasuto KURODA  Hideyuki NODA  Katsumi DOSAKA  Kazutami ARIMOTO  Kazunori SAITO  

IEICE TRANSACTIONS on Information and Systems   Vol.E90-D    No.8    pp.1312-1315
Publication Date: 2007/08/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e90-d.8.1312
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Image Processing and Video Processing
Keywords: DCT, fast DCT, matrix-processing engine, SIMD, bit-serial and word-parallel

This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the order of arithmetic operations in the conventional DCT algorithm has been improved, and the one-dimensional (1D) DCT processing in the vertical and horizontal directions has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comparison to a conventional DSP architecture. The resulting performance in MOPS and MOPS/mm² is better than that of a conventional DSP by factors of 8 and 5.6, respectively.
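
As a point of reference for the vertical and horizontal 1D-DCT passes mentioned in the abstract, the following is a minimal sketch of the standard separable 2D DCT, computed as a 1D DCT over every row followed by a 1D DCT over every column. It is not the authors' reordered algorithm and does not model the bit-serial, word-parallel hardware. The 8x8 block size, the orthonormal DCT-II scaling, and the names `dct_1d` and `dct_2d` are assumptions made for illustration only.

```python
import math

N = 8  # assumed block size (common in image codecs); not stated in the abstract


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, in the direct O(n^2) form."""
    n = len(x)
    out = []
    for k in range(n):
        # Orthonormal scaling: the DC term uses sqrt(1/n), all others sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2D DCT: horizontal 1D pass, then vertical 1D pass."""
    rows = [dct_1d(r) for r in block]              # horizontal pass over rows
    cols = [dct_1d(list(c)) for c in zip(*rows)]   # vertical pass over columns
    return [list(r) for r in zip(*cols)]           # transpose back to row-major


# A flat block carries all of its energy in the DC coefficient.
flat = [[100.0] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

Each row (and then each column) is transformed independently of the others, which is the property that lets a SIMD array process many 1D transforms with a single command stream.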