For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
Acceleration of DCT Processing with Massive-Parallel Memory-Embedded SIMD Matrix Processor
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2007/08/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Image Processing and Video Processing
DCT, fast DCT, matrix-processing engine, SIMD, bit-serial and word-parallel,
Full Text: PDF(485KB)>>
This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the conventional DCT algorithm has been improved in arithmetic order and the vertical/horizontal-space 1 Dimensional (1D)-DCT processing has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comprison to a conventional DSP architecture. The determined performances in MOPS and MOPS/mm2 are factors 8 and 5.6 better than with a conventional DSP, respectively.