Memory-Efficient and High-Performance Two-Dimensional Discrete Wavelet Transform Architecture Based on Decomposed Lifting Algorithm

Peng CAO  Chao WANG  Longxing SHI  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E92-A   No.8   pp.2000-2008
Publication Date: 2009/08/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E92.A.2000
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: Digital Signal Processing
Keywords: discrete wavelet transform (DWT), decomposed lifting algorithm (DLA), line-based

The line-based method is one of the most commonly used approaches to hardware implementation of the two-dimensional (2D) discrete wavelet transform (DWT). However, a data buffer is required between the row DWT processor and the column DWT processor to resolve the data flow mismatch, which increases both the on-chip memory size and the output latency. Since the incompatible data flow arises from an intrinsic property of the adopted lifting-based algorithm, a decomposed lifting algorithm (DLA) is presented that rearranges the data path of the lifting steps so that image data is processed in raster-scan order in both the row processor and the column processor. Theoretical analysis indicates that DLA outperforms other lifting-based algorithms in precision, in terms of round-off noise and internal word-length. A memory-efficient and high-performance line-based architecture is proposed based on DLA, without a data buffer. For an N × M image, where N and M denote the width and height of the image, only 2N internal memory is required for the 5/3 filter and 4N for the 9/7 filter to perform the 2D DWT. Compared with related 2D DWT architectures, the on-chip memory size is reduced significantly under the same arithmetic cost, memory bandwidth, and timing constraint. The design was implemented in an SMIC 0.18 µm CMOS logic process with 32 kbits of dual-port RAM and 20 K equivalent 2-input NAND gates in a 1.00 mm × 1.00 mm die, and can process a 512 × 512 image under 100 MHz.
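
For readers unfamiliar with lifting, the sketch below shows the conventional reversible 5/3 lifting steps (predict, then update) on a one-dimensional integer sequence, together with the internal memory figures quoted in the abstract. It is not the decomposed lifting algorithm (DLA) of the paper, whose rearranged data path the abstract does not spell out. The function names, the even-length restriction, and the symmetric boundary extension are assumptions made for illustration only.

```python
def lifting_53_forward(x):
    """Conventional reversible 5/3 lifting (NOT the paper's DLA).

    x: even-length list of integers. Returns (low-pass s, high-pass d).
    Boundaries use whole-sample symmetric extension (an assumption here).
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch handles even-length input only"
    half = n // 2

    # Predict step: d[i] = x[2i+1] - floor((x[2i] + x[2i+2]) / 2)
    d = []
    for i in range(half):
        left = x[2 * i]
        # Past the right edge, mirror: x[n] is taken as x[n-2]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - ((left + right) >> 1))

    # Update step: s[i] = x[2i] + floor((d[i-1] + d[i] + 2) / 4)
    s = []
    for i in range(half):
        # Past the left edge, mirror: d[-1] is taken as d[0]
        prev = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def lifting_53_inverse(s, d):
    """Undo lifting_53_forward by running the steps in reverse order."""
    half = len(s)

    # Undo the update step to recover the even samples
    even = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((prev + d[i] + 2) >> 2))

    # Undo the predict step to recover the odd samples, then interleave
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        odd = d[i] + ((even[i] + right) >> 1)
        x.extend([even[i], odd])
    return x


def line_memory_words(width, filter_name):
    """Internal memory stated in the abstract: 2N for 5/3, 4N for 9/7.

    The unit (words vs. bits) is not given in the abstract; this helper
    simply returns the multiple of the image width N.
    """
    factors = {"5/3": 2, "9/7": 4}
    return factors[filter_name] * width


if __name__ == "__main__":
    row = [3, 7, 1, 8, 2, 9, 4, 6]
    low, high = lifting_53_forward(row)
    print(low, high)
    print(lifting_53_inverse(low, high) == row)
    print(line_memory_words(512, "9/7"))
```

Because each step only adds or subtracts a function of the other channel, the inverse is obtained by replaying the same steps in reverse with the signs flipped, which is why the integer rounding does not prevent exact reconstruction. In a line-based 2D design the same steps are applied along rows and then along columns; the column pass is where line memory proportional to the image width N becomes necessary.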