Architecture and Circuit Optimization of Hardwired Integer Motion Estimation Engine for H.264/AVC

Zhenyu LIU  Dongsheng WANG  Takeshi IKENAGA  

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E93-A   No.11   pp.2065-2073
Publication Date: 2010/11/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E93.A.2065
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Information Theory and Its Applications)
Category: Image Processing
Keyword: 
H.264/AVC,  variable block size motion estimation,  hardwired engine,  very large scale integration (VLSI),  

Full Text: PDF>>
Buy this Article




Summary: 
Variable block size motion estimation developed by the latest video coding standard H.264/AVC is the efficient approach to reduce the temporal redundancies. The intensive computational complexity coming from the variable block size technique makes the hardwired accelerator essential, for real-time applications. Propagate partial sums of absolute differences (Propagate Partial SAD) and SAD Tree hardwired engines outperform other counterparts, especially considering the impact of supporting variable block size technique. In this paper, the authors apply the architecture-level and the circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while other metrics, in terms of latency, memory bandwidth and hardware utilization, of the original architectures are maintained. Experiments demonstrate that by using the proposed approaches, at 110.8 MHz operating frequency, compared with the original architectures, 14.7% and 18.0% gate count can be saved for Propagate Partial SAD and SAD Tree, respectively. With TSMC 0.18 µm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture achieves 231.6 MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the maximum work frequency of the optimized SAD Tree architecture is improved to 204.8 MHz, which is almost two times of the original one, while its hardware overhead is merely 88.5 k-gate.