A High-Throughput and Compact Hardware Implementation for the Reconstruction Loop in HEVC Intra Encoding

Yibo FAN, Leilei HUANG, Zheng XIE, Xiaoyang ZENG

IEICE TRANSACTIONS on Electronics   Vol.E100-C   No.6   pp.643-654
Publication Date: 2017/06/01
Online ISSN: 1745-1353
DOI: 10.1587/transele.E100.C.643
Type of Manuscript: PAPER
Category: Integrated Electronics
Keywords: reconstruction loop, discrete cosine transform (DCT), inverse discrete cosine transform (IDCT), quantization (Q), de-quantization (IQ), high efficiency video coding (HEVC)

In the recently finalized video coding standard, high efficiency video coding (HEVC), new concepts such as the coding unit (CU), prediction unit (PU) and transform unit (TU) are introduced to improve coding performance. As a result, the reconstruction loop in intra encoding is heavily burdened with choosing the best partitions and modes for them. To resolve the resulting bottlenecks in cycle count and hardware cost, this paper proposes a high-throughput and compact implementation of the reconstruction loop. “High-throughput” means that the design has a fixed throughput of 32 pixels/cycle independent of the TU/PU size (except for 4×4 TUs). “Compact” means that it fully exploits the reusability between the discrete cosine transform (DCT) and the inverse discrete cosine transform (IDCT), as well as that between quantization (Q) and de-quantization (IQ). Besides these hardware contributions, this paper also provides a universal formula for analyzing the cycle cost of the reconstruction loop and proposes a parallel-process scheme to further reduce that cost. The design is verified on a Stratix IV FPGA. The basic structure achieves a maximum frequency of 150 MHz at a hardware cost of 64K ALUTs, which can support real-time TU/PU partition decision for 4K×2K@20fps video.
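
The headline figures can be related by a rough back-of-the-envelope calculation. The sketch below is not the paper's cycle-cost formula, which the abstract does not state. It only estimates how many times, on average, each pixel could pass through the loop in real time, under the assumptions that "4K×2K" means 4096×2160, that only luma samples are counted, and that the loop sustains its 32 pixels/cycle without stalls. The function name `passes_per_pixel` is hypothetical.

```python
# Figures quoted in the abstract.
FREQ_HZ = 150_000_000      # maximum frequency: 150 MHz
PIXELS_PER_CYCLE = 32      # fixed loop throughput

# Assumptions, not stated in the abstract: frame size and luma-only counting.
WIDTH, HEIGHT, FPS = 4096, 2160, 20


def passes_per_pixel(freq_hz, pixels_per_cycle, width, height, fps):
    """Average number of loop passes available per pixel in real time."""
    pixels_per_second = width * height * fps       # pixels the video delivers
    loop_capacity = freq_hz * pixels_per_cycle     # pixels the loop can process
    return loop_capacity / pixels_per_second


budget = passes_per_pixel(FREQ_HZ, PIXELS_PER_CYCLE, WIDTH, HEIGHT, FPS)
print(f"Average loop passes available per pixel: {budget:.1f}")
```

Under these assumptions the budget comes to a few tens of passes per pixel, which gives a sense of how many partition and mode candidates could be evaluated per block. A different frame size (for example 3840×2160) or the inclusion of chroma samples would change the figure.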