A Low-Power Real-Time SIFT Descriptor Generation Engine for Full-HDTV Video Recognition

Kosuke MIZUNO  Hiroki NOGUCHI  Guangji HE  Yosuke TERACHI  Tetsuya KAMINO  Tsuyoshi FUJINAGA  Shintaro IZUMI  Yasuo ARIKI  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

IEICE TRANSACTIONS on Electronics   Vol.E94-C   No.4   pp.448-457
Publication Date: 2011/04/01
Online ISSN: 1745-1353
DOI: 10.1587/transele.E94.C.448
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Section on Circuits and Design Techniques for Advanced Large Scale Integration)
SIFT,  image recognition,  low-power,  HDTV,  

Full Text: PDF(3.2MB)>>
Buy this Article

This paper describes a SIFT (Scale Invariant Feature Transform) descriptor generation engine which features a VLSI oriented SIFT algorithm, three-stage pipelined architecture and novel systolic array architectures for Gaussian filtering and key-point extraction. The ROI-based scheme has been employed for the VLSI oriented algorithm. The novel systolic array architecture drastically reduces the number of operation cycle and memory access. The cycle counts of Gaussian filtering module is reduced by 82%, compared with the SIMD architecture. The number of memory accesses of the Gaussian filtering module and the key-point extraction module are reduced by 99.8% and 66% respectively, compared with the results obtained assuming the SIMD architecture. The proposed schemes provide processing capability for HDTV resolution video (1920 1080 pixels) at 30 frames per second (fps). The test chip has been fabricated in 65 nm CMOS technology and occupies 4.2 4.2 mm2 containing 1.1 M gates and 1.38 Mbit on-chip memory. The measured data demonstrates 38.2 mW power consumption at 78 MHz and 1.2 V.