Full-HD 60fps FPGA Implementation of Spatio-Temporal Keypoint Extraction Based on Gradient Histogram and Parallelization of Keypoint Connectivity

Takahiro SUZUKI  Takeshi IKENAGA  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E99-A   No.11   pp.1937-1946
Publication Date: 2016/11/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E99.A.1937
Type of Manuscript: Special Section PAPER (Special Section on Smart Multimedia & Communication Systems)
Category: Vision
hardware architecture,  keypoint extraction,  SIFT,  cloud,  video recognition,  

Full Text: PDF>>
Buy this Article

Recently, cloud systems have started to be utilized for services which analyze user's data in the field of computer vision. In these services, keypoints are extracted from images or videos, and the data is identified by machine learning with a large database in the cloud. To reduce the number of keypoints which are sent to the cloud, Keypoints of Interest (KOI) extraction has been proposed. However, since its computational complexity is large, hardware implementation is required for real-time processing. Moreover, the hardware resource must be low because it is embedded in devices of users. This paper proposes a hardware-friendly KOI algorithm with low amount of computations and its real-time hardware implementation based on dual threshold keypoint detection by gradient histogram and parallelization of connectivity of adjacent keypoint-utilizing register counters. The algorithm utilizes dual-histogram based detection and keypoint-matching based calculation of motion information and dense-clustering based keypoint smoothing. The hardware architecture is composed of a detection module utilizing descriptor, and grid-region-parallelization based density clustering. Finally, the evaluation results of hardware implementation show that the implemented hardware achieves Full-HD (1920x1080)-60 fps spatio-temporal keypoint extraction. Further, it is 47 times faster than low complexity keypoint extraction on software and 12 times faster than spatio-temporal keypoint extraction on software, and the hardware resources are almost the same as SIFT hardware implementation, maintaining accuracy.