Efficient Local Feature Encoding for Human Action Recognition with Approximate Sparse Coding

Yu WANG  Jien KATO  

IEICE TRANSACTIONS on Information and Systems   Vol.E99-D   No.4   pp.1212-1220
Publication Date: 2016/04/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2015EDP7333
Type of Manuscript: PAPER
Category: Image Recognition, Computer Vision
Keywords: approximate sparse coding, sparse coding, approximate nearest neighbour, local feature encoding, action recognition

Local spatio-temporal features are popular in human action recognition. In practice, they are usually coupled with a feature encoding approach, which produces the video-level vector representations used in learning and recognition. In this paper, we present an efficient local feature encoding approach called Approximate Sparse Coding (ASC). In the off-line learning phase, ASC computes the sparse codes for a large collection of prototype local feature descriptors using Sparse Coding (SC). In the encoding phase, it uses Approximate Nearest Neighbour (ANN) search to look up the precomputed sparse code of the nearest prototype for each local feature to be encoded. ASC thus combines the low dimensionality of SC with the high speed of ANN, both of which are desirable properties for a local feature encoding approach. ASC has been extensively evaluated on the KTH and HMDB51 datasets. We confirmed that it can efficiently encode large quantities of local video features into discriminative, low-dimensional representations.
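The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the ISTA solver, the regularisation weight `lam`, and the array shapes are all assumptions made for illustration, and an exact brute-force nearest-neighbour search stands in for a real ANN index.

```python
import numpy as np


def sparse_code_ista(X, D, lam=0.1, n_iter=200):
    """Sparse-code each row of X over dictionary D (one atom per row).

    Assumption: ISTA is used here as a simple stand-in SC solver. It minimises
    0.5 * ||x - z D||^2 + lam * ||z||_1 independently for every row x.
    """
    # Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(D @ D.T, 2)
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (Z @ D - X) @ D.T           # gradient of the reconstruction error
        Z = Z - grad / L                   # gradient step
        Z = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return Z


def build_lookup(prototypes, D, lam=0.1):
    """Off-line phase: precompute one sparse code per prototype descriptor."""
    return sparse_code_ista(prototypes, D, lam)


def encode(features, prototypes, prototype_codes):
    """Encoding phase: return the stored code of each feature's nearest prototype.

    Assumption: exact brute-force search replaces the ANN search for brevity.
    """
    diff = features[:, None, :] - prototypes[None, :, :]
    nearest = (diff ** 2).sum(axis=2).argmin(axis=1)
    return prototype_codes[nearest]
```

In this sketch the costly optimisation runs only once per prototype, so encoding a new feature reduces to a single nearest-neighbour lookup. In a realistic setting the brute-force search in `encode` would be replaced by an ANN index built over the prototypes.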