Contextual Max Pooling for Human Action Recognition

Zhong ZHANG  Shuang LIU  Xing MEI  

IEICE TRANSACTIONS on Information and Systems   Vol.E98-D   No.4   pp.989-993
Publication Date: 2015/04/01
Publicized: 2015/01/19
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2014EDL8221
Type of Manuscript: LETTER
Category: Image Recognition, Computer Vision
Keywords: contextual max pooling, human action recognition, spatio-temporal relationship


The bag-of-words (BOW) model has been extensively adopted by recent human action recognition methods. The pooling operation, which aggregates local descriptor encodings into a single representation, is a key determinant of the performance of BOW-based methods. However, the spatio-temporal relationship among interest points has rarely been considered in the pooling step, which results in an imprecise representation of human actions. In this paper, we propose a novel pooling strategy named contextual max pooling (CMP) to overcome this limitation. We add a constraint term to the objective function under the framework of max pooling, which forces the weights of interest points to be consistent with their probabilities. In this way, CMP explicitly considers the spatio-temporal contextual relationships among interest points while inheriting the positive properties of max pooling. We evaluate our method on three challenging datasets (KTH, UCF Sports, and UCF Films), and the results show that it outperforms state-of-the-art methods in human action recognition.
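To make the pooling step concrete, the following minimal Python sketch contrasts standard max pooling over BOW encodings with a context-weighted variant. It is an illustration only and not the CMP formulation of the letter, whose objective function and constraint term are not given in this abstract. The function names, the Gaussian affinity used in `context_weights`, and the parameter `sigma` are all assumptions introduced here.

```python
from math import dist, exp


def max_pool(codes):
    """Standard max pooling: per-codeword maximum over all interest points.

    codes[i][k] is the encoding of interest point i on codeword k.
    """
    return [max(column) for column in zip(*codes)]


def context_weights(positions, sigma=1.0):
    """Hypothetical context weight for each interest point.

    Assumption (not from the letter): a point's weight is its mean Gaussian
    affinity to the other points in (x, y, t) space, scaled so that the
    largest weight is 1. Isolated points therefore get small weights.
    """
    n = len(positions)
    if n == 1:
        return [1.0]
    raw = []
    for i, p in enumerate(positions):
        affinity = sum(
            exp(-dist(p, q) ** 2 / (2.0 * sigma ** 2))
            for j, q in enumerate(positions)
            if j != i
        )
        raw.append(affinity / (n - 1))
    top = max(raw)
    return [r / top if top > 0 else 1.0 for r in raw]


def weighted_max_pool(codes, weights):
    """Max pooling after scaling each point's encoding by its weight."""
    return [
        max(w * c for w, c in zip(weights, column))
        for column in zip(*codes)
    ]


# Three interest points encoded over two codewords; the third point lies
# far from the other two in space and time.
codes = [[0.9, 0.1], [0.2, 0.8], [0.1, 1.0]]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 10.0, 10.0)]

plain = max_pool(codes)
weights = context_weights(positions)
contextual = weighted_max_pool(codes, weights)
```

In this toy input, plain max pooling lets the isolated third point determine the response of the second codeword, whereas the weighted variant suppresses it because the point has almost no spatio-temporal support from its neighbours. This is the kind of contextual effect the abstract describes, obtained here with a much simpler heuristic than an optimized objective.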