Contextual Max Pooling for Human Action Recognition
Zhong ZHANG Shuang LIU Xing MEI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2015/04/01
Online ISSN: 1745-1361
Type of Manuscript: LETTER
Category: Image Recognition, Computer Vision
Keywords: contextual max pooling, human action recognition, spatio-temporal relationship
The bag-of-words (BOW) model has been widely adopted in recent human action recognition methods. The pooling operation, which aggregates the encodings of local descriptors into a single representation, is a key determinant of the performance of BOW-based methods. However, the spatio-temporal relationships among interest points have rarely been considered in the pooling step, which leads to imprecise representations of human actions. In this paper, we propose a novel pooling strategy, named contextual max pooling (CMP), to overcome this limitation. Within the max pooling framework, we add a constraint term to the objective function that forces the weights of interest points to be consistent with their probabilities. In this way, CMP explicitly considers the spatio-temporal contextual relationships among interest points while inheriting the positive properties of max pooling. Our method is evaluated on three challenging datasets (KTH, UCF Sports and UCF Films), and the results demonstrate that it outperforms state-of-the-art human action recognition methods.
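To make the pooling step concrete, here is a minimal pure-Python sketch of the two standard BOW pooling operators, max pooling and average pooling. It is not the CMP method itself, whose objective function is not given in this abstract. It assumes each interest point's descriptor has already been encoded as a length-K vector over a visual vocabulary. The example codes and the function names `max_pool` and `average_pool` are hypothetical. The final check shows the limitation the abstract describes: reordering the interest points (that is, discarding their spatio-temporal arrangement) leaves the max-pooled vector unchanged.

```python
from typing import List, Sequence

# Hypothetical setup: each interest point's local descriptor has already been
# encoded against a visual vocabulary of K words, giving a length-K code.
# Pooling maps the N codes of one video to a single length-K representation.
Code = Sequence[float]


def max_pool(codes: List[Code]) -> List[float]:
    """Max pooling: keep the strongest response to each visual word."""
    if not codes:
        raise ValueError("need at least one encoded descriptor")
    k = len(codes[0])
    return [max(code[j] for code in codes) for j in range(k)]


def average_pool(codes: List[Code]) -> List[float]:
    """Average pooling: mean response to each visual word.

    With one-hot (hard-assignment) codes this is the normalized BOW histogram.
    """
    if not codes:
        raise ValueError("need at least one encoded descriptor")
    n, k = len(codes), len(codes[0])
    return [sum(code[j] for code in codes) / n for j in range(k)]


if __name__ == "__main__":
    # Hypothetical soft-assignment codes for four interest points, K = 3.
    codes = [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.6, 0.3, 0.1],
        [0.0, 0.1, 0.9],
    ]
    print("max pooling:    ", max_pool(codes))  # [0.7, 0.8, 0.9]
    print("average pooling:", average_pool(codes))

    # The pooled vector does not depend on the order (and hence on the
    # spatio-temporal arrangement) of the interest points.
    assert max_pool(list(reversed(codes))) == max_pool(codes)
```

Every interest point contributes to the pooled vector independently of where and when it occurs. This is the gap that a context-aware weighting scheme such as CMP is meant to address.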