Finding Important People in a Video Using Deep Neural Networks with Conditional Random Fields
Mayu OTANI Atsushi NISHIDA Yuta NAKASHIMA Tomokazu SATO Naokazu YOKOYA
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2018/10/01
Online ISSN: 1745-1361
Type of Manuscript: PAPER
Category: Image Recognition, Computer Vision
Keywords: neural network, conditional random field, important people classification
Finding important regions is essential for applications such as content-aware video compression and video retargeting, which automatically crops a region of a video for display on small screens. Since people are among the main subjects when a video is taken, some methods for finding important regions use a visual attention model based on face or pedestrian detection to incorporate the knowledge that people are important. However, such methods usually do not distinguish important people from passers-by and bystanders, which results in false positives. In this paper, we propose a deep neural network (DNN)-based method that classifies each person as important or unimportant, given a video captured with a hand-held camera in which multiple people appear in the same frame. Intuitively, the important/unimportant labels of two people are highly correlated when their spatial motions are similar. Based on this assumption, we propose to boost the performance of our important/unimportant classification by using conditional random fields (CRFs) built upon the DNN, which can be trained in an end-to-end manner. Our experimental results show that our method successfully classifies important people and that augmenting the DNN with CRFs improves classification accuracy.
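The CRF idea in the abstract, namely that people whose spatial motions are similar tend to share the same important/unimportant label, can be illustrated with a small sketch. This is not the authors' model. It assumes a fully connected binary CRF whose unary terms are per-person DNN probabilities, a Potts pairwise term weighted by a Gaussian kernel on a hypothetical 2D mean-motion descriptor, and a few rounds of mean-field inference. The names `motion_affinity`, `mean_field_crf`, `weight`, and `sigma` are illustrative assumptions, and end-to-end training of the DNN and CRF parameters is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-person motion descriptor: mean (dx, dy) displacement per frame.
Motion = Tuple[float, float]


def motion_affinity(a: Motion, b: Motion, sigma: float = 1.0) -> float:
    """Gaussian kernel on the distance between two motion vectors (assumed form)."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mean_field_crf(unary: Sequence[float], motions: Sequence[Motion],
                   weight: float = 1.0, sigma: float = 1.0,
                   iters: int = 10) -> List[float]:
    """Refine per-person P(important) with a fully connected binary CRF.

    unary[i]   : DNN probability that person i is important (assumed input)
    motions[i] : hypothetical motion descriptor of person i
    The pairwise term is a Potts penalty of `weight * affinity` whenever two
    people receive different labels, so people who move alike are pushed
    toward the same label.
    """
    n = len(unary)
    eps = 1e-6
    # Log-unaries for label 0 (unimportant) and label 1 (important), clipped to avoid log(0).
    log_u = [(math.log(max(1.0 - p, eps)), math.log(max(p, eps))) for p in unary]
    # Pairwise affinities; no self-connections.
    w = [[motion_affinity(motions[i], motions[j], sigma) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    q = list(unary)  # initialize Q_i(important) with the DNN output
    for _ in range(iters):
        new_q = []
        for i in range(n):
            # Affinity-weighted support for each label from all other people.
            m1 = sum(w[i][j] * q[j] for j in range(n))
            m0 = sum(w[i][j] * (1.0 - q[j]) for j in range(n))
            s0 = log_u[i][0] + weight * m0
            s1 = log_u[i][1] + weight * m1
            # Numerically stable two-way softmax gives the new Q_i(important).
            mx = max(s0, s1)
            e0, e1 = math.exp(s0 - mx), math.exp(s1 - mx)
            new_q.append(e1 / (e0 + e1))
        q = new_q  # synchronous (parallel) mean-field update
    return q


if __name__ == "__main__":
    # Person 0: confidently important; person 1: ambiguous but moves like person 0;
    # person 2: bystander moving in a different direction.
    unary = [0.9, 0.45, 0.1]
    motions = [(1.0, 0.0), (1.05, 0.05), (-2.0, 1.0)]
    refined = mean_field_crf(unary, motions)
    print([round(p, 3) for p in refined])
```

In this toy example, the second person's DNN score is below 0.5, but because that person moves almost exactly like a confidently important person, mean-field inference pushes the refined probability above 0.5, while the bystander moving in a different direction stays unimportant. A fixed Gaussian kernel keeps the sketch simple; in a trainable setting, the kernel and compatibility parameters could instead be learned jointly with the DNN.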