VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence

Nga H. DO, Keiji YANAI

IEICE TRANSACTIONS on Information and Systems, Vol.E98-D, No.1, pp.166-172
Publication Date: 2015/01/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2014EDP7106
Type of Manuscript: PAPER
Category: Image Processing and Video Processing
shot ranking, tag co-occurrence, visual features, bipartite graph


In this paper, we propose a novel ranking method called VisualTextualRank, which ranks media data according to their relevance to specified keywords. We apply the method to video shot ranking, whose goal is to automatically obtain, from Web videos, the video shots that correspond to given action keywords. The keywords can describe any type of action, such as “surfing wave” (a sport action) or “brushing teeth” (a daily activity), and the top-ranked video shots are expected to be relevant to them. Whereas our baseline exploits only the visual features of the data, the proposed method employs both textual information (tags) and visual features. It is based on random walks over a bipartite graph, which effectively integrate the visual information of video shots with the tag information of Web videos. Rather than treating the textual information as just an additional feature for shot ranking, we exploit the mutual reinforcement between shots and the textual information of their corresponding videos to improve shot ranking. We validated our framework on the same database used by the baseline. Experiments showed that VisualTextualRank significantly improved video shot extraction performance over the baseline.
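To make the idea of mutual reinforcement through random walks more concrete, the following is a minimal sketch under stated assumptions. It is not the authors' formulation. Each shot's score mixes three terms: a VisualRank-style step over a shot similarity graph, a shot→tag→shot step over the shot/tag bipartite graph, and a teleport to a uniform prior. The function name `visual_textual_rank`, the weights `alpha` and `beta`, the binary shot/tag matrix, and the uniform prior are all hypothetical choices. The sketch does not reproduce the paper's tag co-occurrence weighting.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def column_normalize(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic); all-zero columns stay zero."""
    rows, cols = len(m), len(m[0])
    sums = [sum(m[i][j] for i in range(rows)) for j in range(cols)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(cols)]
            for i in range(rows)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def visual_textual_rank(sim: Matrix, shot_tag: Matrix, alpha: float = 0.4,
                        beta: float = 0.4, iters: int = 200,
                        tol: float = 1e-12) -> Tuple[List[float], List[float]]:
    """Hypothetical shot/tag ranking by a damped random walk (illustrative only).

    sim[i][k]      visual similarity between shots i and k (n x n)
    shot_tag[i][j] 1.0 if the source video of shot i carries tag j (n x m)
    alpha, beta    weights of the visual step and the shot->tag->shot step;
                   the remaining 1 - alpha - beta teleports to a uniform prior
    """
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta <= 1")
    n = len(sim)
    shot_to_shot = column_normalize(sim)                 # n x n transition matrix
    tag_to_shot = column_normalize(shot_tag)             # n x m: tag mass -> its shots
    shot_to_tag = column_normalize(transpose(shot_tag))  # m x n: shot mass -> its tags
    prior = [1.0 / n] * n
    r = prior[:]
    for _ in range(iters):
        t = matvec(shot_to_tag, r)        # shots reinforce the tags of their videos
        visual = matvec(shot_to_shot, r)  # VisualRank-style step over similar shots
        textual = matvec(tag_to_shot, t)  # tags reinforce the shots that carry them
        new_r = [alpha * v + beta * w + (1 - alpha - beta) * p
                 for v, w, p in zip(visual, textual, prior)]
        total = sum(new_r)
        if total > 0:                     # renormalize in case isolated nodes leak mass
            new_r = [x / total for x in new_r]
        delta = sum(abs(a - b) for a, b in zip(new_r, r))
        r = new_r
        if delta < tol:
            break
    return r, matvec(shot_to_tag, r)      # final shot scores and tag scores
```

Sorting shot indices by descending score yields the ranked shot list. Because the shot score feeds the tag scores and the tag scores feed back into the shot scores, a shot that is visually atypical but shares tags with strong shots can still rise, and the reverse also holds. In this sketch, `alpha` and `beta` trade off the two evidence sources. The teleport term keeps the iteration a contraction, so it converges.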