For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
VisualTextualRank: An Extension of VisualRank to Large-Scale Video Shot Extraction Exploiting Tag Co-occurrence
Nga H. DO Keiji YANAI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2015/01/01
Online ISSN: 1745-1361
Type of Manuscript: PAPER
Category: Image Processing and Video Processing
shot ranking, tag co-occurence, visual features, bipartite graph,
Full Text: PDF(2.5MB)
>>Buy this Article
In this paper, we propose a novel ranking method called VisualTextualRank which ranks media data according to the relevance between the data and specified keywords. We apply our method to the system of video shot ranking which aims to automatically obtain video shots corresponding to given action keywords from Web videos. The keywords can be any type of action such as “surfing wave” (sport action) or “brushing teeth” (daily activity). Top ranked video shots are expected to be relevant to the keywords. While our baseline exploits only visual features of the data, the proposed method employs both textual information (tags) and visual features. Our method is based on random walks over a bipartite graph to integrate visual information of video shots and tag information of Web videos effectively. Note that instead of treating the textual information as an additional feature for shot ranking, we explore the mutual reinforcement between shots and textual information of their corresponding videos to improve shot ranking. We validated our framework on a database which was used by the baseline. Experiments showed that our proposed ranking method, VisualTextualRank, improved significantly the performance of the system of video shot extraction over the baseline.