Incorporating Contextual Information into Bag-of-Visual-Words Framework for Effective Object Categorization

Shuang BAI  Tetsuya MATSUMOTO  Yoshinori TAKEUCHI  Hiroaki KUDO  Noboru OHNISHI  

IEICE TRANSACTIONS on Information and Systems, Vol.E95-D, No.12, pp.3060-3068
Publication Date: 2012/12/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E95.D.3060
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Image Recognition, Computer Vision
Keywords: object categorization, bag of visual words, contextual information, hierarchical codebook

Bag of visual words is a promising approach to object categorization. In this framework, however, encoding patches as visual words is ambiguous because vector quantization discards information. In this paper, we propose incorporating patch-level contextual information into bag of visual words to reduce this ambiguity. To this end, we construct a hierarchical codebook in which visual words in the upper hierarchy carry contextual information for visual words in the lower hierarchy. In the proposed method, we extract patches of different scales from each sample point and describe each of them with the SIFT descriptor. We then build the hierarchical codebook by placing visual words created from coarse-scale patches in the upper hierarchy and visual words created from fine-scale patches in the lower hierarchy. At the same time, we associate visual words in different hierarchies with each other by exploiting the correspondence among the extracted patches. Next, we design a method for assigning patch pairs, whose patches are extracted from the same sample point, to the constructed codebook. Furthermore, to use image information effectively, we apply the proposed method to two sets of features extracted with different sampling strategies and fuse them using a probabilistic approach. Finally, we evaluate the proposed method on the Caltech 101 and Caltech 256 datasets. Experimental results demonstrate the effectiveness of the proposed method.
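The abstract does not give the construction details, so the following is only a minimal sketch of one way a two-level codebook of this kind could be built. It assumes plain k-means for both levels, assumes that each upper-level word owns a small set of lower-level words learned from the fine-scale patches paired with it, and assumes hard nearest-neighbor assignment of patch pairs. All function names and parameters (kmeans, build_hierarchical_codebook, assign_pair, encode_image, k_upper, k_lower) are hypothetical and not taken from the paper; the descriptors are assumed to be precomputed SIFT vectors, and the probabilistic fusion of the two feature sets is not covered.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (centers, labels)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dist.argmin(axis=1)


def build_hierarchical_codebook(coarse, fine, k_upper, k_lower):
    """coarse[i] and fine[i] describe patches from the same sample point.

    Assumption: upper words come from coarse-scale descriptors, and each
    upper word gets its own lower words from the paired fine-scale ones.
    """
    coarse = np.asarray(coarse, dtype=float)
    fine = np.asarray(fine, dtype=float)
    upper, upper_labels = kmeans(coarse, k_upper)
    lower = []
    for u in range(k_upper):
        paired_fine = fine[upper_labels == u]
        k = min(k_lower, len(paired_fine))
        if k == 0:
            lower.append(np.empty((0, fine.shape[1])))
        else:
            lower.append(kmeans(paired_fine, k)[0])
    return upper, lower


def assign_pair(coarse_desc, fine_desc, upper, lower):
    """Assign one patch pair to an (upper word, lower word) index pair."""
    u = int(((upper - coarse_desc) ** 2).sum(axis=1).argmin())
    if len(lower[u]) == 0:
        return u, -1  # no lower words were learned under this upper word
    l = int(((lower[u] - fine_desc) ** 2).sum(axis=1).argmin())
    return u, l


def encode_image(coarse, fine, upper, lower, k_lower):
    """Normalized histogram over (upper, lower) word pairs for one image."""
    hist = np.zeros(len(upper) * k_lower)
    for c, f in zip(coarse, fine):
        u, l = assign_pair(c, f, upper, lower)
        if l >= 0:
            hist[u * k_lower + l] += 1
    return hist / max(hist.sum(), 1.0)
```

In this sketch the coarse-scale patch chooses which group of lower-level words is eligible, so two fine-scale patches with similar descriptors but different surroundings can end up in different histogram bins. That is the kind of disambiguation the abstract attributes to contextual information, although the authors' actual assignment rule may differ.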