Query Bootstrapping: A Visual Mining Based Query Expansion

Siriwat KASAMWATTANAROTE, Yusuke UCHIDA, Shin'ichi SATOH

Publication
IEICE TRANSACTIONS on Information and Systems, Vol. E99-D, No. 2, pp. 454-466
Publication Date: 2016/02/01
Publicized: 2015/11/10
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2015EDP7193
Type of Manuscript: PAPER
Category: Image Processing and Video Processing
Keywords:
image retrieval, instance search, query expansion, frequent itemset mining, visual word mining, query bootstrapping, adaptive support, adaptive inlier threshold

Summary:
Bag of Visual Words (BoVW) is an effective framework for image retrieval. Query expansion (QE) further boosts retrieval performance by refining a query with relevant visual words found through a geometric consistency check between the query image and highly ranked images obtained from the first round of retrieval. Since QE checks pairwise consistency between the query and each highly ranked image, its performance may deteriorate when the query image is slightly degraded. We propose Query Bootstrapping (QB), a variant of QE that circumvents this problem by using the consistency among the highly ranked images themselves instead of pairwise consistency with the query. In so doing, we regard visual words that frequently co-occur in highly ranked images as relevant visual words. Frequent itemset mining (FIM) is used to find such visual words efficiently. However, the FIM-based approach requires sensitive parameters to be fine-tuned, namely the support (min/max-support) and the number of top-ranked images (top-k). Here, we propose an adaptive support algorithm that determines both the minimum and maximum support by referring to the first round's retrieval list. Selecting relevant images with a geometric consistency check further boosts retrieval performance by excluding outlier images from the mining process. An important parameter of the LO-RANSAC algorithm used for this check, namely the inlier threshold, is also determined automatically by our algorithm. We further introduce tf-fi-idf on top of tf-idf in order to take into account the frequency of inliers (fi) in the retrieved images. We evaluated the performance of QB in terms of mean average precision (mAP) on three benchmark datasets and found that it gives significant performance boosts of 5.37%, 9.65%, and 8.52% over state-of-the-art QE on Oxford 5k, Oxford 105k, and Paris 6k, respectively.
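The core mining and weighting ideas in the abstract might be sketched as follows. This is a toy, single-item simplification under stated assumptions: the function names are hypothetical, support is measured as per-image presence, the paper's multi-word itemsets and the adaptive selection of support and inlier threshold are omitted, and the exact tf-fi-idf formula may differ from the paper's.

```python
from collections import Counter
import math

def mine_relevant_words(ranked_image_words, min_support, max_support):
    # For each visual word, count the fraction of top-ranked images that
    # contain it (presence, not raw frequency), and keep words whose
    # support lies inside [min_support, max_support]. A single-item stand-in
    # for the paper's frequent-itemset mining step.
    n = len(ranked_image_words)
    counts = Counter()
    for words in ranked_image_words:
        counts.update(set(words))  # presence per image
    return {w for w, c in counts.items()
            if min_support <= c / n <= max_support}

def tf_fi_idf(tf, fi, df, n_docs):
    # tf-idf extended by an inlier-frequency factor fi, as the abstract
    # describes; a hypothetical multiplicative form for illustration.
    return tf * fi * math.log(n_docs / df)

# Toy example: four top-ranked images, each a list of visual-word IDs.
relevant = mine_relevant_words([[1, 2, 3], [1, 2], [1, 4], [2, 3]],
                               min_support=0.5, max_support=1.0)
```

Here word 4 appears in only one of four images (support 0.25) and is dropped as noise, while words 1-3 meet the 0.5 support floor and are kept as relevant expansion terms.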