Efficient Similarity Search with a Pivot-Based Complete Binary Tree

Yuki YAMAGISHI  Kazuo AOYAMA  Kazumi SAITO  Tetsuo IKEDA  

IEICE TRANSACTIONS on Information and Systems   Vol.E100-D   No.10   pp.2526-2536
Publication Date: 2017/10/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017EDP7100
Type of Manuscript: PAPER
Category: Data Engineering, Web Information Systems
algorithm,  similarity search,  index,  tree,  pivot,  

Full Text: PDF(631.8KB)>>
Buy this Article

This paper presents an efficient similarity search method utilizing as an index a complete binary tree (CBT) based on optimized pivots for a large-scale and high-dimensional data set. A similarity search method, in general, requires high-speed performance on both index construction off-line and similarity search itself online. To fulfill the requirement, we introduce novel techniques into an index construction and a similarity search algorithm in the proposed method for a range query. The index construction algorithm recursively employs the following two main functions, resulting in a CBT index. One is a pivot generation function that obtains one effective pivot at each node by efficiently maximizing a defined objective function. The other is a node bisection function that partitions a set of objects at a node into two almost equal-sized subsets based on the optimized pivot. The similarity search algorithm employs a three-stage process that narrows down candidate objects within a given range by pruning unnecessary branches and filtering objects in each stage. Experimental results on one million real image data set with high dimensionality demonstrate that the proposed method finds an exact solution for a range query at around one-quarter to half of the computational cost of one of the state-of-the-art methods, by using a CBT index constructed off-line at a reasonable computational cost.