For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
Efficient Similarity Search with a Pivot-Based Complete Binary Tree
Yuki YAMAGISHI Kazuo AOYAMA Kazumi SAITO Tetsuo IKEDA
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2017/10/01
Online ISSN: 1745-1361
Type of Manuscript: PAPER
Category: Data Engineering, Web Information Systems
algorithm, similarity search, index, tree, pivot,
Full Text: PDF(631.8KB)>>
This paper presents an efficient similarity search method utilizing as an index a complete binary tree (CBT) based on optimized pivots for a large-scale and high-dimensional data set. A similarity search method, in general, requires high-speed performance on both index construction off-line and similarity search itself online. To fulfill the requirement, we introduce novel techniques into an index construction and a similarity search algorithm in the proposed method for a range query. The index construction algorithm recursively employs the following two main functions, resulting in a CBT index. One is a pivot generation function that obtains one effective pivot at each node by efficiently maximizing a defined objective function. The other is a node bisection function that partitions a set of objects at a node into two almost equal-sized subsets based on the optimized pivot. The similarity search algorithm employs a three-stage process that narrows down candidate objects within a given range by pruning unnecessary branches and filtering objects in each stage. Experimental results on one million real image data set with high dimensionality demonstrate that the proposed method finds an exact solution for a range query at around one-quarter to half of the computational cost of one of the state-of-the-art methods, by using a CBT index constructed off-line at a reasonable computational cost.