The BINDS-Tree: A Space-Partitioning Based Indexing Scheme for Box Queries in Non-Ordered Discrete Data Spaces

A. K. M. Tauhidul ISLAM  Sakti PRAMANIK  Qiang ZHU  

IEICE TRANSACTIONS on Information and Systems   Vol.E102-D   No.4   pp.745-758
Publication Date: 2019/04/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018DAP0005
Type of Manuscript: Special Section PAPER (Special Section on Data Engineering and Information Management)
non-ordered discrete data,  multidimensional indexing,  space-partitioning,  box query,  and algorithm,  

Full Text: PDF(1.1MB)>>
Buy this Article

In recent years we have witnessed an increasing demand to process queries on large datasets in Non-ordered Discrete Data Spaces (NDDS). In particular, one type of query in an NDDS, called box queries, is used in many emerging applications including error corrections in bioinformatics and network intrusion detection in cybersecurity. Effective indexing methods are necessary for efficiently processing queries on large datasets in disk. However, most existing NDDS indexing methods were not designed for box queries. Several recent indexing methods developed for box queries on a large NDDS dataset in disk are based on the popular data-partitioning approach. Unfortunately, a space-partitioning based indexing scheme, which is more effective for box queries in an NDDS, has not been studied before. In this paper, we propose a novel indexing method based on space-partitioning, called the BINDS-tree, for supporting efficient box queries on a large NDDS dataset in disk. A number of effective strategies such as node split based on minimum span and cross optimal balance, redundancy reduction utilizing a singleton dimension inheritance property, and a space-efficient structure for the split history are incorporated in the constructing algorithm for the BINDS-tree. Experimental results demonstrate that the proposed BINDS-tree significantly improves the box query I/O performance, comparing to that of the state-of-the-artdata-partitioning based NDDS indexing method.