Two-Phased Bulk Insertion by Seeded Clustering for R-Trees

Taewon LEE, Sukho LEE

IEICE TRANSACTIONS on Information and Systems, Vol.E89-D, No.1, pp.228-236
Publication Date: 2006/01/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.1.228
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Database
Keywords: R-tree, bulk insertion, seeded clustering

With the great advances in mobile technology and wireless communications, users expect to be online anytime, anywhere. However, because staying online is costly, applications are still implemented as only partially connected to the server. In many data-intensive mobile client/server frameworks, it is a daunting task to archive and index the massive volume of complex data that is continuously added to the server each time a mobile client comes online. In this paper, we propose a scalable technique called Seeded Clustering that maintains R-tree indexes by bulk insertion while keeping pace with high data arrival rates. Our approach uses a seed tree, copied from the top k levels of the target R-tree, to classify input data objects into clusters. We then build an R-tree for each cluster and insert these input R-trees into the target R-tree in bulk, one at a time. We present detailed algorithms for seeded clustering and bulk insertion, together with the results of an extensive experimental study. The experimental results show that bulk insertion by seeded clustering outperforms previously known methods in terms of both insertion cost and the quality of the resulting target R-trees, as measured by their query performance.
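
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how an object is routed through the seed tree, so this sketch assumes the common R-tree heuristic of descending into the child whose bounding rectangle needs the least area enlargement. The names `Node`, `copy_top_levels`, `classify`, and `seeded_clusters` are hypothetical, and building a small R-tree per cluster and merging it into the target tree are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Axis-aligned rectangle: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr: Rect, r: Rect) -> float:
    # Extra area the bounding rectangle would need in order to cover r.
    return area(union(mbr, r)) - area(mbr)


@dataclass
class Node:
    # Hypothetical, simplified R-tree node: a label, its MBR, and children.
    name: str
    mbr: Rect
    children: List["Node"] = field(default_factory=list)


def copy_top_levels(node: Node, k: int) -> Node:
    # Seed tree: a copy of the top k levels of the target tree (k >= 1).
    kids = [copy_top_levels(c, k - 1) for c in node.children] if k > 1 else []
    return Node(node.name, node.mbr, kids)


def classify(seed_root: Node, r: Rect) -> str:
    # Assumption: route by least enlargement, breaking ties by smaller area.
    node = seed_root
    while node.children:
        node = min(node.children,
                   key=lambda c: (enlargement(c.mbr, r), area(c.mbr)))
    return node.name


def seeded_clusters(seed_root: Node, rects: List[Rect]) -> Dict[str, List[Rect]]:
    # Group input rectangles by the seed-tree leaf they are routed to.
    clusters: Dict[str, List[Rect]] = {}
    for r in rects:
        clusters.setdefault(classify(seed_root, r), []).append(r)
    return clusters
```

Under these assumptions, each list returned by `seeded_clusters` would be bulk-loaded into its own small R-tree and then inserted into the target tree near the node its key names. Objects routed to the same seed leaf are spatially close to an existing subtree, which is why the clusters can be merged one at a time.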