An Efficient Schema-Based Technique for Querying XML Data

Dao Dinh KHA

IEICE TRANSACTIONS on Information and Systems, Vol.E89-D, No.4, pp.1480-1489
Publication Date: 2006/04/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.4.1480
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Database
Keywords: XML, indexing, querying, schema, numbering scheme


As the demand for data integration over the Web grows, there is increasing interest in using XML as a standard format for data exchange. To share their grammars efficiently, most XML documents in use are associated with a document structure description, such as a DTD or an XML Schema. However, previously proposed techniques for XML query processing do not exploit this structure information efficiently. In this paper, we present a novel technique that reduces the disk I/O complexity of XML query processing. We design a schema-based numbering scheme, called SPAR, that incorporates both the structure information and the tag names extracted from the DTD or XML Schema. Based on SPAR, we develop a mechanism called VirtualJoin that significantly reduces the disk I/O workload of processing XML queries. Experiments show that VirtualJoin outperforms many prior techniques.
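The abstract does not specify how SPAR actually encodes nodes. As a rough illustration of the general idea only (not the paper's SPAR or VirtualJoin), the following Python sketch combines a standard interval (start/end) labeling with integer ids for root-to-node tag paths taken from a hypothetical schema path table. The names `SCHEMA_PATHS`, `PATH_IDS`, `Label`, `label_document`, `is_ancestor`, and `matching_path_ids` are assumptions introduced here for illustration. The point of the sketch is that a query such as `//chapter//section` can be resolved to a set of path ids from the schema table alone, and that ancestor/descendant tests reduce to integer comparisons instead of document traversals.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical schema path table: every root-to-element tag path allowed by a
# toy DTD/XML Schema, each mapped to a small integer id. (Illustrative only;
# this is NOT the SPAR encoding described in the paper.)
SCHEMA_PATHS = [
    ("book",),
    ("book", "title"),
    ("book", "chapter"),
    ("book", "chapter", "section"),
]
PATH_IDS = {path: i for i, path in enumerate(SCHEMA_PATHS)}


@dataclass(frozen=True)
class Label:
    start: int    # counter value when the element is entered
    end: int      # counter value when the element is left (after all descendants)
    path_id: int  # id of the element's root-to-node tag path in the schema table


def label_document(root, path_ids):
    """Assign (start, end, path_id) labels with one depth-first traversal."""
    labels = []
    counter = 0

    def visit(elem, parent_path):
        nonlocal counter
        path = parent_path + (elem.tag,)
        start = counter
        counter += 1
        for child in elem:
            visit(child, path)
        end = counter
        counter += 1
        # Labels are appended in post-order (children before their parent).
        labels.append((elem, Label(start, end, path_ids[path])))

    visit(root, ())
    return labels


def is_ancestor(a, d):
    """Interval containment: a is a proper ancestor of d."""
    return a.start < d.start and d.end < a.end


def matching_path_ids(path_ids, ancestor_tag, descendant_tag):
    """Resolve //ancestor_tag//descendant_tag using the schema path table only."""
    return {
        pid
        for path, pid in path_ids.items()
        if path[-1] == descendant_tag and ancestor_tag in path[:-1]
    }


DOC = ("<book><title>T</title>"
       "<chapter><section/><section/></chapter>"
       "<chapter><section/></chapter></book>")

labels = label_document(ET.fromstring(DOC), PATH_IDS)
wanted = matching_path_ids(PATH_IDS, "chapter", "section")
hits = [lab for _, lab in labels if lab.path_id in wanted]
chapters = [lab for _, lab in labels if lab.path_id == PATH_IDS[("book", "chapter")]]
print(len(hits))  # 3
```

In a disk-based system, one could store labels in per-path-id lists so that only the lists for the matching ids are read; that storage layout is likewise an assumption of this sketch, not a description of VirtualJoin.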