
For FullText PDF, please login, if you are a member of IEICE,
or go to Pay Per View on menu list, if you are a nonmember of IEICE.

Complexity and a Method of Extracting a Database Schema over Semistructured Documents
Nobutaka SUZUKI Yoichirou SATO Michiyoshi HAYASE
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E85D
No.6
pp.940949 Publication Date: 2002/06/01
Online ISSN:
DOI:
Print ISSN: 09168532 Type of Manuscript: PAPER Category: Databases Keyword: semistructured data, schema extraction problem, approximability, NPhardness,
Full Text: PDF(628.3KB)>>
Summary:
Semistructured data comprises irregular structure and has no apriori database schema, therefore we encounter several problems such as inefficient data retrieval and wasteful data storage. To cope with such problems, some schema extraction algorithms over semistructured data have been proposed, in which data is modeled as an unordered tree. However, the order of elements is indispensable for document data, therefore we consider extracting an optimal database schema over an ordered tree. We consider an optimization problem to extract a smallest database schema such that the density of each class is no less than a given threshold, where the density of a class represents a similarity between the type of the class and those of the objects in the class. We first prove that the corresponding decision problem is strongly NPcomplete, and show that another version of the problem is strongly NPhard and belongs to Δ_{2} P. Then we show that for any r < 3/2, there is no polynomialtime rapproximation algorithm that solves the optimization problem unless P = NP. Finally, we propose a kind of class called bounded class that can be constructed efficiently, then show a polynomialtime algorithm for constructing a database schema by using bounded classes.

