An Efficient Pattern Matching Algorithm for Ordered Term Tree Patterns

Yusuke SUZUKI  Takayoshi SHOUDAI  Tomoyuki UCHIDA  Tetsuhiro MIYAHARA  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E98-A   No.6   pp.1197-1211
Publication Date: 2015/06/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E98.A.1197
Type of Manuscript: Special Section PAPER (Special Section on Discrete Mathematics and Its Applications)
ordered tree structured pattern,  graph pattern matching algorithm,  polynomial time algorithm,  NP-completeness,  

Full Text: PDF(2MB)>>
Buy this Article

A term tree pattern is a rooted ordered tree pattern which consists of ordered tree structures with edge labels and structured variables with labels. All variables with the same label in a term tree pattern can be simultaneously replaced with ordered trees isomorphic to the same rooted ordered tree. Then, a term tree pattern is suitable for representing structural features common to tree structured data such as XML documents on the web, the secondary structures of RNA in biology and parse trees describing grammatical structures of natural languages. Let $ott$ be the set of all term tree patterns which have one or more variables with the same label. Let $lott$ be the set of all term tree patterns t such that all variables in t have distinct labels. We remark that $lottsubsetneq ott$ holds. In this paper, we consider a problem, called Matching problem for term tree patterns, of deciding whether or not a given rooted ordered tree T is obtained from a given term tree pattern t by replacing variables in t with rooted ordered trees. We show that Matching problem for term tree patterns in $ott$ is NP-complete, by giving a reduction from the string pattern matching problem, which is NP-complete. Next, by giving operations on an interval, which is a set containing all integers between two given integers representing vertex identifiers, we propose an efficient algorithm for solving Matching problem for term tree patterns in $lottsubsetneq ott$. Then, we show that, when an ordered tree having N vertices and a term tree pattern $t in lott$ having n vertices are given, the proposed matching algorithm correctly solves this problem in O(nN) time.