Efficient Regular Path Query Evaluation by Splitting with Unit-Subquery Cost Matrix

Van-Quyet NGUYEN  Kyungbaek KIM  

IEICE TRANSACTIONS on Information and Systems   Vol.E100-D   No.10   pp.2648-2652
Publication Date: 2017/10/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017EDL8060
Type of Manuscript: LETTER
Category: Data Engineering, Web Information Systems
regular path queries,  large graphs,  graph querying,  

Full Text: PDF>>
Buy this Article

A widely-used query on a graph is a regular path query (RPQ) whose answer is a set of tuples of nodes connected by paths corresponding to a given regular expression. Traditionally, evaluating an RPQ on a large graph takes substantial memory spaces and long response time. Recently, several studies have focused on improving response time for evaluating an RPQ by splitting an original RPQ into smaller subqueries, evaluating them in parallel and combining partial answers. In these works, how to choose split labels in an RPQ is one of key points of the performance of RPQ evaluation, and rare labels of a graph can be used as split labels. However there is still a room for improvement, because a rare label cannot guarantee the minimum evaluation cost all the time. In this paper, we propose a novel approach of selecting split labels by estimating evaluation cost of each split subquery with a unit-subquery cost matrix (USCM), which can be obtained from a graph in prior to evaluate an RPQ. USCM presents the evaluation cost of a unit-subquery which is the smallest possible subquery, and we can estimate the evaluation cost of an RPQ by decomposing into a set of unit-subqueries. Experimental results show that our proposed approach outperforms rare label based approaches.