Currency Preserving Query: Selecting the Newest Values from Multiple Tables

Mohan LI  Yanbin SUN  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.12   pp.3059-3072
Publication Date: 2018/12/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2018EDP7058
Type of Manuscript: PAPER
Category: Data Engineering, Web Information Systems
Keyword: 
data quality,  data currency,  referential integrity constraints,  

Full Text: PDF(1.1MB)
>>Buy this Article


Summary: 
In many applications, tables are distributively stored in different data sources, but the frequency of updates on each data source is different. Some techniques have been proposed to effectively express the temporal orders between different values, and the most current, i.e. up-to-date, value of a given data item can be easily picked up according to the temporal orders. However, the currency of the data items in the same table may be different. That is, when a user asks for a table D, it cannot be ensured that all the most current values of the data items in D are stored in a single table. Since different data sources may have overlaps, we can construct a conjunctive query on multiple tables to get all the required current values. In this paper, we formalize the conjunctive query as currency preserving query, and study how to generate the minimized currency preserving query to reduce the cost of visiting different data sources. First, a graph model is proposed to represent the distributed tables and their relationships. Based on the model, we prove that a currency preserving query is equivalent to a terminal tree in the graph, and give an algorithm to generate a query from a terminal tree. After that, we study the problem of finding minimized currency preserving query. The problem is proved to be NP-hard, and some heuristics strategies are provided to solve the problem. Finally, we conduct experiments on both synthetic and real data sets to verify the effectiveness and efficiency of the proposed techniques.