Linked Data Entity Resolution System Enhanced by Configuration Learning Algorithm

Khai NGUYEN, Ryutaro ICHISE

IEICE TRANSACTIONS on Information and Systems, Vol.E99-D, No.6, pp.1521-1530
Publication Date: 2016/06/01
Publicized: 2016/02/29
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2015EDP7392
Type of Manuscript: PAPER
Category: Data Engineering, Web Information Systems
Keywords: linked data, entity resolution, schema-independent, supervised, heuristic

Linked data entity resolution is the detection of instances that reside in different repositories but co-describe the same topic. The quality of the resolution result depends on the appropriateness of the configuration, including the selected matching properties and the similarity measures. Because such configuration details currently differ across domains and repositories, a general resolution approach applicable to every repository is necessary. In this paper, we present cLink, a system that can effectively perform entity resolution on any input by using a learning algorithm to find the optimal configuration. Experiments show that cLink achieves high performance even when given only a small amount of training data. cLink also outperforms recent systems, including those that use the supervised learning approach.
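The abstract does not describe cLink's learning algorithm itself, so the following is only a minimal sketch of what "finding the optimal configuration" can mean, assuming a configuration is one (source property, target property, similarity measure, threshold) tuple and that it is chosen by maximizing F1 on labeled training pairs. The exhaustive grid search, the three measures, the threshold grid, and every name below (`learn_configuration`, `MEASURES`, `THRESHOLDS`, the sample instances) are hypothetical illustrations, not the paper's heuristic method.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical instance representation: property name -> literal value.
Instance = Dict[str, str]
Measure = Callable[[str, str], float]
Config = Tuple[str, str, str, float]


def exact(a: str, b: str) -> float:
    """1.0 if the case-normalized strings are identical, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the whitespace-separated token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def edit_ratio(a: str, b: str) -> float:
    """Character-level similarity from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Assumed search space: these measures and thresholds are illustrative only.
MEASURES: Dict[str, Measure] = {"exact": exact, "jaccard": jaccard, "edit_ratio": edit_ratio}
THRESHOLDS = [0.5, 0.7, 0.9]


def f1_score(preds: List[bool], labels: List[bool]) -> float:
    """F1 of predicted matches against gold labels."""
    tp = sum(p and g for p, g in zip(preds, labels))
    fp = sum(p and not g for p, g in zip(preds, labels))
    fn = sum(g and not p for p, g in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def learn_configuration(
    pairs: List[Tuple[Instance, Instance]], labels: List[bool]
) -> Tuple[Optional[Config], float]:
    """Exhaustively try every configuration and keep the one with the best F1."""
    src_props = sorted({p for s, _ in pairs for p in s})
    tgt_props = sorted({p for _, t in pairs for p in t})
    best: Optional[Config] = None
    best_f1 = -1.0
    for sp, tgt, (name, measure), th in product(
        src_props, tgt_props, MEASURES.items(), THRESHOLDS
    ):
        preds = []
        for s, t in pairs:
            if sp in s and tgt in t:
                preds.append(measure(s[sp], t[tgt]) >= th)
            else:
                preds.append(False)  # a missing value is treated as a non-match
        score = f1_score(preds, labels)
        if score > best_f1:  # strict: ties keep the earlier configuration
            best, best_f1 = (sp, tgt, name, th), score
    return best, best_f1


# Hypothetical training data: two repositories describing cities with different schemas.
PAIRS: List[Tuple[Instance, Instance]] = [
    ({"label": "Tokyo", "country": "Japan"}, {"name": "Tokyo", "nation": "JP"}),
    ({"label": "Kyoto", "country": "Japan"}, {"name": "Kyoto", "nation": "JP"}),
    ({"label": "Tokyo", "country": "Japan"}, {"name": "Kyoto", "nation": "JP"}),
    ({"label": "Paris", "country": "France"}, {"name": "Osaka", "nation": "JP"}),
]
LABELS = [True, True, False, False]

if __name__ == "__main__":
    print(learn_configuration(PAIRS, LABELS))
```

On this toy data the search settles on matching `label` against `name` with exact comparison, the only configuration that separates the positive and negative pairs perfectly. A brute-force grid grows multiplicatively with the number of properties, measures, and thresholds, which is one reason a real system would rely on a smarter search than this sketch.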