An Automatic Knowledge Graph Creation Framework from Natural Language Text


IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.1   pp.90-98
Publication Date: 2018/01/01
Publicized: 2017/09/15
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017SWP0006
Type of Manuscript: Special Section PAPER (Special Section on Semantic Web and Linked Data)
Keywords: knowledge graph, knowledge discovery, knowledge extraction, linked data

Knowledge graphs (KGs) play a crucial role in many modern applications. However, constructing a KG from natural language text is challenging because of the complex structure of the text. Recently, many approaches have been proposed to transform natural language text into triples to obtain KGs. Such approaches have not yet provided efficient results for mapping the extracted elements of triples, especially the predicate, to their equivalent elements in a KG. Predicate mapping is essential because it can reduce the heterogeneity of the data and increase searchability over a KG. In this article, we propose T2KG, an automatic KG creation framework for natural language text, to more effectively map natural language text to predicates. In our framework, a hybrid combination of a rule-based approach and a similarity-based approach is presented for mapping a predicate extracted from text to its corresponding predicate in a KG. Experimental results show that the hybrid approach identifies more similar predicate pairs than a baseline method in the predicate mapping task. An experiment on KG creation is also conducted to investigate the performance of T2KG, and the results show that T2KG also outperforms the baseline in this task. Although KG creation is conducted in open domains, in which prior knowledge is not provided, T2KG still achieves an F1 score of approximately 50% when generating triples in the KG creation task. In addition, an empirical study on knowledge population using various text sources is conducted, and the results indicate that T2KG could be used to obtain knowledge that is not currently available from DBpedia.
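
To make the idea of hybrid predicate mapping concrete, the following is a minimal sketch and not the T2KG implementation. It assumes a small hand-written rule table that is consulted first, and falls back to a similarity score against KG predicate labels when no rule applies. The rule table, the predicate labels, the threshold of 0.6, and the use of character-level similarity from Python's difflib are all illustrative assumptions; the abstract does not specify which rules or which similarity measure the framework uses.

```python
from difflib import SequenceMatcher

# Hypothetical rule table: predicate phrase from text -> KG predicate.
RULES = {
    "was born in": "dbo:birthPlace",
    "is married to": "dbo:spouse",
}

# Hypothetical KG predicates, each with a human-readable label.
KG_PREDICATES = {
    "dbo:birthPlace": "birth place",
    "dbo:spouse": "spouse",
    "dbo:author": "author",
    "dbo:almaMater": "alma mater",
}


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for any similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_predicate(text_predicate, threshold=0.6):
    """Map a text predicate to a KG predicate, or return None if nothing fits."""
    key = text_predicate.strip().lower()

    # Step 1 (rule-based): an exact rule match takes priority.
    if key in RULES:
        return RULES[key]

    # Step 2 (similarity-based): pick the most similar KG predicate label.
    best_uri, best_score = None, 0.0
    for uri, label in KG_PREDICATES.items():
        score = similarity(key, label)
        if score > best_score:
            best_uri, best_score = uri, score

    # Reject weak matches so unrelated predicates stay unmapped.
    return best_uri if best_score >= threshold else None
```

In this sketch, `map_predicate("was born in")` is resolved by the rule table, `map_predicate("authored")` is resolved by the similarity fallback, and a predicate with no close label returns `None` and is left unmapped. Checking rules before similarity is one plausible ordering; a real system might instead combine the two signals.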