An Application of Intuitionistic Fuzzy Sets to Improve Information Extraction from Thai Unstructured Text

Peerasak INTARAPAIBOON  Thanaruk THEERAMUNKONG  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.9   pp.2334-2345
Publication Date: 2018/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017EDP7423
Type of Manuscript: PAPER
Category: Artificial Intelligence, Data Mining
Keyword: intuitionistic fuzzy set, similarity measure, multi-slot information extraction



Summary: 
Multi-slot information extraction, also known as frame extraction, is a task that identifies several related entities simultaneously. Most research on this task is concerned with applying IE patterns (rules) to extract related entities from unstructured documents. An important obstacle to success in this task is not knowing where the text portions containing the information of interest are located. This problem becomes more complicated for languages with sentence boundary ambiguity, e.g., the Thai language. Applying IE rules to all reasonable text portions can reduce the effect of this obstacle, but it raises another problem: incorrect (unwanted) extractions. This paper presents a method for removing these incorrect extractions. In the method, extractions are represented as intuitionistic fuzzy sets (IFSs), and a similarity measure for IFSs is used to calculate the distance between the IFS of an unclassified extraction and that of each already-classified extraction. The concept of k nearest neighbors is adopted to decide whether the unclassified extraction is correct or not. In experiments on various domains, the proposed technique improves extraction precision while satisfactorily preserving recall.
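
The filtering step described in the summary can be illustrated with a minimal sketch. The summary does not state which similarity measure or features the paper uses, so everything here is an assumption made for illustration: each extraction is modelled as a list of (membership, non-membership) pairs over a fixed set of features, similarity is taken as one minus the normalized Hamming distance between IFSs (a common textbook choice), and the names `ifs_similarity` and `knn_classify` and the sample values are hypothetical.

```python
from collections import Counter


def hesitation(mu, nu):
    """Hesitation degree of an IFS element: pi = 1 - mu - nu."""
    return 1.0 - mu - nu


def ifs_similarity(a, b):
    """Similarity of two IFSs as 1 minus the normalized Hamming distance.

    Each IFS is a list of (mu, nu) pairs over the same features.
    This measure is an assumption; the paper may use a different one.
    """
    if not a or len(a) != len(b):
        raise ValueError("IFSs must be non-empty and of equal length")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        total += (
            abs(mu_a - mu_b)
            + abs(nu_a - nu_b)
            + abs(hesitation(mu_a, nu_a) - hesitation(mu_b, nu_b))
        )
    return 1.0 - total / (2 * len(a))


def knn_classify(query, labeled, k=3):
    """Label a query IFS by majority vote among its k most similar neighbours."""
    ranked = sorted(
        labeled,
        key=lambda item: ifs_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical already-classified extractions (values invented for illustration).
labeled = [
    ([(0.90, 0.05), (0.80, 0.10)], "correct"),
    ([(0.85, 0.10), (0.70, 0.20)], "correct"),
    ([(0.20, 0.70), (0.10, 0.80)], "incorrect"),
    ([(0.30, 0.60), (0.20, 0.70)], "incorrect"),
]

# An unclassified extraction to be accepted or rejected.
query = [(0.80, 0.10), (0.75, 0.15)]
print(knn_classify(query, labeled, k=3))
```

In this sketch an extraction labelled "incorrect" would simply be discarded, which is how such a filter could raise precision while leaving most correct extractions, and hence recall, untouched.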