An Application of Intuitionistic Fuzzy Sets to Improve Information Extraction from Thai Unstructured Text
Peerasak INTARAPAIBOON, Thanaruk THEERAMUNKONG
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2018/09/01
Online ISSN: 1745-1361
Type of Manuscript: PAPER
Category: Artificial Intelligence, Data Mining
Keywords: intuitionistic fuzzy set, similarity measure, multi-slot information extraction
Multi-slot information extraction (IE), also known as frame extraction, is a task that identifies several related entities simultaneously. Most research on this task is concerned with applying IE patterns (rules) to extract related entities from unstructured documents. An important obstacle to success in this task is not knowing where the text portions containing the information of interest are located. This problem becomes more complicated for languages with sentence boundary ambiguity, such as Thai. Applying IE rules to all reasonable text portions can reduce the effect of this obstacle, but it raises another problem: incorrect (unwanted) extractions. This paper presents a method for removing these incorrect extractions. In the method, extractions are represented as intuitionistic fuzzy sets (IFSs), and a similarity measure for IFSs is used to calculate the distance between the IFS of an unclassified extraction and that of each already-classified extraction. The k-nearest-neighbor concept is then adopted to decide whether the unclassified extraction is correct. Experiments on various domains show that the proposed technique improves extraction precision while satisfactorily preserving recall.
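The abstract does not say which IFS similarity measure the authors use or how an extraction is turned into an IFS, so the following is only a minimal sketch under stated assumptions. It assumes each extraction has already been encoded as an IFS over a fixed universe of n elements, i.e. n pairs of membership μ and non-membership ν with μ + ν ≤ 1 and hesitation π = 1 − μ − ν. As one standard choice (not necessarily the paper's), similarity is taken as 1 minus the three-term normalized Hamming distance commonly attributed to Szmidt and Kacprzyk, and an unclassified extraction is labeled by majority vote over its k most similar labeled extractions. The function names `similarity` and `knn_classify`, the toy data, and the default k = 3 are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed encoding: an IFS over a fixed, ordered universe {x_1, ..., x_n},
# given as one (mu, nu) pair per element with mu + nu <= 1.
IFS = Sequence[Tuple[float, float]]


def similarity(a: IFS, b: IFS) -> float:
    """1 - normalized Hamming distance between two IFSs; result lies in [0, 1].

    d(A, B) = 1/(2n) * sum(|mu_A - mu_B| + |nu_A - nu_B| + |pi_A - pi_B|).
    This particular measure is an illustrative assumption.
    """
    if len(a) != len(b) or not a:
        raise ValueError("IFSs must be non-empty and share the same universe")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        for mu, nu in ((mu_a, nu_a), (mu_b, nu_b)):
            if mu < 0 or nu < 0 or mu + nu > 1.0 + 1e-9:
                raise ValueError(f"invalid IFS element: mu={mu}, nu={nu}")
        pi_a = 1.0 - mu_a - nu_a  # hesitation degree of A
        pi_b = 1.0 - mu_b - nu_b  # hesitation degree of B
        total += abs(mu_a - mu_b) + abs(nu_a - nu_b) + abs(pi_a - pi_b)
    return 1.0 - total / (2 * len(a))


def knn_classify(query: IFS, labeled: List[Tuple[IFS, bool]], k: int = 3) -> bool:
    """Return True (keep the extraction) if most of its k nearest labeled
    extractions are correct; a tie (possible only for even k) returns False."""
    if not 1 <= k <= len(labeled):
        raise ValueError("k must be between 1 and the number of labeled examples")
    ranked = sorted(labeled, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes[True] > votes[False]


if __name__ == "__main__":
    # Hypothetical toy data: two-element IFSs labeled correct (True) or incorrect (False).
    labeled = [
        ([(0.9, 0.05), (0.8, 0.1)], True),
        ([(0.85, 0.1), (0.7, 0.2)], True),
        ([(0.1, 0.8), (0.2, 0.7)], False),
        ([(0.15, 0.75), (0.1, 0.85)], False),
    ]
    query = [(0.8, 0.1), (0.75, 0.15)]
    print(knn_classify(query, labeled, k=3))  # True
```

Choosing an odd k avoids ties with two labels; with an even k this sketch arbitrarily treats a tie as an incorrect extraction, which favors precision over recall.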