Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts


IEICE TRANSACTIONS on Information and Systems   Vol.E95-D   No.7   pp.1932-1946
Publication Date: 2012/07/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E95.D.1932
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Artificial Intelligence, Data Mining
relation extraction,  named entity,  surface feature,  information extraction,  

Full Text: PDF(3.7MB)>>
Buy this Article

Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages due to several Thai specific characteristics, including no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word orders. Unlike most previous works which focused on NE relations of specific actions, such as work_for, live_in, located_in, and kill, this paper proposes more general types of NE relations, called predicate-oriented relation (PoR), where an extracted action part (verb) is used as a core component to associate related named entities extracted from Thai Texts. Lacking a practical parser for the Thai language, we present three types of surface features, i.e. punctuation marks (such as token spaces), entity types and the number of entities and then apply five alternative commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves the F-measure of 97.76%, 99.19%, 95.00% and 93.50% on four different types of predicate-oriented relation (action-location, location-action, action-person and person-action) in crime-related news documents using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets and class unbalance on the performance of relation extraction are explored.