Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora

Tri-Thanh NGUYEN, Akira SHIMAZU

IEICE TRANSACTIONS on Information and Systems, Vol.E90-D, No.10, pp.1542-1549
Publication Date: 2007/10/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e90-d.10.1542
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Knowledge, Information and Creativity Support System)
Keywords: fine person categories extraction, named entities, pattern extraction, algorithm

Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Although some efforts have been made to expand the hierarchy of NE types, the number of NE types remains fixed. In real applications, such as question answering or semantic search systems, users may be interested in more diverse and specific NE types. This paper proposes a method to extract fine-grained categories of person named entities from text documents. Based on the Dual Iterative Pattern Relation Expansion (DIPRE) method, we develop a model better suited to this problem and explore the generation of different pattern types. To improve performance, we also propose a method for validating whether an extracted category is valid. Experiments on the Wall Street Journal corpus give promising results.
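The abstract describes the method only at a high level. As a rough illustration of the DIPRE-style bootstrapping loop it builds on, the sketch below starts from seed (person, category) pairs, induces patterns from sentences that contain a known pair, applies those patterns to find new pairs, filters them with a validation step, and repeats until nothing new is found. This is not the authors' model: the single pattern type (the literal text between a name and a following category word), the `NAME_RE`/`CAT_RE` regexes, the `max_gap` limit, and the validation rule (a category must be shared by at least `min_support` distinct names) are all hypothetical simplifications chosen for clarity.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately crude regexes: a person name is two or more
# capitalized words; a category is a single lowercase word.
NAME_RE = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"
CAT_RE = r"[a-z]+"


def induce_patterns(corpus, pairs, max_gap=10):
    """Use the text between a known name and its category as a pattern."""
    patterns = set()
    for sentence in corpus:
        for name, category in pairs:
            start = sentence.find(name)
            if start < 0:
                continue
            name_end = start + len(name)
            cat_start = sentence.find(category, name_end)
            if cat_start < 0:
                continue
            middle = sentence[name_end:cat_start]
            # Skip gaps that are too long or consist only of whitespace,
            # since such patterns would match almost anything.
            if 0 < len(middle) <= max_gap and middle.strip():
                patterns.add(middle)
    return patterns


def apply_patterns(corpus, patterns):
    """Match each pattern against the corpus to propose (name, category) pairs."""
    candidates = set()
    for middle in patterns:
        regex = re.compile(f"({NAME_RE}){re.escape(middle)}({CAT_RE})")
        for sentence in corpus:
            for m in regex.finditer(sentence):
                candidates.add((m.group(1), m.group(2)))
    return candidates


def validate(known, candidates, min_support=2):
    """Hypothetical check: accept a category only if at least
    min_support distinct names are paired with it."""
    names_per_cat = defaultdict(set)
    for name, category in known | candidates:
        names_per_cat[category].add(name)
    return {(n, c) for n, c in candidates if len(names_per_cat[c]) >= min_support}


def bootstrap(corpus, seeds, max_iter=5):
    """Alternate pattern induction and pair extraction until convergence."""
    known = set(seeds)
    for _ in range(max_iter):
        patterns = induce_patterns(corpus, known)
        accepted = validate(known, apply_patterns(corpus, patterns))
        if accepted <= known:  # no new pairs were found: stop
            break
        known |= accepted
    return known


if __name__ == "__main__":
    corpus = [
        "John Smith, an economist at the bank, said rates would rise.",
        "Mary Jones, an economist, disagreed.",
        "Ann Lee, an economist at the fund, agreed.",
        "Peter White, an old friend, was not reached.",
    ]
    print(sorted(bootstrap(corpus, {("John Smith", "economist")})))
```

With the toy corpus, the seed yields the pattern `", an "`, which also matches "Peter White, an old"; the validation step discards the spurious category "old" because only one name supports it, leaving the three "economist" pairs. A real system would need richer pattern types and a far more careful validity test than this support count.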