For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora
Tri-Thanh NGUYEN Akira SHIMAZU
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2007/10/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Knowledge, Information and Creativity Support System)
fine person categories extraction, named entities, pattern extraction, algorithm,
Full Text: PDF(430.1KB)>>
Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Though some efforts have been proposed to expand the hierarchy of NE types, there are still a fixed number of NE types. In real applications, such as question answering or semantic search systems, users may be interested in more diverse specific NE types. This paper proposes a method to extract categories of person named entities from text documents. Based on Dual Iterative Pattern Relation Extraction method, we develop a more suitable model for solving our problem, and explore the generation of different pattern types. A method for validating whether a category is valid or not is proposed to improve the performance, and experiments on Wall Street Journal corpus give promising results.