Experimental Study on a Two Phase Method for Biomedical Named Entity Recognition

Seonho KIM, Juntae YOON

IEICE TRANSACTIONS on Information and Systems, Vol.E90-D, No.7, pp.1103-1110
Publication Date: 2007/07/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e90-d.7.1103
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Natural Language Processing
Keywords: information extraction, named entity recognition, two-phase model, ME, CRF, SVM, FST

In this paper, we describe a two-phase method for biomedical named entity recognition that consists of term boundary detection and biomedical category labeling. Term boundary detection can be defined as the task of assigning a label sequence to a given sentence. Biomedical category labeling, by contrast, can be viewed as a local classification problem that does not require knowledge of the labels of the other named entities in the sentence. Dividing the recognition process into two phases lets us measure the effectiveness of the models at each phase and select the most appropriate model for each subtask separately. To improve biomedical named entity recognition performance, we conducted comparative experiments using several learning methods at each phase. The results of these machine-learning-based models are then refined by rule-based postprocessing. We tested our methods on the JNLPBA 2004 shared task and the GENIA corpus.
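The split between a sentence-level boundary phase and a span-local category phase can be illustrated with a small Python sketch. Phase 1 emits category-free B/I/O tags for the whole sentence. Phase 2 then labels each detected span independently. Everything named here is a hypothetical illustration, not the authors' models: `two_phase_ner`, `spans_from_bio`, the tiny `TOY_LEXICON` boundary detector, and the head-word rules in `toy_category_classifier`. In the paper, each phase is a trained model (the keywords mention ME, CRF, SVM and FST among the methods studied). The rule-based postprocessing step is not shown.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open token offsets [start, end)


def spans_from_bio(tags: List[str]) -> List[Span]:
    """Turn category-free B/I/O boundary tags into token spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new term begins here (a stray I is treated as B).
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        # An I inside an open span simply extends it.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def two_phase_ner(
    tokens: List[str],
    detect_boundaries: Callable[[List[str]], List[str]],
    classify_span: Callable[[List[str], int, int], str],
) -> List[Tuple[int, int, str]]:
    """Phase 1 tags the whole sentence; phase 2 labels each span on its own."""
    tags = detect_boundaries(tokens)
    return [(s, e, classify_span(tokens, s, e)) for s, e in spans_from_bio(tags)]


# Hypothetical stand-ins for trained models (illustration only).
TOY_LEXICON = {("IL-2", "gene"), ("NF-kappa", "B"), ("T", "cells")}


def toy_boundary_detector(tokens: List[str]) -> List[str]:
    """Greedy longest match against TOY_LEXICON, emitting B/I/O tags."""
    tags = ["O"] * len(tokens)
    max_len = max(len(term) for term in TOY_LEXICON)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in TOY_LEXICON:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return tags


def toy_category_classifier(tokens: List[str], start: int, end: int) -> str:
    """Looks only at the span's head (last) word: a purely local decision."""
    head = tokens[end - 1].lower()
    if head == "gene":
        return "DNA"
    if head == "cells":
        return "cell_type"
    return "protein"


if __name__ == "__main__":
    sentence = "IL-2 gene expression requires NF-kappa B activation in T cells .".split()
    for s, e, label in two_phase_ner(sentence, toy_boundary_detector, toy_category_classifier):
        print(" ".join(sentence[s:e]), "->", label)
    # IL-2 gene -> DNA
    # NF-kappa B -> protein
    # T cells -> cell_type
```

Phase 2 sees only one span at a time. Because of this, the boundary tagger and the category classifier can each be swapped out without touching the other, which is what allows each phase to be evaluated and chosen separately.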