Utilizing the Web for Automatic Word Spacing
Gumwon HONG Jeong-Hoon LEE Young-In SONG Do-Gil LEE Hae-Chang RIM
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E92-D
No.12
pp.2553-2556
Publication Date: 2009/12/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
DOI: 10.1587/transinf.E92.D.2553
Type of Manuscript: LETTER
Category: Natural Language Processing
Keyword: word spacing, word segmentation
Summary:
This paper presents a new approach to word spacing problems that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing use noise-free data to train the parameters of word spacing models. However, the insufficiency and irrelevancy of training examples are always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, the paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, and a model that utilizes those words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that it achieves better performance than conventional word spacing approaches.