Phonology and Morphology Modeling in a Very Large Vocabulary Hungarian Dictation System

Mate SZARVAS  Sadaoki FURUI  

IEICE TRANSACTIONS on Information and Systems   Vol.E87-D   No.12   pp.2791-2801
Publication Date: 2004/12/01
Online ISSN: 
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
phonology modeling,  language modeling,  morphology modeling,  finite-state transducer,  Hungarian speech recognition,  

Full Text: PDF>>
Buy this Article

This article introduces a novel approach to model phonology and morphosyntax in morpheme unit-based speech recognizers. The proposed methods are evaluated on a Hungarian newspaper dictation task that requires modeling over 1 million different word forms. The architecture of the recognition system is based on the weighted finite-state transducer (WFST) paradigm. The vocabulary units used in the system are morpheme-based in order to provide sufficient coverage of the large number of word-forms resulting from affixation and compounding. Besides the basic pronunciation model and the morpheme N-gram language model we evaluate a novel phonology model and the novel stochastic morphosyntactic language model (SMLM). Thanks to the flexible transducer-based architecture of the system, these new components are integrated seamlessly with the basic modules with no need to modify the decoder itself. We compare the phoneme, morpheme, and word error-rates as well as the sizes of the recognition networks in two configurations. In one configuration we use only the N-gram model while in the other we use the combined model. The proposed stochastic morphosyntactic language model decreases the morpheme error rate by between 1.7 and 7.2% relatively when compared to the baseline trigram system. The proposed phonology model reduced the error rate by 8.32%. The morpheme error-rate of the best configuration is 18% and the best word error-rate is 22.3%.