Effectiveness of Word String Language Models on Noisy Broadcast News Speech Recognition

Kazuyuki TAKAGI  Rei OGURO  Kazuhiko OZEKI  

IEICE TRANSACTIONS on Information and Systems   Vol.E85-D   No.7   pp.1130-1137
Publication Date: 2002/07/01
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keywords: word string, language model, robustness, broadcast news speech, noisy speech recognition


Experiments were conducted to examine a language-modeling approach to improving noisy speech recognition performance. By adopting appropriate word strings as new units of processing, recognition performance was improved both through acoustic effects and through test-set perplexity reduction. Three kinds of word string language models were evaluated, whose additional lexical entries were selected using combinations of part-of-speech information, word length, occurrence frequency, and the log-likelihood ratio of hypotheses about bigram frequencies. All three word string models reduced errors in broadcast news speech recognition and also lowered test-set perplexity. The word string model based on the log-likelihood ratio yielded the best improvement for noisy speech recognition: in experiments using speaker-dependent, noise-adapted triphones, deletion errors were reduced by 26%, substitution errors by 9.3%, and insertion errors by 13%. The effectiveness of word string models in reducing errors was more prominent for noisy speech than for studio-clean speech.
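As an illustration of the log-likelihood-ratio criterion mentioned above, the following is a minimal sketch of Dunning-style bigram scoring, in which a bigram whose two words co-occur far more often than chance receives a high score and becomes a candidate word string. The function names, count values, and threshold are illustrative assumptions, not the authors' exact statistic or implementation.

```python
import math

def log_likelihood_ratio(c12, c1, c2, n):
    """Dunning-style log-likelihood ratio for a bigram (w1, w2).

    c12: count of the bigram "w1 w2"
    c1:  count of w1
    c2:  count of w2
    n:   total number of bigrams in the corpus
    """
    def ll(k, m, p):
        # Binomial log-likelihood; skip terms whose count is zero
        s = 0.0
        if k > 0:
            s += k * math.log(p)
        if m - k > 0:
            s += (m - k) * math.log(1.0 - p)
        return s

    p = c2 / n                   # H0: P(w2 | w1) = P(w2 | not w1)
    p1 = c12 / c1                # H1: P(w2 | w1)
    p2 = (c2 - c12) / (n - c1)   # H1: P(w2 | not w1)
    return 2.0 * (ll(c12, c1, p1) + ll(c2 - c12, n - c1, p2)
                  - ll(c12, c1, p) - ll(c2 - c12, n - c1, p))

def select_word_strings(bigram_counts, unigram_counts, n, threshold):
    """Return bigrams scoring above the threshold, highest score first.

    These are the candidate word strings to add to the lexicon
    (hypothetical selection step for illustration).
    """
    candidates = []
    for (w1, w2), c12 in bigram_counts.items():
        score = log_likelihood_ratio(
            c12, unigram_counts[w1], unigram_counts[w2], n)
        if score >= threshold:
            candidates.append(((w1, w2), score))
    return sorted(candidates, key=lambda x: -x[1])
```

Under this scoring, a bigram whose observed frequency matches the independence assumption scores near zero, while a strongly associated pair scores high and would be merged into a single lexical entry.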