A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech

Jin-Song ZHANG  Konstantin MARKOV  Tomoko MATSUI  Satoshi NAKAMURA  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E86-D   No.3   pp.489-496
Publication Date: 2003/03/01
Online ISSN: 
DOI: 
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Speech Information Processing)
Category: Robust Speech Recognition and Enhancement
Keyword: 
conversational speech recognition,  inter-word pause,  context dependent HMMs,  prosodic phrase boundary,  

Full Text: PDF(390.8KB)>>
Buy this Article




Summary: 
This paper presents a study on modeling inter-word pauses to improve the robustness of acoustic models for recognizing noisy conversational speech. When precise contextual modeling is used for pauses, the frequent appearances and varying acoustics of pauses in noisy conversational speech make it a problem to automatically generate an accurate phonetic transcription of the training data for developing robust acoustic models. This paper presents a proposal to exploit the reliable phonetic heuristics of pauses in speech to aid the detection of varying pauses. Based on it, a stepwise approach to optimize pause HMMs was applied to the data of the DARPA SPINE2 project, and more correct phonetic transcription was achieved. The cross-word triphone HMMs developed using this method got an absolute 9.2% word error reduction when compared to the conventional method with only context free modeling of pauses. For the same pause modeling method, the use of the optimized phonetic segmentation brought about an absolute 5.2% improvements.