For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech
Jin-Song ZHANG Konstantin MARKOV Tomoko MATSUI Satoshi NAKAMURA
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2003/03/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Speech Information Processing)
Category: Robust Speech Recognition and Enhancement
conversational speech recognition, inter-word pause, context dependent HMMs, prosodic phrase boundary,
Full Text: PDF(390.8KB)>>
This paper presents a study on modeling inter-word pauses to improve the robustness of acoustic models for recognizing noisy conversational speech. When precise contextual modeling is used for pauses, the frequent appearances and varying acoustics of pauses in noisy conversational speech make it a problem to automatically generate an accurate phonetic transcription of the training data for developing robust acoustic models. This paper presents a proposal to exploit the reliable phonetic heuristics of pauses in speech to aid the detection of varying pauses. Based on it, a stepwise approach to optimize pause HMMs was applied to the data of the DARPA SPINE2 project, and more correct phonetic transcription was achieved. The cross-word triphone HMMs developed using this method got an absolute 9.2% word error reduction when compared to the conventional method with only context free modeling of pauses. For the same pause modeling method, the use of the optimized phonetic segmentation brought about an absolute 5.2% improvements.