Dynamic Bayesian Network-Based Acoustic Models Incorporating Speaking Rate Effects
Takahiro SHINOZAKI, Sadaoki FURUI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2004/10/01
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keywords: spontaneous speech recognition, speaking rate, dynamic Bayesian network, acoustic modeling
One of the most important issues in spontaneous speech recognition is how to cope with the degradation in recognition accuracy caused by speaking-rate fluctuation within an utterance. This paper proposes an acoustic model that adjusts the mixture weights and transition probabilities of the HMM at each frame according to the local speaking rate. The proposed model, its variants, and conventional models are all implemented in the Bayesian network framework. The proposed model has a hidden variable representing variation in the "mode" of the speaking rate, and the value of this variable controls the parameters of the underlying HMM. Model training and the maximum-probability assignment of the variables are performed with the EM/GEM algorithms and inference algorithms for Bayesian networks. For evaluation, utterances from meetings and lectures are used, and the Bayesian network-based acoustic models rescore the likelihoods of N-best lists. In the experiments, the proposed model consistently outperformed conventional HMMs and regression HMMs that used the same speaking-rate information.
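To make the idea of a hidden speaking-rate "mode" controlling HMM parameters more concrete, here is a minimal sketch of a forward-algorithm likelihood computation over the joint hidden variable (HMM state, rate mode). It is not the authors' model. The factorization is an assumption chosen for illustration: the mode depends only on the previous mode, the state transition is conditioned on the current mode, and each state's Gaussians are shared across modes while only the mixture weights depend on the mode. The features are 1-D scalars to keep the sketch short. All names (`forward_loglik`, `mode_trans`, `state_trans`, and so on) are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian (scalar features keep the sketch short)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def emission_logprob(x, j, m, model):
    """log b_{j,m}(x). Assumption: the Gaussians of state j are shared by
    all modes, and only the mixture weights depend on the rate mode m."""
    terms = [
        safe_log(w) + gauss_logpdf(x, mu, var)
        for w, mu, var in zip(model["weights"][m][j], model["means"][j], model["vars"][j])
    ]
    return logsumexp(terms)


def forward_loglik(obs, model):
    """log p(obs) via the forward algorithm over the joint hidden variable
    (HMM state j, speaking-rate mode m), marginalizing both.

    Hypothetical per-frame factorization:
      p(m_t | m_{t-1})       -> model["mode_trans"][m_prev][m]
      p(j_t | j_{t-1}, m_t)  -> model["state_trans"][m][i][j]
      p(o_t | j_t, m_t)      -> emission_logprob(o_t, j, m, model)
    """
    n_states = len(model["means"])
    n_modes = len(model["mode_init"])
    # alpha[(j, m)] = log p(o_1..o_t, state_t = j, mode_t = m)
    alpha = {
        (j, m): safe_log(model["state_init"][j])
        + safe_log(model["mode_init"][m])
        + emission_logprob(obs[0], j, m, model)
        for j in range(n_states)
        for m in range(n_modes)
    }
    for x in obs[1:]:
        new_alpha = {}
        for j in range(n_states):
            for m in range(n_modes):
                # Sum over every (previous state, previous mode) pair.
                incoming = [
                    alpha[(i, mp)]
                    + safe_log(model["mode_trans"][mp][m])
                    + safe_log(model["state_trans"][m][i][j])
                    for i in range(n_states)
                    for mp in range(n_modes)
                ]
                new_alpha[(j, m)] = logsumexp(incoming) + emission_logprob(x, j, m, model)
        alpha = new_alpha
    # Sum over all final (state, mode) pairs; a recognizer would usually
    # require ending in the last state instead.
    return logsumexp(list(alpha.values()))


if __name__ == "__main__":
    # Toy 2-state left-to-right model with modes 0 = "slow", 1 = "fast".
    toy = {
        "state_init": [1.0, 0.0],
        "mode_init": [0.5, 0.5],
        "mode_trans": [[0.95, 0.05], [0.05, 0.95]],
        # The fast mode leaves state 0 sooner than the slow mode.
        "state_trans": [[[0.9, 0.1], [0.0, 1.0]], [[0.5, 0.5], [0.0, 1.0]]],
        "weights": [[[0.7, 0.3], [0.7, 0.3]], [[0.3, 0.7], [0.3, 0.7]]],
        "means": [[-1.0, 0.0], [1.0, 2.0]],
        "vars": [[1.0, 1.0], [1.0, 1.0]],
    }
    obs = [-0.8, -0.2, 1.1, 1.9, 2.2]
    print(f"log p(O) = {forward_loglik(obs, toy):.3f}")
```

The mode is summed out at every frame, so in this sketch a single utterance can drift between slow-rate and fast-rate parameters. Setting all modes to identical parameters reduces it to an ordinary HMM. An EM-style trainer would combine these forward quantities with a matching backward pass. For N-best rescoring, one would evaluate such a likelihood under each hypothesis's model.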