Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework

Sakriani SAKTI  Satoshi NAKAMURA  Konstantin MARKOV  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E89-D   No.3   pp.946-953
Publication Date: 2006/03/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.3.946
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category: Speech Recognition
Keyword: Bayesian framework, wide phonetic context model, acoustic rescoring

Summary: 
Over the last decade, the Bayesian approach has gained popularity in many application areas. It provides a probabilistic framework for encoding beliefs and choosing actions under uncertainty. Within this framework, information from several models can also be combined to achieve better inference and to better account for modeling uncertainty. We use the Bayesian framework to improve acoustic model precision in speech recognition systems by modeling a wider-than-triphone context, which is approximated using several less context-dependent models. This composition avoids the critical problem of limited training data and reduces model complexity. To improve model reliability in the presence of unseen contexts and limited training data, flooring and smoothing techniques are applied. Experimental results show that the proposed Bayesian pentaphone model improves word accuracy compared with the standard triphone model.
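
As an illustration of how a wide context might be approximated from less context-dependent models, the sketch below combines three log-likelihood scores in the log domain. It is not taken from the paper: it assumes that the left and right contexts are conditionally independent given the center phone, which by Bayes' rule gives p(x | left, center, right) = p(x | left, center) p(x | right, center) / p(x | center). The function name, argument names, and floor value are hypothetical, and only a simple flooring step is shown, not the smoothing.

```python
import math

# Hypothetical floor; the summary does not state the value actually used.
LOG_FLOOR = math.log(1e-10)


def compose_log_likelihood(log_p_left, log_p_right, log_p_center,
                           floor=LOG_FLOOR):
    """Approximate log p(x | left, center, right) from narrower models.

    Assumes left and right contexts are conditionally independent given
    the center phone, so that
        p(x | L, c, R) = p(x | L, c) * p(x | R, c) / p(x | c).
    Each input score is floored first, so that an unseen or poorly
    trained context cannot drive the combined score to minus infinity.
    """
    left = max(log_p_left, floor)
    right = max(log_p_right, floor)
    center = max(log_p_center, floor)
    # Product and quotient of likelihoods become sum and difference of logs.
    return left + right - center


if __name__ == "__main__":
    score = compose_log_likelihood(math.log(0.2), math.log(0.5), math.log(0.4))
    print(math.exp(score))
```

Working in the log domain keeps the composition numerically stable when the individual likelihoods are very small, which is the usual situation for acoustic scores accumulated over many frames.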