Bayesian Context Clustering Using Cross Validation for Speech Recognition

Kei HASHIMOTO  Heiga ZEN  Yoshihiko NANKAKU  Akinobu LEE  Keiichi TOKUDA  

IEICE TRANSACTIONS on Information and Systems   Vol.E94-D   No.3   pp.668-678
Publication Date: 2011/03/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E94.D.668
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keywords: Bayesian approach, speech recognition, HMM, context clustering, cross validation

This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, which is widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition and has shown good performance. Moreover, the Bayesian approach can select an appropriate model structure while taking the amount of training data into account. Prior distributions, which represent prior information about the model parameters, affect both the estimation of the posterior distributions and the selection of the model structure (e.g., decision tree based context clustering), so determining the prior distributions is an important problem. However, this problem has not been thoroughly investigated in speech recognition, and existing techniques for determining prior distributions have not performed well. The proposed method can determine reliable prior distributions without any tuning parameters and can select an appropriate model structure while taking the amount of training data into account. Continuous phoneme recognition experiments show that the proposed method achieves higher performance than conventional methods.
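
The abstract gives no formulas, so the following is only a rough sketch of the general idea of using cross validation to set priors for a Bayesian split decision; it is not the authors' algorithm. It makes strong simplifying assumptions that are not in the paper: one-dimensional Gaussian data with a known unit variance instead of HMM state distributions, a flat base prior on the mean, and a prior for each held-out fold built from the remaining folds. The names `log_predictive`, `cv_score`, and `should_split` are hypothetical.

```python
import math


def log_predictive(held_out, rest, sigma2=1.0):
    """Log marginal likelihood of `held_out` under a Gaussian prior on the
    mean that is built from `rest` (assumes known variance `sigma2` and a
    flat base prior, so the prior is N(mean(rest), sigma2 / len(rest)))."""
    n = len(rest)
    m = sum(rest) / n   # prior mean taken from the other folds
    v = sigma2 / n      # prior variance of the mean
    total = 0.0
    for x in held_out:
        pv = v + sigma2  # predictive variance for the next observation
        total += -0.5 * (math.log(2 * math.pi * pv) + (x - m) ** 2 / pv)
        # Conjugate update of the distribution over the mean.
        prec = 1.0 / v + 1.0 / sigma2
        m = (m / v + x / sigma2) / prec
        v = 1.0 / prec
    return total


def cv_score(data, k=2, sigma2=1.0):
    """Sum over folds of the held-out log marginal likelihood.
    Requires len(data) >= k >= 2 so that no fold is empty."""
    folds = [data[i::k] for i in range(k)]
    score = 0.0
    for i, held in enumerate(folds):
        rest = [x for j, f in enumerate(folds) if j != i for x in f]
        score += log_predictive(held, rest, sigma2)
    return score


def should_split(left, right, k=2, sigma2=1.0):
    """Accept a candidate split if modelling the two children separately
    scores higher than modelling the pooled parent node."""
    parent = cv_score(left + right, k, sigma2)
    children = cv_score(left, k, sigma2) + cv_score(right, k, sigma2)
    return children > parent


if __name__ == "__main__":
    print(should_split([0.0, 0.1, -0.1, 0.2], [5.0, 5.1, 4.9, 5.2]))
    print(should_split([0.0, 0.2, -0.1, 0.1], [0.1, -0.2, 0.0, 0.15]))
```

In this toy setting, each child node has less data behind its prior than the pooled parent, so its predictive variance is wider. A split is therefore accepted only when the gain in fit outweighs that cost, which loosely mirrors the abstract's point that structure selection takes the amount of training data into account.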