Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

Fengpei GE, Changliang LIU, Jian SHAO, Fuping PAN, Bin DONG, Yonghong YAN

IEICE TRANSACTIONS on Information and Systems   Vol.E91-D   No.10   pp.2485-2492
Publication Date: 2008/10/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e91-d.10.2485
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keywords: CALL, speech recognition, HLDA, speaker-dependent CMN, e-learning


In this paper we present our investigation into improving the performance of our computer-assisted language learning (CALL) system by refining the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, we adopt speaker-dependent cepstral mean normalization (CMN), which raises the average correlation coefficient (average CC) between machine and expert scores from 78.00% to 84.14%. Second, we apply heteroscedastic linear discriminant analysis (HLDA) to enhance the discriminability of the acoustic model, which increases the average CC from 84.14% to 84.62%. HLDA also makes the scoring accuracy more stable across pronunciation proficiency levels, raising the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to adapt the acoustic model to strongly accented test speech, which improves the average CC from 84.62% to 86.57%. Together, these three techniques improve the accuracy of pronunciation quality evaluation.
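
To make two of the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that speaker-dependent CMN means subtracting a per-speaker mean cepstral vector (pooled over all of that speaker's utterances) from every frame, and that the CC is a Pearson correlation between machine and expert scores; the abstract does not state how the average CC is formed, so only a single CC is shown. The function names `speaker_cmn` and `pearson_cc` and the data layout are hypothetical.

```python
import math
from collections import defaultdict


def speaker_cmn(utterances):
    """Subtract each speaker's mean cepstral vector from that speaker's frames.

    utterances: list of (speaker_id, frames), where frames is a list of
    equal-length lists of floats (one cepstral vector per frame).
    Returns a list of (speaker_id, normalized_frames) in the same order.
    """
    sums = {}
    counts = defaultdict(int)
    # First pass: accumulate per-speaker sums over all utterances.
    for spk, frames in utterances:
        for frame in frames:
            if spk not in sums:
                sums[spk] = [0.0] * len(frame)
            for d, value in enumerate(frame):
                sums[spk][d] += value
            counts[spk] += 1
    means = {spk: [s / counts[spk] for s in total] for spk, total in sums.items()}
    # Second pass: remove the speaker-level mean from every frame.
    normalized = []
    for spk, frames in utterances:
        mean = means[spk]
        normalized.append(
            (spk, [[v - m for v, m in zip(frame, mean)] for frame in frames])
        )
    return normalized


def pearson_cc(machine_scores, expert_scores):
    """Pearson correlation between two equal-length score lists (assumed CC)."""
    n = len(machine_scores)
    mean_m = sum(machine_scores) / n
    mean_e = sum(expert_scores) / n
    cov = sum((m - mean_m) * (e - mean_e)
              for m, e in zip(machine_scores, expert_scores))
    var_m = sum((m - mean_m) ** 2 for m in machine_scores)
    var_e = sum((e - mean_e) ** 2 for e in expert_scores)
    return cov / math.sqrt(var_m * var_e)
```

Pooling the mean over all of a speaker's utterances, rather than per utterance, is what makes this variant speaker-dependent in the sketch; a real system would compute it on the actual cepstral features used by the recognizer.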