An LVCSR Based Reading Miscue Detection System Using Knowledge of Reference and Error Patterns

Changliang LIU
Fuping PAN
Fengpei GE
Hongbin SUO
Yonghong YAN

IEICE TRANSACTIONS on Information and Systems   Vol.E92-D    No.9    pp.1716-1724
Publication Date: 2009/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E92.D.1716
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keywords: CALL, reading tutor, reading miscues, LVCSR, multiple pronunciation


This paper describes a reading miscue detection system based on the conventional Large Vocabulary Continuous Speech Recognition (LVCSR) framework [1]. To incorporate knowledge of the reference (what the reader ought to read) and of some error patterns into the decoding process, two methods are proposed: Dynamic Multiple Pronunciation Incorporation (DMPI) and Dynamic Interpolation of Language Model (DILM). DMPI dynamically adds pronunciation variants to the search space to predict reading substitutions and insertions. To resolve the conflict between the coverage of error predictions and the perplexity of the search space, only the pronunciation variants related to the reference are added. DILM dynamically interpolates the general language model based on an analysis of the reference, which keeps the active decoding paths relatively close to the reference. This makes recognition more accurate, which in turn improves detection performance. At the final stage of detection, an improved dynamic programming (DP) algorithm aligns the confusion network (CN) produced by speech recognition with the reference to generate the detection result. Experimental results show that the two proposed methods reduce the Equal Error Rate (EER) by 14% relative, from 46.4% to 39.8%.
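
The final alignment step can be illustrated with a minimal sketch. The abstract does not specify what the "improved" DP changes, so the code below is only a plain edit-distance alignment under stated assumptions: the confusion network is modeled as a list of slots, each a set of candidate words; all edit costs are 1; word posteriors are ignored. The function name `align_reference_to_cn` and the data layout are hypothetical and not taken from the paper.

```python
from typing import List, Set, Tuple


def align_reference_to_cn(
    reference: List[str], cn: List[Set[str]]
) -> List[Tuple[str, bool]]:
    """Align reference words to confusion-network slots with unit-cost DP.

    Returns one (word, read_correctly) pair per reference word. A word is
    marked correct only if it is aligned to a slot that contains it.
    Assumption: unit costs and no posteriors; not the paper's improved DP.
    """
    n, m = len(reference), len(cn)
    # cost[i][j]: minimum edit cost aligning reference[:i] with cn[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # reference words with no slot: deletions
    for j in range(1, m + 1):
        cost[0][j] = j  # slots with no reference word: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if reference[i - 1] in cn[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion (word skipped)
                cost[i][j - 1] + 1,             # insertion (extra slot)
            )

    # Backtrace from the end to recover the status of each reference word.
    result: List[Tuple[str, bool]] = []
    i, j = n, m
    while i > 0:
        if j > 0:
            mismatch = 0 if reference[i - 1] in cn[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                result.append((reference[i - 1], mismatch == 0))
                i, j = i - 1, j - 1
                continue
            if cost[i][j] == cost[i][j - 1] + 1:
                j -= 1  # inserted slot, no reference word consumed
                continue
        result.append((reference[i - 1], False))  # deleted reference word
        i -= 1
    result.reverse()
    return result


if __name__ == "__main__":
    ref = ["the", "cat", "sat", "down"]
    slots = [{"the"}, {"cat", "cap"}, {"mat", "sad"}, {"down"}]
    for word, ok in align_reference_to_cn(ref, slots):
        print(word, "ok" if ok else "miscue")
```

Aligning against every candidate in a slot, rather than only the top hypothesis, means a reference word counts as read correctly whenever the recognizer considered it plausible at that position. A real system would presumably weight this by the slot posteriors, which this sketch leaves out.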
