Missing Feature Theory Applied to Robust Speech Recognition over IP Network

Toshiki ENDO  Shingo KUROIWA  Satoshi NAKAMURA  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E87-D   No.5   pp.1119-1126
Publication Date: 2004/05/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Speech Dynamics by Ear, Eye, Mouth and Machine)
Keyword: DSR, data loss, data imputation, marginalization

Summary: 
This paper addresses problems in performing speech recognition over mobile and IP networks. The main problem is the loss of speech data caused by packet loss in the network. We present two approaches based on missing feature theory for dealing with lost regions of speech data: one reconstructs the missing frames (data imputation), and the other uses marginal distributions (marginalization). For comparison, we also evaluate a packing method, which simply skips the lost data. We evaluate these approaches under two packet loss models, a random loss model and a Gilbert loss model. The results show that the marginal-distribution-based technique is the most effective in a packet loss environment: with the DSR front-end, word accuracy degrades by only 5% at a packet loss rate of 30%, and by only 3% at a mean burst loss length of 24 frames. The simple data imputation method is also effective for clean speech.
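
The two packet loss models and the two simplest ways of handling lost frames can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the parameters `p_good_to_bad` and `p_bad_to_good`, and the use of linear interpolation as the imputation rule are all assumptions made for illustration; the summary does not specify how the Gilbert model is parameterized or how missing frames are reconstructed. Marginalization is not shown because it requires the recognizer's acoustic model.

```python
import random


def random_loss(n_frames, loss_rate, seed=0):
    """Random loss model: each frame is lost independently (True = lost)."""
    rng = random.Random(seed)
    return [rng.random() < loss_rate for _ in range(n_frames)]


def gilbert_loss(n_frames, p_good_to_bad, p_bad_to_good, seed=0):
    """Two-state Gilbert model (assumed form): frames in the bad state are lost."""
    rng = random.Random(seed)
    lost, bad = [], False
    for _ in range(n_frames):
        if bad:
            bad = rng.random() >= p_bad_to_good   # stay bad with prob. 1 - q
        else:
            bad = rng.random() < p_good_to_bad    # enter bad with prob. p
        lost.append(bad)
    return lost


def mean_burst_length(lost):
    """Average length of consecutive runs of lost frames."""
    bursts, run = [], 0
    for is_lost in lost:
        if is_lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0


def pack(frames, lost):
    """Packing baseline: drop the lost frames and close the gap."""
    return [f for f, is_lost in zip(frames, lost) if not is_lost]


def impute_linear(frames, lost):
    """Illustrative imputation: interpolate linearly between the nearest
    received frames; at the edges, copy the nearest received frame."""
    good = [i for i, is_lost in enumerate(lost) if not is_lost]
    if not good:
        raise ValueError("no received frames to interpolate from")
    out = [list(f) for f in frames]
    for i, is_lost in enumerate(lost):
        if not is_lost:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = list(frames[right])
        elif right is None:
            out[i] = list(frames[left])
        else:
            w = (i - left) / (right - left)
            out[i] = [(1 - w) * a + w * b
                      for a, b in zip(frames[left], frames[right])]
    return out


if __name__ == "__main__":
    mask = gilbert_loss(200_000, p_good_to_bad=0.05, p_bad_to_good=0.25)
    print("loss rate:", sum(mask) / len(mask))
    print("mean burst length:", mean_burst_length(mask))
```

Under this parameterization, the long-run loss rate is p / (p + q) and the mean burst length is 1 / q, where p and q are the good-to-bad and bad-to-good transition probabilities. A mean burst loss length of 24 frames would therefore correspond to q = 1/24, assuming the same model form.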