Supervised Denoising Pre-Training for Robust ASR with DNN-HMM

Shin Jae KANG  Kang Hyun LEE  Nam Soo KIM  

IEICE TRANSACTIONS on Information and Systems   Vol.E98-D   No.12   pp.2345-2348
Publication Date: 2015/12/01
Publicized: 2015/09/07
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2015EDL8118
Type of Manuscript: LETTER
Category: Speech and Hearing
Keywords: deep neural networks (DNNs), pre-training, denoising, back-propagation, robust speech recognition

In this letter, we propose a novel supervised pre-training technique for deep neural network (DNN)-hidden Markov model (HMM) systems to achieve robust speech recognition in adverse environments. The aim is to initialize the DNN parameters so that they yield abstract features that are robust to variations in the acoustic environment. To this end, we first derive the abstract features from a DNN model that has been fine-tuned beforehand on a clean speech database. Using these abstract features as target values, we then estimate the initial parameters of the DNN with the standard error back-propagation algorithm and stochastic gradient descent. The proposed algorithm was evaluated on the Aurora-4 database and yielded better results than a number of conventional pre-training methods.
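
The pre-training step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single sigmoid layer, the mean-squared-error loss, the layer sizes, the synthetic "clean" and "noisy" features, and the names `pretrain_layer`, `W_clean`, and `W_init` are all assumptions made for illustration. In particular, feeding noisy inputs while regressing onto targets computed from clean inputs is our reading of "denoising" in the title, and the clean-trained network is replaced here by a fixed random layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Synthetic stand-ins (assumptions): "clean" features and a noisy copy of them.
n_frames, n_in, n_hid = 512, 20, 16
clean = rng.standard_normal((n_frames, n_in))
noisy = clean + 0.3 * rng.standard_normal((n_frames, n_in))

# Stand-in for a hidden layer of a DNN already fine-tuned on clean speech.
# Here it is only a fixed random layer; in practice it would be trained.
W_clean = 0.5 * rng.standard_normal((n_in, n_hid))
b_clean = np.zeros(n_hid)
targets = sigmoid(clean @ W_clean + b_clean)  # abstract features used as targets

def pretrain_layer(x, t, epochs=50, lr=0.5, batch=32):
    """Fit one sigmoid layer so that layer(x) approximates t (MSE, minibatch SGD)."""
    W = 0.1 * rng.standard_normal((x.shape[1], t.shape[1]))
    b = np.zeros(t.shape[1])
    losses = []
    for _ in range(epochs):
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch):
            idx = order[start:start + batch]
            h = sigmoid(x[idx] @ W + b)
            # Squared-error gradient, averaged over the minibatch,
            # back-propagated through the sigmoid nonlinearity.
            delta = (h - t[idx]) * h * (1.0 - h) / len(idx)
            W -= lr * x[idx].T @ delta
            b -= lr * delta.sum(axis=0)
        # Track the full-set MSE after each epoch.
        losses.append(float(np.mean((sigmoid(x @ W + b) - t) ** 2)))
    return W, b, losses

# Initial parameters for the layer, estimated from noisy inputs and clean targets.
W_init, b_init, losses = pretrain_layer(noisy, targets)
```

Under these assumptions, `W_init` and `b_init` would serve as the starting point for subsequent fine-tuning of the DNN-HMM acoustic model; a deeper network would need a corresponding set of targets and parameters for each layer.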