Joint Optimization of Perceptual Gain Function and Deep Neural Networks for Single-Channel Speech Enhancement

Wei HAN  Xiongwei ZHANG  Gang MIN  Xingyu ZHOU  Meng SUN  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E100-A   No.2   pp.714-717
Publication Date: 2017/02/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E100.A.714
Type of Manuscript: LETTER
Category: Noise and Vibration
Keywords: speech enhancement, deep neural networks, perceptual gain function, joint optimization

In this letter, we explore the joint optimization of a perceptual gain function and deep neural networks (DNNs) for single-channel speech enhancement. We propose a DNN architecture that incorporates the masking properties of the human auditory system so that the residual noise becomes inaudible. The architecture directly trains a perceptual gain function, which is then used to estimate the magnitude spectrum of clean speech from noisy speech features. Experimental results show that the proposed approach achieves significant improvements over the baselines on TIMIT sentences corrupted by various types of noise, regardless of whether the noise conditions were included in the training set.
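
The gain-based formulation mentioned in the abstract can be illustrated with a minimal sketch. It shows only the general idea of scaling each time-frequency bin of the noisy magnitude spectrum by a gain, and of measuring the error on the resulting spectrum rather than on the gain itself. It is not the authors' perceptual gain function: the masking-based formula is not given in the abstract, so the gain values here are placeholders, and the names `apply_gain` and `enhanced_spectrum_mse` are hypothetical.

```python
import numpy as np


def apply_gain(noisy_mag, gain):
    """Scale each time-frequency bin of the noisy magnitude by a gain.

    The gain is clipped to [0, 1] here, which is an assumption of this
    sketch (a pure attenuation), not something stated in the abstract.
    """
    gain = np.clip(gain, 0.0, 1.0)
    return gain * noisy_mag


def enhanced_spectrum_mse(noisy_mag, gain, clean_mag):
    """Mean squared error between the gain-scaled spectrum and the clean one.

    Measuring the error after applying the gain is one plausible way to
    couple gain estimation with spectrum estimation in a single objective.
    """
    estimate = apply_gain(noisy_mag, gain)
    return float(np.mean((estimate - clean_mag) ** 2))


# Toy magnitudes: rows are frames, columns are frequency bins (made-up values).
noisy_mag = np.array([[2.0, 4.0], [1.0, 3.0]])
clean_mag = np.array([[1.0, 4.0], [0.5, 0.0]])

# Placeholder gain standing in for a network output.
gain = np.array([[0.5, 1.0], [0.5, 0.0]])

enhanced = apply_gain(noisy_mag, gain)
loss = enhanced_spectrum_mse(noisy_mag, gain, clean_mag)
```

In a trained system the gain would come from the network output for each frame, and the objective would presumably reflect the auditory masking properties the letter describes. The plain squared error above is only a stand-in.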