An Improved Supervised Speech Separation Method Based on Perceptual Weighted Deep Recurrent Neural Networks
Wei HAN Xiongwei ZHANG Meng SUN Li LI Wenhua SHI
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences
Publication Date: 2017/02/01
Online ISSN: 1745-1337
Type of Manuscript: LETTER
Category: Speech and Hearing
Keywords: monaural speech separation, deep recurrent neural networks, perceptual weighting matrix, supervised training
In this letter, we propose a novel speech separation method based on a perceptually weighted deep recurrent neural network (DRNN), which incorporates the masking properties of the human auditory system. In the supervised training stage, we first utilize the clean label speech of two different speakers to calculate two perceptual weighting matrices. These weighting matrices are then used to adjust the mean squared error between the network outputs and the reference features of the two clean speech signals, so that the two speech signals can mask each other. Experimental results on the TSP speech corpus demonstrate that the proposed speech separation approach achieves significant improvements over state-of-the-art methods when tested under different mixing conditions.
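To make the training objective concrete, the sketch below shows one plausible form of a perceptually weighted mean squared error for two speakers. The function name and the representation of the weighting matrices as per-bin (elementwise) weights are assumptions for illustration; the letter's actual weighting matrices are derived from auditory masking properties of the clean speech, which is not reproduced here.

```python
import numpy as np

def perceptual_weighted_mse(y1_hat, y2_hat, y1_ref, y2_ref, w1, w2):
    """Sketch of a perceptually weighted MSE for two-speaker separation.

    y1_hat, y2_hat : network output features for speakers 1 and 2
    y1_ref, y2_ref : reference features of the clean speech
    w1, w2         : perceptual weights per time-frequency bin
                     (a simplification of the letter's weighting
                     matrices, assumed here for illustration)
    """
    # Weight each speaker's squared error by its perceptual weights,
    # then average the two weighted errors into a single loss value.
    e1 = np.mean(w1 * (y1_hat - y1_ref) ** 2)
    e2 = np.mean(w2 * (y2_hat - y2_ref) ** 2)
    return e1 + e2
```

In a training loop, this scalar would replace the plain MSE, so that errors in perceptually masked time-frequency regions contribute less to the gradient.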