Online Convolutive Non-Negative Bases Learning for Speech Enhancement

Yinan LI  Xiongwei ZHANG  Meng SUN  Yonggang HU  Li LI  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E99-A   No.8   pp.1609-1613
Publication Date: 2016/08/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E99.A.1609
Type of Manuscript: LETTER
Category: Speech and Hearing
Keywords: convolutive non-negative sparse coding, online learning, speech enhancement


An online version of convolutive non-negative sparse coding (CNSC) with the generalized Kullback-Leibler (K-L) divergence is proposed to adaptively learn spectral-temporal bases from speech streams. The proposed scheme processes training data piece by piece and incrementally updates the learned bases with accumulated statistics, overcoming the inefficiency of its offline counterpart on large-scale or streaming data. Unlike conventional non-negative sparse coding, the convolutive model lets each basis describe a relatively long temporal span of the signal, which improves the representation power of the model. Moreover, by incorporating a voice activity detector (VAD), we propose an unsupervised enhancement algorithm that adaptively updates the noise dictionary during non-speech intervals; during speech intervals, the speech bases are learned adaptively while the noise bases are kept fixed. Experimental results show that the proposed algorithm substantially outperforms the competing algorithms, especially when the background noise is highly non-stationary.
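As a rough illustration of the ideas summarized above, the following sketch shows a convolutive non-negative model under the generalized K-L divergence, with bases updated piece by piece from accumulated statistics. It is not the letter's exact algorithm: the multiplicative activation update is the standard one for this divergence, and the accumulation rule (running sums `A` and `B`, with the bases set to their ratio) is one plausible majorization-style choice assumed here. All names (`reconstruct`, `update_activations`, `OnlineBases`, `sparsity`) are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def shift_right(H, t):
    """Shift the columns of H right by t frames, zero-filling on the left."""
    out = np.zeros_like(H)
    if t == 0:
        out[:] = H
    else:
        out[:, t:] = H[:, :-t]
    return out

def shift_left(X, t):
    """Shift the columns of X left by t frames, zero-filling on the right."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    else:
        out[:, :-t] = X[:, t:]
    return out

def reconstruct(W, H):
    """Convolutive model: V_hat = sum_t W[t] @ shift_right(H, t).
    W has shape (T, F, K); H has shape (K, N); the result is (F, N)."""
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

def kl_divergence(V, V_hat):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    return float(np.sum(V * np.log((V + EPS) / (V_hat + EPS)) - V + V_hat))

def update_activations(V, W, H, sparsity=0.0):
    """One multiplicative update of H with the bases W held fixed.
    `sparsity` is an assumed L1 penalty weight on the activations."""
    R = V / (reconstruct(W, H) + EPS)
    ones = np.ones_like(V)
    num = np.zeros_like(H)
    den = np.zeros_like(H)
    for t in range(W.shape[0]):
        num += W[t].T @ shift_left(R, t)
        den += W[t].T @ shift_left(ones, t)
    return H * num / (den + sparsity + EPS)

class OnlineBases:
    """Keeps running statistics A and B so that the bases can be refreshed
    after each piece of data (assumed accumulation rule, see lead-in)."""

    def __init__(self, W):
        self.W = W.copy()
        self.A = np.zeros_like(W)  # accumulated numerator statistics
        self.B = np.zeros_like(W)  # accumulated denominator statistics

    def process_piece(self, V, H, n_iter=20, sparsity=0.0):
        # Fit the activations of this piece against the current bases.
        for _ in range(n_iter):
            H = update_activations(V, self.W, H, sparsity)
        # Fold this piece's statistics into the running sums.
        R = V / (reconstruct(self.W, H) + EPS)
        ones = np.ones_like(V)
        for t in range(self.W.shape[0]):
            Ht = shift_right(H, t)
            self.A[t] += self.W[t] * (R @ Ht.T)
            self.B[t] += ones @ Ht.T
        self.W = self.A / (self.B + EPS)
        return H
```

In a VAD-driven setting such as the one described in the abstract, one could keep two such objects, feeding non-speech pieces to the noise dictionary and updating only the speech bases during speech pieces; that wiring is omitted here because the letter's details are not given in this summary.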