Statistical Model-Based VAD Algorithm with Wavelet Transform

Yoon-Chang LEE  Sang-Sik AHN  

Publication
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E89-A   No.6   pp.1594-1600
Publication Date: 2006/06/01
Online ISSN: 1745-1337
DOI: 10.1093/ietfec/e89-a.6.1594
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Papers Selected from 2005 International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC2005))
Keyword: voice activity detection, wavelet packet decomposition, multi-band spectral subtraction, adaptive threshold

Summary: 
This paper presents a new statistical model-based voice activity detection (VAD) algorithm in the wavelet domain that improves performance in non-stationary environments. Because wavelet representations offer efficient time-frequency localization and multi-resolution analysis, wavelet transforms are well suited to processing non-stationary signals such as speech. To exploit the fact that the wavelet packet is a very efficient approximation of the discrete Fourier transform and has built-in de-noising capability, we first apply wavelet packet decomposition to localize the energy effectively in frequency space, then use spectral subtraction and matched filtering to enhance the SNR. Since conventional wavelet-based spectral subtraction eliminates low-power speech in onset and offset regions and generates musical noise, we derive an improved multi-band spectral subtraction. Furthermore, because a fixed threshold cannot follow fluctuations in time-varying noise power, and this inability to adapt to a time-varying environment severely limits VAD performance, we propose a statistical model-based VAD algorithm in the wavelet domain with an adaptive threshold. Extensive computer simulations comparing the proposed algorithm with conventional algorithms demonstrate its performance improvement under various noise environments.
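
The general idea of a sub-band VAD whose decision level follows the noise can be illustrated with a minimal sketch. This is not the paper's algorithm: the summary gives neither the statistical model nor the threshold update rule, so everything below is an assumption made for illustration. The sketch uses a Haar wavelet packet tree, a recursively smoothed per-band noise estimate, and a decision on the mean band SNR; the function names (haar_packet, band_energies, detect) and the parameters (levels, alpha, margin, init_frames) are hypothetical. Spectral subtraction and matched filtering are omitted.

```python
import math
import random


def haar_packet(frame, levels):
    """Full Haar wavelet packet tree; returns 2**levels sub-band coefficient lists.

    len(frame) must be divisible by 2**levels.
    """
    bands = [list(frame)]
    root2 = math.sqrt(2.0)
    for _ in range(levels):
        split = []
        for band in bands:
            pairs = range(0, len(band), 2)
            split.append([(band[i] + band[i + 1]) / root2 for i in pairs])  # low-pass half
            split.append([(band[i] - band[i + 1]) / root2 for i in pairs])  # high-pass half
        bands = split
    return bands


def band_energies(frame, levels=3):
    """Mean squared coefficient in each wavelet packet sub-band."""
    return [sum(c * c for c in band) / len(band) for band in haar_packet(frame, levels)]


def detect(frames, levels=3, alpha=0.9, margin=3.0, init_frames=5):
    """Return one boolean per frame (True = speech).

    Assumed rule, not taken from the paper: a frame is speech when the mean
    per-band ratio of frame energy to estimated noise energy exceeds `margin`.
    The noise estimate is updated only on frames judged to be noise, so the
    effective energy threshold (margin * noise) tracks the noise level.
    The first `init_frames` frames are assumed to contain noise only.
    """
    noise = None
    decisions = []
    for k, frame in enumerate(frames):
        energy = band_energies(frame, levels)
        if noise is None:
            noise = list(energy)  # initialise from the first frame
        snr = [e / max(n, 1e-12) for e, n in zip(energy, noise)]
        score = sum(snr) / len(snr)
        is_speech = k >= init_frames and score > margin
        if not is_speech:
            # Recursive smoothing of the per-band noise energy.
            noise = [alpha * n + (1.0 - alpha) * e for n, e in zip(noise, energy)]
        decisions.append(is_speech)
    return decisions


# Synthetic demo: 20 frames of low-level Gaussian noise, then 5 frames of a tone in noise.
random.seed(0)
noise_only = [[random.gauss(0.0, 0.01) for _ in range(256)] for _ in range(20)]
tone = [[math.sin(0.1 * math.pi * n) + random.gauss(0.0, 0.01) for n in range(256)]
        for _ in range(5)]
decisions = detect(noise_only + tone)
print(decisions)
```

Freezing the noise update during detected speech is one common design choice; it keeps speech energy from leaking into the noise estimate, at the cost of slower recovery if the noise level jumps while speech is being flagged.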