An Approach Using Combination of Multiple Features through Sigmoid Function for Speech-Presence/Absence Discrimination

Kun-Ching WANG  Chiun-Li CHIN  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E94-A   No.8   pp.1630-1637
Publication Date: 2011/08/01
Online ISSN: 1745-1337
DOI: 10.1587/transfun.E94.A.1630
Print ISSN: 0916-8508
Type of Manuscript: PAPER
Category: Engineering Acoustics
Keywords: speech detection, combination of multiple features, Bark-scale wavelet decomposition, adaptive frequency sub-band extraction, sigmoid function

In this paper, we present an approach to detecting speech presence in which the decision rule is based on a combination of multiple features through a sigmoid function. Minimum classification error (MCE) training is used to adjust the weights of the combination. The features consist of three parameters derived in the sub-band domain (the ratio of the zero-crossing rate (ZCR), the spectral energy, and the spectral entropy) and are combined linearly with weights. First, the Bark-scale wavelet decomposition (BSWD) is used to split the input speech into 24 critical sub-bands. Next, the feature parameters are derived from the selected frequency sub-bands to form robust voice feature parameters. To discard seriously corrupted frequency sub-bands, a strategy of adaptive frequency sub-band extraction (AFSE), dependent on the sub-band SNR, is applied so that only the useful frequency sub-bands are used. Finally, the three feature parameters, which consider only the useful sub-bands, are combined through a sigmoid-type function incorporating optimal weights obtained by MCE training to classify each frame as speech-present or speech-absent. Experimental results show that the proposed algorithm outperforms standard methods such as G.729B and AMR2.
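The decision rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed SNR floor used to keep or discard a sub-band, the bias term, and the 0.5 decision threshold are all assumptions made for illustration, and the weights are placeholders rather than values obtained by MCE training.

```python
import math


def select_subbands(subband_snr_db, snr_floor_db=0.0):
    """Keep the indices of sub-bands whose SNR reaches the floor.

    Assumed selection rule: the abstract only states that the choice
    depends on the sub-band SNR, not how the cut-off is set.
    """
    return [i for i, snr in enumerate(subband_snr_db) if snr >= snr_floor_db]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def speech_presence_score(features, weights, bias=0.0):
    """Weighted linear combination of the features, squashed to (0, 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


def is_speech(features, weights, bias=0.0, threshold=0.5):
    """Label a frame speech-present when the score exceeds the threshold."""
    return speech_presence_score(features, weights, bias) > threshold


# Hypothetical per-frame values: (ZCR ratio, spectral energy, spectral entropy),
# assumed to be computed over the selected sub-bands only.
frame_features = [0.8, 1.5, -0.4]
placeholder_weights = [0.5, 1.0, 0.7]  # stand-ins for MCE-trained weights

useful = select_subbands([5.0, -3.0, 0.0, 12.0])
decision = is_speech(frame_features, placeholder_weights)
```

In this sketch the sigmoid turns the weighted feature sum into a bounded score, which is what makes a smooth, differentiable training criterion such as MCE applicable to the weights.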