Homomorphic Filtered Spectral Peaks Energy for Automatic Detection of Vowel Onset Point in Continuous Speech

Xian ZANG  Kil To CHONG  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E96-D   No.4   pp.949-956
Publication Date: 2013/04/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E96.D.949
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: 
vowel onset point,  homomorphic filtering,  peaks energy,  vocal tract spectrum,  noise robustness,  

Full Text: PDF(1.1MB)>>
Buy this Article




Summary: 
During the production of speech signals, the vowel onset point is an important event containing important information for many speech processing tasks, such as consonant-vowel unit recognition and speech end-points detection. In order to realize accurate automatic detection of vowel onset points, this paper proposes a reliable method using the energy characteristics of homomorphic filtered spectral peaks. The homomorphic filtering helps to separate the slowly varying vocal tract system characteristics from the rapidly fluctuating excitation characteristics in the cepstral domain. The distinct vocal tract shape related to vowels is obtained and the peaks in the estimated vocal tract spectrum provide accurate and stable information for VOP detection. Performance of the proposed method is compared with the existing method which uses the combination of evidence from the excitation source, spectral peaks, and modulation spectrum energies. The detection rate with different time resolutions, together with the missing rate and spurious rate, are used for comprehensive evaluation of the performance on continuous speech taken from the TIMIT database. The detection accuracy of the proposed method is 74.14% for ±10 ms resolution and it increases to 96.33% for ±40 ms resolution with 3.67% missing error and 4.14% spurious error, much better than the results obtained by the combined approach at each specified time resolution, especially the higher resolutions of ±10±30 ms. In the cases of speech corrupted by white noise, pink noise and f-16 noise, the proposed method also shows significant improvement in the performance compared with the existing method.