

Fundamental Frequency Estimation for Noisy Speech Using Entropy-Weighted Periodic and Harmonic Features
Yuichi ISHIMOTO, Kentaro ISHIZUKA, Kiyoaki AIKAWA, Masato AKAGI
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E87-D
No.1
pp.205-214
Publication Date: 2004/01/01
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: fundamental frequency estimation, entropy, instantaneous amplitude, periodic feature, harmonic feature
Summary:
This paper proposes a robust method for estimating the fundamental frequency (F0) in real environments. It is assumed that the spectral structure of real environmental noise varies from moment to moment and that its energy is not evenly distributed in the time-frequency domain. Therefore, segmenting a spectrogram of speech mixed with environmental noise into narrow time-frequency regions will produce low-noise regions in which the signal-to-noise ratio is high. The proposed method estimates F0 from the periodic and harmonic features that are clearly observed in the low-noise regions. It first uses two kinds of spectrogram, one with high frequency resolution and another with high temporal resolution, to represent the periodic and harmonic features corresponding to F0. Next, the method segments these two kinds of feature plane into narrow time-frequency regions and calculates the probability function of F0 for each region. It then uses the entropy of each probability function as a weight to emphasize the probability functions of the low-noise regions and thereby enhance noise robustness. Finally, the probability functions are grouped at each time, and F0 is obtained as the frequency at which the grouped function has the highest probability. The experimental results showed that, in comparison with other approaches such as the cepstrum method and the autocorrelation method, the developed method estimates F0 more robustly from speech in the presence of band-limited noise and car noise.
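
The entropy-weighting and grouping steps described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the weighting formula, so the weight used here (one minus the normalized Shannon entropy) is an assumption, as are the function name `entropy_weighted_f0`, the candidate grid, and the example probability values. The sketch takes the per-region probability functions for one time instant as given and does not cover how they are derived from the spectrograms.

```python
import numpy as np

def entropy_weighted_f0(region_probs, f0_candidates):
    """Combine per-region F0 probability functions for one time instant.

    region_probs: array of shape (n_regions, n_candidates), one probability
                  function over the F0 candidates per time-frequency region.
    f0_candidates: array of shape (n_candidates,), candidate F0 values in Hz.
    """
    p = np.asarray(region_probs, dtype=float)
    # Normalize each row so it sums to 1 (assumes no all-zero rows).
    p = p / p.sum(axis=1, keepdims=True)
    n_candidates = p.shape[1]

    # Shannon entropy per region, treating 0 * log(0) as 0.
    safe_p = np.where(p > 0, p, 1.0)
    entropy = -np.sum(p * np.log(safe_p), axis=1)

    # ASSUMED weight: 1 for a perfectly peaked function, 0 for a flat one.
    # A flat function carries no F0 information, as expected in a noisy region.
    weights = 1.0 - entropy / np.log(n_candidates)

    # Group the regions: weighted sum of the probability functions.
    combined = weights @ p
    best = int(np.argmax(combined))
    return float(np.asarray(f0_candidates)[best]), weights, combined

# Hypothetical example: three regions, four F0 candidates.
f0_candidates = np.array([100.0, 150.0, 200.0, 250.0])
region_probs = np.array([
    [0.02, 0.94, 0.02, 0.02],  # low-noise region, clear peak at 150 Hz
    [0.25, 0.25, 0.25, 0.25],  # noisy region, flat function
    [0.20, 0.20, 0.25, 0.35],  # noisy region, weak preference for 250 Hz
])
f0, weights, combined = entropy_weighted_f0(region_probs, f0_candidates)
print(f0, weights)
```

In this example the sharply peaked region receives the largest weight, so it dominates the grouped function, while the flat region contributes almost nothing. Other monotonically decreasing functions of the entropy would serve the same purpose.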


