Noise Robust Speech Recognition Using F0 Contour Information
Koji IWANO, Takahiro SEKI, Sadaoki FURUI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2004/05/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Speech Dynamics by Ear, Eye, Mouth and Machine)
Keyword: noise robust speech recognition, prosody, fundamental frequency (F0) contour, multi-stream HMM, Hough transform
This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, the fundamental frequency (F0) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrases and word boundaries. This paper first describes a noise robust F0 extraction method using the Hough transform, which achieves high extraction rates under various noise environments. It then proposes a robust speech recognition method using multi-stream HMMs, which model both segmental spectral information and F0 contour information. Speaker-independent experiments are conducted using connected digits uttered by 11 male speakers under various noise types and SNR conditions. The recognition error rate is reduced in all noise conditions, and the best absolute improvement in digit accuracy is about 4.5%. This improvement is achieved by robust digit boundary detection using the prosodic information.
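The abstract does not give the formula used to combine the two streams. As a minimal sketch, assuming the commonly used multi-stream formulation in which each HMM state scores the spectral and F0 observations separately and sums the log-likelihoods with stream weights that add up to one, the combination could look like the following. The function names, the single-Gaussian stream models, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian (stand-in for a stream's output model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def multi_stream_log_likelihood(spectral_ll, f0_ll, f0_weight):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    Assumption: the two stream weights sum to one, so the spectral
    stream receives (1 - f0_weight).
    """
    if not 0.0 <= f0_weight <= 1.0:
        raise ValueError("f0_weight must lie in [0, 1]")
    return (1.0 - f0_weight) * spectral_ll + f0_weight * f0_ll


# Hypothetical one-dimensional observations and model parameters.
spectral_ll = gaussian_log_pdf(0.3, mean=0.0, var=1.0)
f0_ll = gaussian_log_pdf(-0.1, mean=0.0, var=0.5)

# A small F0 weight lets the spectral stream dominate the state score.
combined = multi_stream_log_likelihood(spectral_ll, f0_ll, f0_weight=0.2)
print(combined)
```

Under this assumption, setting `f0_weight` to 0 reduces the score to the spectral stream alone, so the weight controls how much the prosodic stream can influence decoding, for example around digit boundaries.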