Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes

Takashi SAITO  

IEICE TRANSACTIONS on Information and Systems   Vol.E89-D   No.3   pp.1100-1106
Publication Date: 2006/03/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.3.1100
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category: Speech Analysis
F0 contour generation,  F0 shape unit,  statistical modeling,  voice font,  corpus-based speech synthesis,  

Full Text: PDF(763.7KB)>>
Buy this Article

This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to "manipulate" the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.