A Stochastic F0 Contour Model Based on Clustering and a Probabilistic Measure

Yoichi YAMASHITA  Tomoyoshi ISHIDA  Kazuki SHIMADERA  

IEICE TRANSACTIONS on Information and Systems   Vol.E86-D   No.3   pp.543-549
Publication Date: 2003/03/01
Online ISSN: 
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Speech Information Processing)
Category: Speech Synthesis and Prosody
speech synthesis,  F0 model,  clustering,  bunsetsu F0 shape,  stochastic method,  

Full Text: PDF(503.7KB)>>
Buy this Article

One of fundamental issues on the F0 contour is modeling relationship between F0 parameters and linguistic information of a sentence. This paper proposes a stochastic F0 model which probabilistically models the relationship between the F0 contour and the linguistic information. For the application of speech synthesis, an F0 generator selects the most probable F0 contour from candidates given by a probabilistic F0 model. An F0 contour of a Japanese sentence is represented by concatenation of F0 patterns of a Japanese syntactic unit, bunsetsu. A bunsetsu F0 pattern is composed of an F0 average and an F0 shape. The F0 average is independently predicted for each bunsetsu by a quantification theory from linguistic features of the bunsetsu. The most probable sequence of bunsetsu F0 shapes for a sentence is found in the F0 shape database based on a probabilistic measure. The probability that an F0 contour is observed for a sentence is defined by two kinds of probabilities, the F0 shape production and the F0 shape bigram. The latter is a probability of adjacent occurrence of two F0 shapes, which is similar to a word bigram in speech recognition. Several typical bunsetsu F0 shapes are extracted by clustering of training data and stored in the F0 shape database. The probability of the F0 shape production is computed for each bunsetsu based on distribution of values for the linguistic feature in a cluster. The RMS prediction errors of the F0 contour are 0.26[octave].