Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis

Shinsuke SAKAI  Tatsuya KAWAHARA  Hisashi KAWAI  

IEICE TRANSACTIONS on Information and Systems   Vol.E94-D   No.10   pp.2006-2014
Publication Date: 2011/10/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E94.D.2006
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: speech synthesis, unit selection, concatenation cost, join cost

The measure of goodness, or inversely the cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of a concatenation is measured by the conditional probability of observing the spectral shape of the current candidate unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector is a linear transform of the past spectral shape. Decision tree-based parameter tying is performed to achieve robust training that balances model complexity against the amount of training data available. The concatenation models were implemented in a corpus-based speech synthesizer, and the effectiveness of the proposed method was confirmed by an objective evaluation as well as a subjective listening test. We also demonstrate that the proposed method generalizes some popular conventional methods, in that those methods can be derived as special cases of the proposed method.
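
The conditional Gaussian described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`concat_log_prob`, `concat_cost`), the parameter names (`A`, `b`, `var`), the bias term `b`, and the diagonal covariance are all assumptions made for illustration, since the abstract does not specify them. In a full system, one such parameter set would presumably be selected per tied decision-tree leaf according to the phonetic context.

```python
import math


def concat_log_prob(x_cur, x_prev, A, b, var):
    """Log density of x_cur under N(A @ x_prev + b, diag(var)).

    x_cur, x_prev: spectral feature vectors at the join (lists of floats).
    A, b, var: hypothetical model parameters; the diagonal covariance
    is a simplifying assumption of this sketch.
    """
    # Mean vector is a linear transform of the previous spectral shape.
    mean = [
        sum(a_ij * xp for a_ij, xp in zip(row, x_prev)) + b_i
        for row, b_i in zip(A, b)
    ]
    logp = 0.0
    for x, m, v in zip(x_cur, mean, var):
        logp += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return logp


def concat_cost(x_cur, x_prev, A, b, var):
    """Concatenation cost taken as the negative log probability."""
    return -concat_log_prob(x_cur, x_prev, A, b, var)


# Illustrative 2-dimensional parameters: identity transform, zero bias,
# unit variances.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
var = [1.0, 1.0]

x_prev = [1.0, 2.0]
cost_same = concat_cost([1.0, 2.0], x_prev, A, b, var)
cost_far = concat_cost([2.0, 2.0], x_prev, A, b, var)
# The two costs differ by half the squared Euclidean distance
# between the candidate spectra and x_prev.
cost_gap = cost_far - cost_same
```

With `A` fixed to the identity, `b` to zero, and equal variances, the cost reduces to a scaled squared Euclidean distance between the boundary spectra plus a constant. This is one plausible example of a conventional join cost arising as a special case, though the abstract itself does not name the methods it generalizes.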