A Context Clustering Technique for Average Voice Models

Junichi YAMAGISHI  Masatsune TAMURA  Takashi MASUKO  Keiichi TOKUDA  Takao KOBAYASHI  

IEICE TRANSACTIONS on Information and Systems   Vol.E86-D   No.3   pp.534-542
Publication Date: 2003/03/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Speech Information Processing)
Category: Speech Synthesis and Prosody
Keywords: decision tree, context clustering, average voice model, HMM-based speech synthesis, speaker independent model


This paper describes a new context clustering technique for the average voice model, which is a set of speaker independent speech synthesis units. In this technique, we first train speaker dependent models using a multi-speaker speech database, and then construct a decision tree common to these speaker dependent models for context clustering. When a node of the decision tree is split, only the context related questions that are applicable to all speaker dependent models are adopted. As a result, every node of the decision tree always has training data from all speakers. After the decision tree is constructed, all speaker dependent models are clustered using the common decision tree, and a speaker independent model, i.e., an average voice model, is obtained by combining the speaker dependent models. Subjective tests show that average voice models trained using the proposed technique can generate more natural sounding speech than conventional average voice models.
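
The shared-tree construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the splitting criterion or the model form, so the sketch assumes scalar observations, a simple squared-error reduction as the split gain, and an unweighted mean of per-speaker leaf means as the "combining" step. All names (`build_tree`, `applicable`, the toy `data` and `questions`) are hypothetical. The one point taken directly from the abstract is the constraint that a question is adopted only if both child nodes keep training data from every speaker.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (assumed stand-in criterion)."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def cost(node):
    """Total per-speaker squared error for a node.

    `node` maps speaker -> list of (context, value) samples.
    """
    return sum(sse([v for _, v in samples]) for samples in node.values())


def split(node, question):
    """Partition every speaker's samples by a yes/no context question."""
    yes = {s: [(c, v) for c, v in smp if question(c)] for s, smp in node.items()}
    no = {s: [(c, v) for c, v in smp if not question(c)] for s, smp in node.items()}
    return yes, no


def applicable(yes, no):
    """A question is usable only if both children keep data from ALL speakers."""
    return all(yes.values()) and all(no.values())


def build_tree(node, questions, min_gain=1e-6):
    """Grow one decision tree shared by all speaker dependent models."""
    best = None
    for name, question in questions.items():
        yes, no = split(node, question)
        if not applicable(yes, no):
            continue  # would leave some speaker without data in a child
        gain = cost(node) - cost(yes) - cost(no)
        if gain > min_gain and (best is None or gain > best[0]):
            best = (gain, name, yes, no)
    if best is None:
        # Leaf: combine speaker dependent statistics (assumed: plain average).
        speaker_means = [fmean(v for _, v in smp) for smp in node.values()]
        return {"leaf": fmean(speaker_means), "speakers": sorted(node)}
    _, name, yes, no = best
    return {
        "question": name,
        "yes": build_tree(yes, questions, min_gain),
        "no": build_tree(no, questions, min_gain),
    }


# Toy data: speaker A has only stressed contexts, speaker B only unstressed.
data = {
    "A": [({"vowel": True, "stressed": True}, 5.0),
          ({"vowel": False, "stressed": True}, 1.0)],
    "B": [({"vowel": True, "stressed": False}, 7.0),
          ({"vowel": False, "stressed": False}, 3.0)],
}
questions = {
    "is_vowel": lambda c: c["vowel"],
    "is_stressed": lambda c: c["stressed"],
}
tree = build_tree(data, questions)
print(tree)
```

In this toy setup the `is_stressed` question is rejected because it would leave speaker B with no data in the "yes" child, so only `is_vowel` is used and every leaf is estimated from both speakers.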