For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
A Context Clustering Technique for Average Voice Models
Junichi YAMAGISHI Masatsune TAMURA Takashi MASUKO Keiichi TOKUDA Takao KOBAYASHI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2003/03/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Speech Information Processing)
Category: Speech Synthesis and Prosody
decision tree, context clustering, average voice model, HMM-based speech synthesis, speaker independent model,
Full Text: PDF(752.9KB)>>
This paper describes a new context clustering technique for average voice model, which is a set of speaker independent speech synthesis units. In the technique, we first train speaker dependent models using multi-speaker speech database, and then construct a decision tree common to these speaker dependent models for context clustering. When a node of the decision tree is split, only the context related questions which are applicable to all speaker dependent models are adopted. As a result, every node of the decision tree always has training data of all speakers. After construction of the decision tree, all speaker dependent models are clustered using the common decision tree and a speaker independent model, i.e., an average voice model is obtained by combining speaker dependent models. From the results of subjective tests, we show that the average voice models trained using the proposed technique can generate more natural sounding speech than the conventional average voice models.