HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation

Takashi NOSE  Makoto TACHIBANA  Takao KOBAYASHI 

Publication
IEICE TRANSACTIONS on Information and Systems  Vol.E92-D  No.3  pp.489-497
Publication Date: 2009/03/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: 
expressive speechHMM-based speech synthesisstyle controlmultiple-regression HSMM (MRHSMM)model adaptationaverage voice model

Full Text: PDF(725.3KB)


Summary: 
This paper presents methods for controlling the intensity of emotional expressions and speaking styles of an arbitrary speaker's synthetic speech by using a small amount of his/her speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first one is MRHSMM-based model adaptation in which the pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method utilizes simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs which are used for the initialization of the MRHSMM. From the result of subjective evaluation using adaptation data of 50 sentences of each style, we show that the proposed methods outperform the conventional speaker-dependent model training when using the same size of speech data of the target speaker.