HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation


IEICE TRANSACTIONS on Information and Systems   Vol.E92-D   No.3   pp.489-497
Publication Date: 2009/03/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E92.D.489
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keywords: expressive speech, HMM-based speech synthesis, style control, multiple-regression HSMM (MRHSMM), model adaptation, average voice model


This paper presents methods for controlling the intensity of emotional expression and speaking style in the synthetic speech of an arbitrary speaker, using only a small amount of that speaker's speech data, within HMM-based speech synthesis. Model adaptation is introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two approaches are proposed for training a target speaker's MRHSMM. The first is MRHSMM-based model adaptation, in which a pretrained MRHSMM is adapted to the target speaker; for this purpose, we formulate the maximum likelihood linear regression (MLLR) adaptation algorithm for the MRHSMM. The second applies simultaneous adaptation of speaker and style to an average voice model to obtain the target speaker's style-dependent HSMMs, which are then used to initialize the MRHSMM. Subjective evaluation with 50 adaptation sentences per style shows that the proposed methods outperform conventional speaker-dependent model training given the same amount of target-speaker speech data.
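
To make the idea of style control concrete, the following is a minimal sketch, not the paper's implementation. It assumes the usual MRHSMM formulation, in which each state's mean vector is a linear regression on a low-dimensional style vector, mu = H [1, v]^T, so that scaling v changes the intensity of the style. The abstract does not give this formula or the MLLR derivation. The function names, the matrix values, and the generic affine transform standing in for speaker adaptation are all hypothetical and chosen only for illustration.

```python
def mrhsmm_mean(H, style):
    """Mean of one state as a multiple regression on the style vector.

    H is a regression matrix (one row per feature dimension) and
    style is the style vector v; the mean is H times [1, v...].
    This formulation is an assumption, not taken from the abstract.
    """
    xi = [1.0] + list(style)  # augmented style vector [1, v_1, ..., v_L]
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H needs len(style) + 1 entries")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


def affine_adapt(mean, A, b):
    """Generic MLLR-style affine transform A * mean + b (illustrative only)."""
    return [sum(a * m for a, m in zip(row, mean)) + bias
            for row, bias in zip(A, b)]


# Hypothetical 2-dimensional feature with a 1-dimensional style space.
H = [[1.0, 0.5],
     [2.0, -1.0]]

neutral = mrhsmm_mean(H, [0.0])     # style component switched off
full = mrhsmm_mean(H, [1.0])        # style at its trained intensity
emphasized = mrhsmm_mean(H, [1.5])  # style pushed beyond the trained intensity

# Hypothetical speaker transform: identity matrix plus a bias shift.
adapted = affine_adapt(full, [[1.0, 0.0], [0.0, 1.0]], [0.25, -0.25])
```

In this sketch, moving the style value from 0.0 to 1.5 shifts the mean along the direction stored in the second column of H, which is what allows the intensity of a style to be controlled at synthesis time. The paper's first approach adapts the MRHSMM itself to the target speaker, whereas here the affine transform is applied to an already computed mean purely to show the shape of such a transform.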