Speech Synthesis with Various Emotional Expressions and Speaking Styles by Style Interpolation and Morphing


IEICE TRANSACTIONS on Information and Systems   Vol.E88-D   No.11   pp.2484-2491
Publication Date: 2005/11/01
DOI: 10.1093/ietisy/e88-d.11.2484
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Life-like Agent and its Communication)
Keywords: HMM-based speech synthesis, speaking style, emotional expression, style interpolation, style morphing, hidden semi-Markov model (HSMM)


This paper describes an approach to generating speech with varied emotional expressions and speaking styles. The approach builds on a technique for modeling speaking styles and emotional expressions in HMM-based speech synthesis. We first model several representative styles, each of which is a speaking style and/or an emotional expression, in an HMM-based speech synthesis framework. Then, to generate synthetic speech with a style intermediate between the representative ones, we synthesize speech from a model obtained by interpolating the representative style models with a model interpolation technique. We assess the style interpolation technique in subjective evaluation tests using four representative styles of read speech (neutral, joyful, sad, and rough) together with speech synthesized from models obtained by interpolating the models for every pair of styles. The results show that speech synthesized from an interpolated model has a style in between the two representative ones. Moreover, by changing the interpolation ratio between the neutral model and another representative style model, we can control the degree of expressivity of the speaking style or emotion in the synthesized speech. We also show that style morphing can be achieved in speech synthesis, that is, the style can be changed smoothly from one representative style to another by gradually changing the interpolation ratio.
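
The interpolation and morphing ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the interpolation formula, so the sketch assumes each style model is reduced to a single Gaussian with diagonal covariance and that both means and variances are combined as a weighted sum with ratios that sum to one. The names `StyleGaussian`, `interpolate_styles`, and `morph_ratios` are hypothetical and do not come from the paper; a real HMM or HSMM system would apply such a rule to every state output and duration distribution.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StyleGaussian:
    """Hypothetical stand-in for one output distribution of a style model."""
    mean: List[float]
    var: List[float]  # diagonal covariance entries


def interpolate_styles(models: Sequence[StyleGaussian],
                       weights: Sequence[float]) -> StyleGaussian:
    """Combine style distributions by a weighted sum of means and variances.

    Assumption: linear weighting of both parameters. The paper's actual
    interpolation rule may differ.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation ratios must sum to 1")
    dim = len(models[0].mean)
    mean = [sum(w * m.mean[d] for m, w in zip(models, weights))
            for d in range(dim)]
    var = [sum(w * m.var[d] for m, w in zip(models, weights))
           for d in range(dim)]
    return StyleGaussian(mean, var)


def morph_ratios(steps: int) -> List[float]:
    """Evenly spaced ratios from 0 to 1 for a gradual style change."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [i / (steps - 1) for i in range(steps)]


if __name__ == "__main__":
    # Toy two-dimensional parameters, invented purely for illustration.
    neutral = StyleGaussian(mean=[0.0, 2.0], var=[1.0, 1.0])
    joyful = StyleGaussian(mean=[4.0, 6.0], var=[3.0, 1.0])
    for r in morph_ratios(5):
        # r = 0 reproduces the neutral model, r = 1 the joyful model.
        print(r, interpolate_styles([neutral, joyful], [1.0 - r, r]))
```

Under these assumptions, holding the ratio fixed corresponds to synthesizing an intermediate style or a chosen degree of expressivity, while stepping the ratio from 0 to 1 over an utterance corresponds to style morphing.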