Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech

Nick CAMPBELL   

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E88-D   No.3   pp.376-383
Publication Date: 2005/03/01
Print ISSN: 0916-8532
Type of Manuscript: INVITED PAPER (Special Section on Corpus-Based Speech Technologies)
Keywords: speech synthesis, corpora, concatenation, paralinguistic information, communication, affect



Summary: 
This paper describes the special demands that conversational speech places on corpus-based speech synthesis. Seven years ago, the author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis. The present work extends it to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech. It also presents a framework for specifying affect through differences in speaking style and voice quality. An enormous corpus of speech samples available for concatenation allows complete phrase-sized utterance segments to be selected. This shifts the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
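The summary does not give a cost function. The following is only a minimal Python sketch of what selecting whole phrases by prosodic and discoursal appropriateness, rather than by segmental continuity, might look like. It is not the CHATR algorithm. The class names, the feature set (mean F0, duration, a speech-act label, an affect label), the weights and the mismatch penalty are all hypothetical assumptions made for illustration. The sketch scores each candidate phrase only against the target and ignores concatenation costs between neighbouring phrases.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass
class PhraseUnit:
    """Hypothetical phrase-sized segment stored in a large speech corpus."""
    text: str
    f0_mean_hz: float  # mean fundamental frequency of the phrase
    duration_s: float  # phrase duration in seconds
    speech_act: str    # hypothetical discourse label, e.g. "backchannel"
    affect: str        # hypothetical affect / speaking-style label


@dataclass
class Target:
    """Hypothetical specification of the phrase the synthesizer needs."""
    text: str
    f0_mean_hz: float
    duration_s: float
    speech_act: str
    affect: str


def target_cost(unit, target, w_f0=1.0, w_dur=1.0, mismatch_penalty=10.0):
    """Score how appropriate a unit is for the target (lower is better)."""
    if unit.text != target.text:
        return INF  # only whole-phrase matches are candidates
    # Relative prosodic distance (assumed features and weights).
    cost = w_f0 * abs(unit.f0_mean_hz - target.f0_mean_hz) / target.f0_mean_hz
    cost += w_dur * abs(unit.duration_s - target.duration_s) / target.duration_s
    # Discourse and affect mismatches dominate the prosodic terms.
    if unit.speech_act != target.speech_act:
        cost += mismatch_penalty
    if unit.affect != target.affect:
        cost += mismatch_penalty
    return cost


def select_phrase(corpus, target):
    """Return the lowest-cost phrase unit, or None if no text matches."""
    best = min(corpus, key=lambda u: target_cost(u, target), default=None)
    if best is None or target_cost(best, target) == INF:
        return None
    return best


# Usage: three recordings of the same phrase in different contexts.
corpus = [
    PhraseUnit("right, I see", 180.0, 0.6, "backchannel", "relaxed"),
    PhraseUnit("right, I see", 240.0, 0.5, "backchannel", "excited"),
    PhraseUnit("right, I see", 200.0, 0.9, "question", "relaxed"),
]
target = Target("right, I see", 230.0, 0.5, "backchannel", "excited")
chosen = select_phrase(corpus, target)  # the excited backchannel token
```

The large mismatch penalty is a design choice in this sketch. It makes a recording with the right discourse function and affect win over one that is merely closer in pitch or duration, which mirrors the shift in emphasis the summary describes.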