Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2005/03/01
Print ISSN: 0916-8532
Type of Manuscript: INVITED PAPER (Special Section on Corpus-Based Speech Technologies)
Keywords: speech synthesis, corpora, concatenation, paralinguistic information, communication, affect
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and shifts the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.