Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005
Heiga ZEN Tomoki TODA Masaru NAKAMURA Keiichi TOKUDA
Publication: IEICE TRANSACTIONS on Information and Systems, Vol.E90-D, No.1, pp.325-333
Publication Date: 2007/01/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: HMM-based speech synthesis, Blizzard Challenge 2005, STRAIGHT, HSMM, GV
Summary:
In January 2005, an open evaluation of corpus-based text-to-speech synthesis systems using common speech datasets, named the Blizzard Challenge 2005, was conducted. The Nitech group participated in this challenge, entering an HMM-based speech synthesis system called Nitech-HTS 2005. This paper describes the technical details, building processes, and performance of our system. We first give an overview of the basic HMM-based speech synthesis system, and then describe the new features integrated into Nitech-HTS 2005, such as STRAIGHT-based vocoding, hidden semi-Markov model (HSMM)-based acoustic modeling, and a speech parameter generation algorithm considering global variance (GV). The constructed Nitech-HTS 2005 voices can generate speech waveforms at 0.3 RT (real-time ratio) on a 1.6 GHz Pentium 4 machine, and the footprints of these voices are less than 2 Mbytes. Subjective listening tests showed that the naturalness and intelligibility of the Nitech-HTS 2005 voices were much better than expected.
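
The 0.3 RT figure can be read with a small arithmetic sketch. It assumes the common definition of the real-time ratio as processing time divided by the duration of the generated speech; the summary does not spell out the definition, and the function names below are hypothetical, not part of Nitech-HTS 2005.

```python
def real_time_ratio(processing_seconds: float, audio_seconds: float) -> float:
    """Ratio of synthesis time to the duration of the speech produced.

    Assumed definition: values below 1.0 mean faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(rt_ratio: float, audio_seconds: float) -> float:
    """Time needed to synthesize `audio_seconds` of speech at a given ratio."""
    return rt_ratio * audio_seconds


if __name__ == "__main__":
    # Illustrative input only: a 10-second utterance at the reported 0.3 RT.
    print(estimated_processing_seconds(0.3, 10.0))
```

Under this reading, a ratio of 0.3 means that generating an utterance takes roughly three tenths of its playback duration on the stated machine.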