The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006

Heiga ZEN  Tomoki TODA  Keiichi TOKUDA  

IEICE TRANSACTIONS on Information and Systems   Vol.E91-D   No.6   pp.1764-1773
Publication Date: 2008/06/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e91-d.6.1764
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keywords: statistical parametric speech synthesis, Blizzard Challenge 2006, MGC-LSP, MLLT, full covariance GV pdf

We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for Blizzard Challenge 2006, the annual open evaluation of text-to-speech synthesis systems. To improve on our 2005 system (Nitech-HTS 2005), we investigated new features: mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), a maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and the full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.