Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis
Daiki SEKIZAWA, Shinnosuke TAKAMICHI, Hiroshi SARUWATARI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2019/06/01
Online ISSN: 1745-1361
Type of Manuscript: LETTER
Category: Speech and Hearing
Keyword: HMM-based text-to-speech synthesis, non-native speech, Chinese-accented Japanese, prosody
This article proposes a prosody correction method based on partial model adaptation for Chinese-accented Japanese hidden Markov model (HMM)-based text-to-speech synthesis. Although text-to-speech synthesis built from non-native speech accurately reproduces the speaker's individuality in synthetic speech, the naturalness of the synthetic speech is strongly degraded. To improve naturalness while preserving the speaker individuality of Chinese-accented Japanese text-to-speech synthesis, the proposed method partially uses HMM parameters of native Japanese speech to synthesize prosody-corrected speech. Results of an experimental evaluation demonstrate that duration and F0 correction are significantly effective in improving naturalness.
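The idea of partially using native-speech parameters can be illustrated with a toy sketch. It is not the authors' implementation: the dictionary layout, the stream names ("spectrum", "f0", "duration"), and the function `correct_prosody` are all assumptions made for illustration, and a plain copy of whole parameter streams stands in for the model adaptation described in the abstract.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: a model maps a stream name to a flat list of parameters.
Model = Dict[str, List[float]]


def correct_prosody(accented: Model, native: Model,
                    native_streams: Iterable[str]) -> Model:
    """Build a model that takes the named streams from the native model
    and every other stream from the accented (non-native) model."""
    chosen = set(native_streams)
    # Every requested stream must exist in both models.
    missing = [s for s in chosen if s not in accented or s not in native]
    if missing:
        raise KeyError(f"unknown stream(s): {sorted(missing)}")
    # Copy the lists so the input models are left unmodified.
    return {
        name: list(native[name] if name in chosen else params)
        for name, params in accented.items()
    }


if __name__ == "__main__":
    # Made-up numbers, used only to show which stream comes from which model.
    accented = {"spectrum": [1.0, 2.0], "f0": [5.1], "duration": [9.0]}
    native = {"spectrum": [3.0, 4.0], "f0": [5.3], "duration": [7.0]}
    print(correct_prosody(accented, native, ["f0", "duration"]))
```

In this sketch the spectral stream, assumed here to carry most of the speaker individuality, stays with the accented model, while the duration and F0 streams, the two prosodic features the evaluation found effective to correct, come from the native model.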