For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
Improvements of the One-to-Many Eigenvoice Conversion System
Yamato OHTANI Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2010/09/01
Online ISSN: 1745-1361
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Processing Natural Speech Variability for Improved Verbal Human-Computer Interaction)
Category: Voice Conversion
speech synthesis, eigenvoice conversion, STRAIGHT mixed excitation, global variance, adaptive training,
Full Text: PDF>>
We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only a small amount of speech data uttered by the target speaker in a text-independent manner. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) the oversmoothing of the converted spectrum causes muffled sounds in converted speech; and 3) the conversion model is affected by redundant acoustic variations among a lot of pre-stored target speakers used for building the EV-GMM. In order to address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that the conversion performance of one-to-many EVC is significantly improved by integrating all of these techniques into the one-to-many EVC system.