
Tensor Factor Analysis for Arbitrary Speaker Conversion
Daisuke SAITO Nobuaki MINEMATSU Keikichi HIROSE
Publication
IEICE TRANSACTIONS on Information and Systems
Vol.E103-D
No.6
pp.1395-1405
Publication Date: 2020/06/01
Publicized: 2020/03/13
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019EDP7166
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: voice conversion, Gaussian mixture models, eigenvoice, tensor factor analysis, Tucker decomposition
Summary:
This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of multiple Gaussian mixture models (GMMs). In voice conversion studies, realizing conversion from or to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EVGMM) was proposed. In EVC, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker's GMM. In this speaker space, each speaker is represented by a small number of weight parameters for the eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor factor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the dimensions of the mean vector and the Gaussian components, respectively. The speaker space is derived by tensor factor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. In addition, the effects of speaker adaptive training before factorization are investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
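The difference between the supervector and matrix representations can be illustrated with a small NumPy sketch. This is not the authors' training procedure: it applies a plain truncated higher-order SVD (one simple way to compute a Tucker decomposition) to randomly generated stand-in data. The sizes S, D, and M, the chosen ranks, and the helper names unfold, mode_dot, hosvd, and reconstruct are all assumptions made for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns index the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD: returns (core, factors), a Tucker form."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Map a Tucker core back to the original tensor space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_dot(out, u, mode)
    return out

# Hypothetical sizes: S speakers, D-dimensional means, M Gaussian components.
S, D, M = 20, 4, 8
rng = np.random.default_rng(0)
means = rng.normal(size=(S, D, M))   # random stand-in for speaker GMM means
bias = means.mean(axis=0)            # average speaker (D x M matrix)
centered = means - bias

# Supervector view: concatenate the M mean vectors of each speaker.
supervectors = centered.transpose(0, 2, 1).reshape(S, M * D)

# Matrix (tensor) view: keep the D and M axes separate and factorize each mode.
core, factors = hosvd(centered, ranks=(5, 3, 4))
approx = reconstruct(core, factors)

# With full ranks the decomposition reproduces the data exactly.
full_core, full_factors = hosvd(centered, ranks=(S, D, M))
exact = reconstruct(full_core, full_factors)
```

In the supervector view, the distinction between the feature dimension and the Gaussian component is lost in one axis of length M * D. In the tensor view, the factor matrices for the D and M axes are separate and are shared across all speakers.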


