Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution
Chiyomi MIYAJIMA, Yosuke HATTORI, Keiichi TOKUDA, Takashi MASUKO, Takao KOBAYASHI, Tadashi KITAMURA
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2001/07/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Biometric Person Authentication)
Keywords: speaker identification, pitch, multi-space probability distribution, Gaussian mixture model (GMM), minimum classification error
This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). The MSD-GMM allows us to model continuous pitch values for voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and a minimum classification error (MCE) training procedure for the MSD-GMM parameters. The MSD-GMM speaker models are evaluated on text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model the spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of MCE training of the MSD-GMM parameters and the robustness to inter-session variability.
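The idea of scoring voiced and unvoiced frames within one model can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only a pitch stream, omits the spectral stream and MCE training, and assumes a one-dimensional Gaussian space for voiced log-pitch values plus a single zero-dimensional space for the unvoiced symbol. The function names, the dictionary layout, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def msd_frame_likelihood(pitch, model):
    """Likelihood of one pitch observation under a toy MSD-style mixture.

    pitch is a float (log pitch) for a voiced frame, or None for an
    unvoiced frame. The unvoiced space is zero-dimensional, so its
    contribution is just its weight.
    """
    if pitch is None:
        return model["w_unvoiced"]
    return sum(w * gaussian_pdf(pitch, mean, var) for w, mean, var in model["voiced"])


def msd_log_likelihood(frames, model):
    """Sum of per-frame log-likelihoods, treating frames as independent."""
    return sum(math.log(msd_frame_likelihood(p, model)) for p in frames)


def identify(frames, models):
    """Return the name of the speaker model with the highest log-likelihood."""
    return max(models, key=lambda name: msd_log_likelihood(frames, models[name]))


# Hypothetical speaker models: each voiced entry is (weight, mean, variance).
# The voiced weights and the unvoiced weight sum to 1 within each model.
models = {
    "speaker_a": {"w_unvoiced": 0.3, "voiced": [(0.4, 4.7, 0.01), (0.3, 4.9, 0.02)]},
    "speaker_b": {"w_unvoiced": 0.3, "voiced": [(0.7, 5.3, 0.02)]},
}

# Made-up test utterance: log-pitch values for voiced frames, None for unvoiced.
frames = [4.72, None, 4.85, None, 4.68]

best = identify(frames, models)
print(best)
```

In this sketch an unvoiced frame contributes only the unvoiced weight, while a voiced frame is scored by the Gaussian components of the voiced space, so both frame types enter the same log-likelihood sum without interpolating or discarding pitch values.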