Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution

Chiyomi MIYAJIMA  Yosuke HATTORI  Keiichi TOKUDA  Takashi MASUKO  Takao KOBAYASHI  Tadashi KITAMURA  

IEICE TRANSACTIONS on Information and Systems   Vol.E84-D   No.7   pp.847-855
Publication Date: 2001/07/01
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Biometric Person Authentication)
Keywords: speaker identification, pitch, multi-space probability distribution, Gaussian mixture model (GMM), minimum classification error

This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on the multi-space probability distribution (MSD-GMM). The MSD-GMM allows us to model continuous pitch values for voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and a minimum classification error (MCE) training procedure for the MSD-GMM parameters. The MSD-GMM speaker models are evaluated on text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model the spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of MCE training of the MSD-GMM parameters and its robustness to inter-session variability.
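
To make the unified treatment of voiced and unvoiced frames concrete, the following is a minimal Python sketch of how a two-stream MSD-GMM might score a frame. It is one plausible reading of the abstract, not the paper's actual formulae: the assumption is that each mixture component pairs a diagonal-covariance Gaussian over the spectral vector with a pitch stream that has two spaces, a one-dimensional Gaussian over log F0 for voiced frames and a zero-dimensional space (a bare weight) for unvoiced frames. All names (`frame_log_likelihood`, `voiced_weight`, `identify`, the toy models) are hypothetical, and the parameters are fixed by hand rather than trained by ML or MCE.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def frame_log_likelihood(spectrum, log_f0, components):
    """Log-likelihood of one frame; log_f0 is None for an unvoiced frame.

    Assumed component layout (hypothetical): mixture weight, spectral
    Gaussian, and a pitch stream with voiced_weight in the open
    interval (0, 1); the unvoiced space gets weight 1 - voiced_weight.
    """
    terms = []
    for c in components:
        ll = math.log(c["weight"])
        ll += log_gauss_diag(spectrum, c["spec_mean"], c["spec_var"])
        if log_f0 is None:
            # Unvoiced: zero-dimensional space, only its weight contributes.
            ll += math.log(1.0 - c["voiced_weight"])
        else:
            # Voiced: space weight times a 1-D Gaussian density over log F0.
            ll += math.log(c["voiced_weight"])
            ll += log_gauss_diag([log_f0], [c["f0_mean"]], [c["f0_var"]])
        terms.append(ll)
    return logsumexp(terms)


def identify(frames, speaker_models):
    """Return the speaker whose model gives the highest total log-likelihood."""
    scores = {
        name: sum(frame_log_likelihood(s, f0, comps) for s, f0 in frames)
        for name, comps in speaker_models.items()
    }
    return max(scores, key=scores.get)


# Toy, hand-set single-component models (illustrative values only).
toy_models = {
    "speaker_a": [dict(weight=1.0, spec_mean=[0.0], spec_var=[1.0],
                       voiced_weight=0.7, f0_mean=4.8, f0_var=0.04)],
    "speaker_b": [dict(weight=1.0, spec_mean=[1.0], spec_var=[1.0],
                       voiced_weight=0.5, f0_mean=5.3, f0_var=0.04)],
}
# Each frame is (spectral vector, log F0 or None when unvoiced).
toy_frames = [([0.1], 4.75), ([-0.2], None), ([0.0], 4.9)]
best = identify(toy_frames, toy_models)
```

Because unvoiced frames contribute through the space weight rather than being discarded or assigned a placeholder pitch value, every frame is scored by the same model, which is the sense in which the framework is unified. A real system would estimate the parameters from training data rather than set them by hand.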