Text-Independent/Text-Prompted Speaker Recognition by Combining Speaker-Specific GMM with Speaker Adapted Syllable-Based HMM


IEICE TRANSACTIONS on Information and Systems   Vol.E89-D   No.3   pp.1058-1065
Publication Date: 2006/03/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.3.1058
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category: Speaker Recognition
Keyword: speaker identification, GMM, HMM, text verification


We present a new text-independent/text-prompted speaker recognition method that combines a speaker-specific Gaussian Mixture Model (GMM) with a syllable-based HMM adapted by MLLR or MAP. This paper evaluates the robustness of the method to changes in speaking style. A speaker identification experiment was conducted on the NTT database, which consists of sentences uttered at three speed modes (normal, fast and slow) by 35 Japanese speakers (22 males and 13 females) in five sessions over ten months. Each speaker provided only 5 training utterances (about 20 seconds in total). The combination method reduced the identification error rate by about 50%. Using a short test utterance (about 4 seconds), we obtained an accuracy of 98.8% for text-independent speaker identification across the three speaking modes, and 99.4% for the normal speaking mode. This result is superior to those of conventional methods on the same database. We show that the improvement stems from the complementary effect between the speaker-specific GMM and the speaker-adapted syllable-based HMM.
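
The abstract does not state how the GMM and HMM scores are combined. As a rough illustration only, the sketch below assumes a weighted sum of per-speaker log-likelihoods followed by an argmax over speakers. The function names, the `weight` parameter, the speaker labels and the score values are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def combine_scores(gmm_loglik: Dict[str, float],
                   hmm_loglik: Dict[str, float],
                   weight: float = 0.5) -> Dict[str, float]:
    """Weighted sum of per-speaker log-likelihoods (assumed fusion rule).

    weight = 1.0 uses the GMM score only; weight = 0.0 uses the HMM score only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if gmm_loglik.keys() != hmm_loglik.keys():
        raise ValueError("both models must score the same set of speakers")
    return {
        spk: weight * gmm_loglik[spk] + (1.0 - weight) * hmm_loglik[spk]
        for spk in gmm_loglik
    }


def identify(gmm_loglik: Dict[str, float],
             hmm_loglik: Dict[str, float],
             weight: float = 0.5) -> str:
    """Return the speaker with the highest combined score."""
    combined = combine_scores(gmm_loglik, hmm_loglik, weight)
    return max(combined, key=combined.get)


# Made-up scores for three hypothetical speakers: the two models disagree,
# and the combined score settles the decision.
gmm = {"spk_a": -41.0, "spk_b": -40.0, "spk_c": -45.0}
hmm = {"spk_a": -38.0, "spk_b": -42.0, "spk_c": -44.0}
print(identify(gmm, hmm, weight=0.5))
```

With these made-up scores the GMM alone would favor one speaker and the HMM another, which is the kind of disagreement a combined score is meant to resolve. In practice the weight would have to be tuned on held-out data.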