Robust Model for Speaker Verification against Session-Dependent Utterance Variation

Tomoko MATSUI  Kiyoaki AIKAWA  

IEICE TRANSACTIONS on Information and Systems   Vol.E86-D   No.4   pp.712-718
Publication Date: 2003/04/01
Online ISSN: 
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
speaker verification,  speaker model,  session dependent utterance variation,  handset dependent distortion,  

Full Text: PDF(221.7KB)>>
Buy this Article

This paper investigates a new method for creating robust speaker models to cope with inter-session variation of a speaker in a continuous HMM-based speaker verification system. The new method estimates session-independent parameters by decomposing inter-session variations into two distinct parts: session-dependent and -independent. The parameters of the speaker models are estimated using the speaker adaptive training algorithm in conjunction with the equalization of session-dependent variation. The resultant models capture the session-independent speaker characteristics more reliably than the conventional models and their discriminative power improves accordingly. Moreover we have made our models more invariant to handset variations in a public switched telephone network (PSTN) by focusing on session-dependent variation and handset-dependent distortion separately. Text-independent speech data recorded by 20 speakers in seven sessions over 16 months was used to evaluate the new approach. The proposed method reduces the error rate by 15% relatively. When compared with the popular cepstral mean normalization, the error rate is reduced by 24% relatively when the speaker models were recreated using speech data recorded in four or more sessions.