Fast Speaker Normalization and Adaptation Based on BIC for Meeting Speech Recognition

Masato MIMURA  Tatsuya KAWAHARA 

Publication
D - Abstracts of IEICE TRANSACTIONS on Information and Systems (Japanese Edition)  Vol.J95-D  No.7  pp.1467-1475
Publication Date: 2012/07/01
Online ISSN: 1881-0225
Print ISSN: 1880-4535
Type of Manuscript: PAPER
Keyword: 
meeting speech recognition, speaker normalization, speaker adaptation, BIC, VTLN, MLLR


Summary: 
This paper presents a unified method for speech segmentation, speaker normalization of spectral features, and speaker adaptation of the acoustic model, aimed at efficient meeting speech recognition. In the proposed method, input speech is segmented using the BIC (Bayesian Information Criterion), and each segment is then compared, again using the BIC, against the statistics of each speaker in the acoustic model's training corpus. Fast VTLN (Vocal Tract Length Normalization) and MLLR (Maximum Likelihood Linear Regression) adaptation are realized using the pre-estimated warping factor and MLLR transformation matrices, respectively, of the best-matched speakers. Experimental evaluations on Parliamentary speech transcription demonstrated that the proposed method achieves ASR accuracy comparable to standard ML estimation for both VTLN and MLLR adaptation, with a significant reduction in processing time.
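
The BIC comparison and speaker lookup described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the exact formulation, so the code assumes the commonly used delta-BIC test between two single-Gaussian models, simplified to diagonal covariances and computed from sufficient statistics. All names (`sufficient_stats`, `delta_bic`, `best_matched_speaker`, the `"stats"` key) and the penalty weight are hypothetical.

```python
import math

def sufficient_stats(frames):
    """Return (frame count, per-dimension sums, per-dimension sums of squares)."""
    dim = len(frames[0])
    sums = [sum(f[d] for f in frames) for d in range(dim)]
    sq_sums = [sum(f[d] ** 2 for f in frames) for d in range(dim)]
    return (len(frames), sums, sq_sums)

def merge_stats(a, b):
    """Statistics of the pooled data are the element-wise sums of both sides."""
    return (a[0] + b[0],
            [p + q for p, q in zip(a[1], b[1])],
            [p + q for p, q in zip(a[2], b[2])])

def log_det_diag(stats):
    """Log-determinant of the ML diagonal covariance implied by the statistics."""
    n, sums, sq_sums = stats
    total = 0.0
    for s1, s2 in zip(sums, sq_sums):
        variance = s2 / n - (s1 / n) ** 2
        total += math.log(max(variance, 1e-10))  # floor guards against log(0)
    return total

def delta_bic(a, b, penalty_weight=1.0):
    """Positive: two separate Gaussians fit better (different sources).
    Negative: one shared Gaussian suffices (similar sources)."""
    merged = merge_stats(a, b)
    dim = len(a[1])
    gain = 0.5 * (merged[0] * log_det_diag(merged)
                  - a[0] * log_det_diag(a)
                  - b[0] * log_det_diag(b))
    # A diagonal Gaussian has 2 * dim free parameters (means and variances).
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(merged[0])
    return gain - penalty

def best_matched_speaker(segment_stats, speakers):
    """Assumed matching rule: pick the training speaker with the lowest delta-BIC."""
    return min(speakers,
               key=lambda name: delta_bic(segment_stats, speakers[name]["stats"]))
```

Under these assumptions, a table keyed by training speaker could hold each speaker's statistics together with a warping factor and MLLR transforms estimated offline. A new segment then needs only one `best_matched_speaker` lookup rather than a fresh ML estimation, which is the kind of saving the summary reports.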