Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition

Tetsuo KOSAKA  Yuui TAKEDA  Takashi ITO  Masaharu KATO  Masaki KOHDA  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E93-D   No.9   pp.2363-2369
Publication Date: 2010/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E93.D.2363
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Processing Natural Speech Variability for Improved Verbal Human-Computer Interaction)
Category: Adaptation
Keyword: 
speech recognition,  speaker adaptation,  speaker-class model,  LVCSR,  corpus of spontaneous Japanese,  

Full Text: PDF(393.6KB)
>>Buy this Article


Summary: 
In this paper, we propose a new speaker-class modeling and its adaptation method for the LVCSR system and evaluate the method on the Corpus of Spontaneous Japanese (CSJ). In this method, closer speakers are selected from training speakers and the acoustic models are trained by using their utterances for each evaluation speaker. One of the major issues of the speaker-class model is determining the selection range of speakers. In order to solve the problem, several models which have a variety of speaker range are prepared for each evaluation speaker in advance, and the most proper model is selected on a likelihood basis in the recognition step. In addition, we improved the recognition performance using unsupervised speaker adaptation with the speaker-class models. In the recognition experiments, a significant improvement could be obtained by using the proposed speaker adaptation based on speaker-class models compared with the conventional adaptation method.