Variable Selection Linear Regression for Robust Speech Recognition

Ting-Yao HU
Sakriani SAKTI
Lin-shan LEE

IEICE TRANSACTIONS on Information and Systems   Vol.E97-D    No.6    pp.1477-1487
Publication Date: 2014/06/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E97.D.1477
Type of Manuscript: Special Section PAPER (Special Section on Advances in Modeling for Real-world Speech Information Processing and its Application)
Category: Speech Recognition
variable selection,  linear regression,  MLLR,  fMLLR,  model space adaptation,  feature space adaptation,  

Full Text: PDF>>
Buy this Article

This study proposes a variable selection linear regression (VSLR) adaptation framework to improve the accuracy of automatic speech recognition (ASR) with only limited and unlabeled adaptation data. The proposed framework can be divided into three phases. The first phase prepares multiple variable subsets by applying a ranking filter to the original regression variable set. The second phase determines the best variable subset based on a pre-determined performance evaluation criterion and computes a linear regression (LR) mapping function based on the determined subset. The third phase performs adaptation in either model or feature spaces. The three phases can select the optimal components and remove redundancies in the LR mapping function effectively and thus enable VSLR to provide satisfactory adaptation performance even with a very limited number of adaptation statistics. We formulate model space VSLR and feature space VSLR by integrating the VS techniques into the conventional LR adaptation systems. Experimental results on the Aurora-4 task show that model space VSLR and feature space VSLR, respectively, outperform standard maximum likelihood linear regression (MLLR) and feature space MLLR (fMLLR) and their extensions, with notable word error rate (WER) reductions in a per-utterance unsupervised adaptation manner.