Ensembles of Dissimilar Acoustic Models Based on Big Data for Large Vocabulary Continuous Speech Recognition
Takashi FUKUDA Ryuki TACHIBANA Daniel WILLETT Puming ZHAN
D - Abstracts of IEICE TRANSACTIONS on Information and Systems (Japanese Edition)
Publication Date: 2015/08/01
Online ISSN: 1881-0225
Type of Manuscript: PAPER
Keywords: LVCSR, acoustic model, big data, system combination, dissimilarity
Full Text(in Japanese): PDF(823.2KB)
One objective of acoustic modeling is to build statistical models that are robust to the wide variety of acoustic conditions found in real-world environments. As large amounts of training data become available, subsets of the data with similar acoustic qualities can be modeled accurately, and the resulting acoustic models can be used jointly through system combination or model selection. In this paper, we propose a method that partitions the training data with a binary decision tree, using metadata attributes such as signal-to-noise ratio (SNR), speaking rate, and duration, in order to construct ensembles of acoustic models. The metadata attribute used at each binary split is chosen with a cosine-similarity-based metric proposed in this paper. The resulting models are combined using voting techniques such as n-best ROVER (Recognizer Output Voting Error Reduction). On a large vocabulary continuous speech recognition (LVCSR) voice search task, the proposed method improved recognition accuracy by up to 4% relative over the state-of-the-art system.
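
The abstract does not define the split metric, so the following Python sketch only illustrates the general idea of choosing a metadata attribute per binary split with a cosine-similarity criterion. Everything specific here is an assumption made for illustration and not the authors' method: each utterance carries a hypothetical summary feature vector `feat`, each attribute is split at its median, and the attribute whose two halves have the least similar mean vectors is selected. The function names and the toy data are likewise hypothetical.

```python
import math
from statistics import median


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(utts):
    """Element-wise mean of the 'feat' vectors of a non-empty utterance list."""
    dim = len(utts[0]["feat"])
    return [sum(u["feat"][d] for u in utts) / len(utts) for d in range(dim)]


def best_split(utts, attributes):
    """Try a median split on every attribute; keep the one whose halves are least similar.

    Assumption: dissimilarity is measured between the mean feature vectors of
    the two halves. The paper's actual metric is not given in the abstract.
    """
    best = None
    for attr in attributes:
        threshold = median(u[attr] for u in utts)
        low = [u for u in utts if u[attr] <= threshold]
        high = [u for u in utts if u[attr] > threshold]
        if not low or not high:
            continue  # this attribute cannot separate the data
        sim = cosine_similarity(mean_vector(low), mean_vector(high))
        if best is None or sim < best[0]:
            best = (sim, attr, threshold, low, high)
    return best


def build_tree(utts, attributes, max_depth, min_size=2):
    """Recursively partition utterances into a binary tree of metadata splits."""
    if max_depth == 0 or len(utts) < 2 * min_size:
        return {"leaf": utts}
    split = best_split(utts, attributes)
    if split is None:
        return {"leaf": utts}
    sim, attr, threshold, low, high = split
    if len(low) < min_size or len(high) < min_size:
        return {"leaf": utts}
    return {
        "attr": attr,
        "threshold": threshold,
        "similarity": sim,
        "low": build_tree(low, attributes, max_depth - 1, min_size),
        "high": build_tree(high, attributes, max_depth - 1, min_size),
    }


def leaves(tree):
    """Collect the utterance subsets at the leaves; each would train one model."""
    if "leaf" in tree:
        return [tree["leaf"]]
    return leaves(tree["low"]) + leaves(tree["high"])


# Toy data (made up): the feature vectors differ with SNR but not with speaking rate.
toy_utts = [
    {"snr": 5.0, "rate": 3.0, "duration": 2.0, "feat": [1.0, 0.1]},
    {"snr": 6.0, "rate": 5.0, "duration": 4.0, "feat": [0.9, 0.2]},
    {"snr": 20.0, "rate": 3.1, "duration": 2.1, "feat": [0.1, 1.0]},
    {"snr": 22.0, "rate": 5.1, "duration": 4.1, "feat": [0.2, 0.9]},
]
toy_tree = build_tree(toy_utts, ["snr", "rate"], max_depth=1)
```

In this toy setup the root split falls on `snr`, because splitting on `rate` yields two halves with identical mean vectors. Each leaf subset would then be used to train one acoustic model of the ensemble, and the model outputs would be combined by a voting scheme such as n-best ROVER, which is not sketched here.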