Submodular Based Unsupervised Data Selection

Aiying ZHANG  Chongjia NI  

IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.6   pp.1591-1604
Publication Date: 2018/06/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017EDP7367
Type of Manuscript: PAPER
Category: Speech and Hearing
keyword spotting, submodular, multilingual data selection, language identification, recurrent neural network long short term memory


Automatic speech recognition (ASR) and keyword search (KWS) have increasingly found their way into everyday life, and their success can be attributed to many factors. Among these, the large-scale speech data used for acoustic modeling is the key factor. However, acquiring large amounts of transcribed speech data is difficult and time-consuming for some languages, especially low-resource languages. Under low-resource conditions, it therefore becomes important to choose which data to transcribe for acoustic modeling in order to improve the performance of ASR and KWS. There are two ways to use acoustic data for acoustic modeling: one uses target language data, and the other uses large-scale data from other source languages for cross-lingual transfer. In this paper, we propose several approaches for efficiently selecting acoustic data for acoustic modeling. For target language data, we propose a submodular based unsupervised data selection approach, which can select more informative and representative utterances for manual transcription. For data from other source languages, we propose a submodular multilingual data selection approach based on data that is highly misclassified as the target language, and a knowledge based group multilingual data selection approach. Using the selected multilingual data to train a multilingual deep neural network for cross-lingual transfer can improve the ASR and KWS performance of the target language. Our proposed multilingual data selection approach also outperforms a language identification based multilingual data selection approach. We also analyze and compare the influence of the language factor and the acoustic factor on the performance of ASR and KWS. In addition, the influence of different amounts of target language data on the performance of ASR and KWS under monolingual and cross-lingual conditions is compared and analyzed, and some significant conclusions are drawn.
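
To make the idea of submodular data selection concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state which submodular function or features the paper uses, so the sketch assumes a facility-location objective f(S) = sum_i max_{j in S} sim[i][j] over a hypothetical matrix of non-negative pairwise utterance similarities, maximized greedily under a budget on the number of utterances. The function name `greedy_facility_location` and the toy similarity values are illustrative assumptions.

```python
from typing import List, Sequence


def greedy_facility_location(sim: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedily pick up to `budget` indices maximizing
    f(S) = sum_i max_{j in S} sim[i][j] (assumed objective, non-negative sim)."""
    n = len(sim)
    # coverage[i]: how well utterance i is represented by the current selection.
    coverage = [0.0] * n
    selected: List[int] = []
    for _ in range(min(budget, n)):
        best_j, best_gain = -1, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total increase in coverage.
            gain = sum(max(sim[i][j] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        coverage = [max(coverage[i], sim[i][best_j]) for i in range(n)]
    return selected


# Hypothetical similarities: utterances 0-2 form one group, 3-4 another.
toy_sim = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
]
picked = greedy_facility_location(toy_sim, budget=2)
print(picked)
```

With a budget of two, the greedy procedure picks one utterance from each group in this toy example, because a second utterance from an already covered group adds little marginal gain. This diminishing-returns behavior is what makes a submodular objective a natural fit for choosing representative utterances to transcribe.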