Improve Multichannel Speech Recognition with Temporal and Spatial Information
Yu ZHANG Pengyuan ZHANG Qingwei ZHAO
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2018/07/01
Online ISSN: 1745-1361
Type of Manuscript: LETTER
Category: Speech and Hearing
Keywords: multichannel speech recognition, long short-term memory, attention mechanism, generalized cross correlation
In this letter, we explore the use of spatio-temporal information within a unified framework to improve multichannel speech recognition. Generalized cross correlation (GCC) serves as spatial feature compensation, and an attention mechanism across time is embedded within long short-term memory (LSTM) neural networks. Experiments on the AMI meeting corpus show that the proposed method yields an 8.2% relative reduction in word error rate (WER) compared with a model trained directly on the concatenated outputs of multiple microphones.
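The abstract names the two ingredients but not their exact form. The following is a minimal NumPy sketch of each, under stated assumptions, not the authors' implementation. `gcc_phat` computes a generalized cross correlation between two microphone signals; the PHAT (phase transform) weighting is an assumed, commonly used choice, since the letter's abstract does not specify the weighting. `temporal_attention` shows soft attention over a window of LSTM hidden states using a hypothetical scoring vector `w`; the real model may score frames differently.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """GCC between two channels with PHAT weighting (an assumed variant).

    Returns correlation values for lags -max_lag..max_lag, so index
    max_lag is zero lag. A peak at lag k means x1 lags x2 by k samples.
    """
    n = len(x1) + len(x2)                # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)             # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Negative lags sit at the end of the circular result; move them first.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def temporal_attention(H, w):
    """Soft attention over time (hypothetical dot-product scoring).

    H: (T, d) hidden states for T frames; w: (d,) scoring vector.
    Returns the attention-weighted context vector (d,) and the weights (T,).
    """
    scores = H @ w
    scores = scores - scores.max()       # numerical stability for softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum()                 # weights over frames sum to 1
    context = alpha @ H                  # weighted sum of hidden states
    return context, alpha
```

In a pipeline of this kind, one would compute `gcc_phat` per frame for each microphone pair and append the resulting lag vectors to the acoustic features fed to the LSTM. That feature layout is an assumption for illustration. The context vector from `temporal_attention` would then summarize neighboring frames before classification.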