Distant Speech Recognition Using a Microphone Array Network

Alberto Yoshihiro NAKANO, Seiichi NAKAGAWA, Kazumasa YAMAMOTO

Publication
IEICE TRANSACTIONS on Information and Systems, Vol. E93-D, No. 9, pp. 2451-2462
Publication Date: 2010/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E93.D.2451
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Processing Natural Speech Variability for Improved Verbal Human-Computer Interaction)
Category: Microphone Array
Keywords: distant speech recognition, microphone array network, GMM-based CMN, speaker's position and orientation estimation

Summary: 
In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the time delays of a delay-and-sum beamformer, thus enhancing the output signal. The orientation angle, in turn, is used to restrict the lexicon in the recognition phase, under the assumption that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel within a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
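
As an illustration of the signal-path idea in the summary, the following Python sketch shows how an estimated speaker position could be converted into per-microphone propagation delays and used in a simple delay-and-sum beamformer. This is a minimal sketch, not the paper's implementation: the microphone coordinates, sampling rate, speed of sound, and the integer-sample delay approximation are all assumptions introduced here for illustration.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, source_position, fs, c=343.0):
    """Align and average multi-channel signals toward an estimated source position.

    signals:         (M, N) array, one row per microphone channel
    mic_positions:   (M, 3) microphone coordinates in meters (hypothetical layout)
    source_position: (3,) estimated speaker position in meters (e.g., ANN output)
    fs:              sampling rate in Hz
    c:               speed of sound in m/s
    """
    # Propagation time from the estimated source position to each microphone
    distances = np.linalg.norm(mic_positions - source_position, axis=1)
    delays = distances / c

    # Steering delays relative to the closest microphone, rounded to whole samples
    rel_delays = delays - delays.min()
    shifts = np.round(rel_delays * fs).astype(int)

    M, N = signals.shape
    out = np.zeros(N)
    for m in range(M):
        # Advance each channel by its relative delay so the channels align in time
        shifted = np.roll(signals[m], -shifts[m])
        if shifts[m] > 0:
            shifted[-shifts[m]:] = 0.0  # zero the samples that wrapped around
        out += shifted
    return out / M

# Example with synthetic data (all values hypothetical)
fs = 16000
mic_positions = np.array([[0.0, 0.0, 0.0],
                          [0.1, 0.0, 0.0],
                          [0.2, 0.0, 0.0],
                          [0.3, 0.0, 0.0]])
source_position = np.array([1.5, 2.0, 1.2])   # stand-in for an ANN-estimated position
signals = np.random.randn(4, fs)              # stand-in for captured channels
enhanced = delay_and_sum(signals, mic_positions, source_position, fs)
```

A more faithful implementation would apply fractional-delay filtering rather than integer-sample shifts, but the sketch captures how refining the position estimate directly refines the steering delays.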