Distant Speech Recognition Using a Microphone Array Network

Alberto Yoshihiro NAKANO  Seiichi NAKAGAWA  Kazumasa YAMAMOTO  

IEICE TRANSACTIONS on Information and Systems   Vol.E93-D   No.9   pp.2451-2462
Publication Date: 2010/09/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E93.D.2451
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Processing Natural Speech Variability for Improved Verbal Human-Computer Interaction)
Category: Microphone Array
Keywords: distant speech recognition, microphone array network, GMM-based CMN, speaker's position and orientation estimation

In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. The orientation angle, in turn, is used to restrict the lexicon used in the recognition phase, under the assumption that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel within a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated; it performs better than conventional CMN on short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
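
To make the role of the estimated speaker position concrete, the following is a minimal sketch of a delay-and-sum beamformer steered by a known source position. It is not the authors' implementation: the function names, the speed of sound (343 m/s), the rounding of delays to whole samples, and the free-field propagation model are all assumptions made for illustration, and the position is simply given rather than estimated by an ANN.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_delays(source_pos, mic_positions, fs):
    """Per-microphone delays in whole samples, relative to the closest microphone.

    source_pos:    (dim,) source coordinates in metres (assumed already estimated).
    mic_positions: (n_mics, dim) microphone coordinates in metres.
    fs:            sampling rate in Hz.
    """
    dists = np.linalg.norm(np.asarray(mic_positions) - np.asarray(source_pos), axis=1)
    delays = (dists - dists.min()) / SPEED_OF_SOUND * fs
    # Rounding to integer samples is a simplification; fractional delays
    # would need interpolation or a frequency-domain phase shift.
    return np.rint(delays).astype(int)


def delay_and_sum(signals, delays):
    """Advance each channel by its delay and average the aligned channels.

    signals: (n_mics, n_samples) array of microphone signals.
    delays:  (n_mics,) non-negative integer delays in samples.
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for channel, d in zip(signals, delays):
        # Shift the channel left by d samples; the tail is left as zeros.
        out[: n_samples - d] += channel[d:]
    return out / n_mics


if __name__ == "__main__":
    fs = 16000
    mics = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
    delays = steering_delays([0.0, 0.0], mics, fs)

    # Simulate an impulse that reaches each microphone after its own delay.
    signals = np.zeros((3, 400))
    for i, d in enumerate(delays):
        signals[i, 100 + d] = 1.0

    enhanced = delay_and_sum(signals, delays)
    print(delays, int(np.argmax(enhanced)))
```

With accurate delays the channels add coherently, while uncorrelated noise averages down; this is why a better position estimate, and hence better delays, improves the beamformer output.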