VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition

Hiroki NOGUCHI  Kazuo MIURA  Tsuyoshi FUJINAGA  Takanobu SUGAHARA  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

IEICE TRANSACTIONS on Electronics   Vol.E94-C   No.4   pp.458-467
Publication Date: 2011/04/01
Online ISSN: 1745-1353
DOI: 10.1587/transele.E94.C.458
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Section on Circuits and Design Techniques for Advanced Large Scale Integration)
Keywords: speech recognition, hidden Markov model (HMM), VLSI architecture

We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60,000-word real-time continuous speech recognition. The architecture combines a cache architecture that exploits the locality of speech recognition, beam pruning with a dynamic threshold, two-stage language model searching, a parallel Gaussian mixture model (GMM) architecture parallelized at the mixture level and the frame level, a parallel Viterbi architecture, and pipelined operation between Viterbi transition and GMM processing. Results show that the architecture reduces the required frequency by 88.24% (66.74 MHz) and the memory bandwidth by 84.04% (549.91 MB/s) for real-time 60,000-word continuous speech recognition.
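
To make the two computational kernels named in the abstract concrete, the following is a minimal software sketch of GMM scoring and of one Viterbi frame with beam pruning. It is not the hardware design described in the paper: the abstract gives no implementation details, so the function names, the data layout, the diagonal-covariance assumption, and the pruning rule (a threshold recomputed each frame as the best score minus a fixed `beam_width`, a common textbook form of beam pruning) are all assumptions made for illustration.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    Assumed layout (hypothetical): weights[k] is the mixture weight,
    means[k] and variances[k] are per-dimension lists for component k.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of one weighted diagonal Gaussian component.
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over the mixture components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def viterbi_step(prev_scores, log_trans, emit_log_probs, beam_width):
    """Advance the Viterbi search by one frame and prune with a beam.

    prev_scores:    dict state -> best log score at the previous frame
                    (active states only)
    log_trans:      dict src -> list of (dst, log transition probability)
    emit_log_probs: dict state -> emission log-likelihood for this frame
    beam_width:     states scoring more than this below the frame's best
                    score are dropped (assumed pruning rule)
    """
    new_scores = {}
    for src, score in prev_scores.items():
        for dst, log_t in log_trans.get(src, []):
            cand = score + log_t + emit_log_probs[dst]
            # Keep only the best-scoring path into each destination state.
            if cand > new_scores.get(dst, -math.inf):
                new_scores[dst] = cand
    if not new_scores:
        return new_scores
    # The threshold follows the best score of the current frame.
    threshold = max(new_scores.values()) - beam_width
    return {s: v for s, v in new_scores.items() if v >= threshold}
```

In this sketch, emission scores are looked up only for states reachable from the surviving active set, which illustrates why pruning reduces both the GMM workload and the model data that has to be fetched per frame.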