On Automatic Speech Recognition at the Dawn of the 21st Century

Chin-Hui LEE  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E86-D   No.3   pp.377-396
Publication Date: 2003/03/01
Print ISSN: 0916-8532
Type of Manuscript: INVITED SURVEY PAPER
Keywords:
automatic speech recognition, pattern recognition, hidden Markov model, dynamic programming, utterance verification, acoustic modeling, lexical modeling, language modeling, feature extraction and detection, heuristic search, string decoding, maximum likelihood, maximum a posteriori, hypothesis testing, distinctive features, acoustics, phonetics, computational linguistics, knowledge sources

Summary: 
In the last three decades of the 20th Century, research in speech recognition was carried out intensively worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-vocabulary voice-interactive command and control systems for business automation, to large-vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many promising new technologies, we have also encountered a number of practical limitations that hinder the widespread deployment of applications and services. On one hand, fast progress has been made in statistical speech and language modeling. On the other hand, only sporadic successes have been reported in applying knowledge sources from acoustics, speech science, and language science to improve speech recognition performance and robustness to adverse conditions. In this paper we review some key advances in several areas of speech recognition. We also propose a bottom-up detection framework to facilitate worldwide research collaboration on incorporating advances in both statistical modeling and knowledge integration, so as to go beyond current speech recognition limitations and benefit society in the 21st Century.