Processing Unknown Words in Continuous Speech Recognition

Kenji KITA  Terumasa EHARA  Tsuyoshi MORIMOTO  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E74-A   No.7   pp.1811-1816
Publication Date: 1991/07/25
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Issue on Continuous Speech Recognition and Understanding)
Category: Continuous Speech Recognition

Current continuous speech recognition systems essentially ignore unknown words: they are designed to recognize only words in the lexicon. However, to use speech recognition systems in real applications such as spoken-language processing, it is essential to process unknown words. This paper proposes a continuous speech recognition method that accepts any utterance, including utterances that contain unknown words. In this method, words not in the lexicon are transcribed as phone sequences, while words in the lexicon are recognized correctly. The HMM-LR speech recognition system, which integrates Hidden Markov Models with generalized LR parsing, is used as the baseline system and is enhanced with a trigram model of syllables to take into account the stochastic characteristics of the language. In our approach, two kinds of grammars are merged and used in the HMM-LR system: a task grammar, which describes the task, and a phonetic grammar, which describes constraints between phones. Using the phonetic grammar, the system can output a phonetic transcription for an unknown word. Experimental results indicate that the approach is promising.
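
To illustrate the kind of syllable trigram model mentioned above, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not describe how the trigram is estimated or smoothed, so add-one smoothing, the class name SyllableTrigram, and the toy romanized syllable data are all assumptions made for illustration. The sketch only shows how such a model could rank candidate phonetic transcriptions of an unknown word.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical boundary markers


class SyllableTrigram:
    """Toy syllable trigram model with add-one smoothing (an assumption)."""

    def __init__(self, sequences):
        self.trigrams = Counter()
        self.contexts = Counter()
        self.vocab = {EOS}  # symbols that can be predicted
        for seq in sequences:
            padded = [BOS, BOS] + list(seq) + [EOS]
            self.vocab.update(seq)
            for a, b, c in zip(padded, padded[1:], padded[2:]):
                self.trigrams[(a, b, c)] += 1
                self.contexts[(a, b)] += 1

    def log_prob(self, seq):
        """Log probability of a syllable sequence, including the end marker."""
        padded = [BOS, BOS] + list(seq) + [EOS]
        v = len(self.vocab)
        total = 0.0
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            num = self.trigrams.get((a, b, c), 0) + 1
            den = self.contexts.get((a, b), 0) + v
            total += math.log(num / den)
        return total


# Toy training data: invented syllable sequences, not from the paper.
training = [["ki", "ta"], ["ki", "ta"], ["ta", "ki"], ["ka", "ta"]]
model = SyllableTrigram(training)

# Rank two hypothetical transcriptions of an unknown word.
candidates = [["ta", "ta"], ["ki", "ta"]]
ranked = sorted(candidates, key=model.log_prob, reverse=True)
best = ranked[0]
```

In a recognizer of the kind the abstract describes, a score like this would presumably be combined with the acoustic (HMM) score of each hypothesis; how the two are weighted is not stated in the abstract and is left out of the sketch.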