A Real-Time Speech Dialogue System Using Spontaneous Speech Understanding

Yoichi TAKEBAYASHI, Hiroyuki TSUBOI, Hiroshi KANAZAWA, Yoichi SADAMOTO, Hideki HASHIMOTO, Hideaki SHINCHI

Publication
IEICE TRANSACTIONS on Information and Systems, Vol. E76-D, No. 1, pp. 112-120
Publication Date: 1993/01/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Speech and Discourse Processing in Dialogue Systems)
Keywords: dialogue, speech understanding, human-computer interaction, human interface, multimedia

Summary: 
This paper describes TOSBURG, a task-oriented speech dialogue system based on spontaneous speech understanding and response generation. The system has been developed for a fast-food ordering task and uses speaker-independent, keyword-based understanding of spontaneous speech. To understand the user's intention from spontaneous speech, the system combines a noise-robust keyword spotter, a semantic keyword lattice parser, a user-initiated dialogue manager, and a multimodal response generator. After noise-robust keyword spotting, the spotted keyword candidates are analyzed by the keyword lattice parser to extract the semantic content of the input speech. The dialogue manager then interprets this semantic content with reference to the dialogue history and context. When the interpretation is ambiguous or uncertain, the dialogue manager asks the user to confirm the system's understanding verbally. Throughout the dialogue, the system's responses are multimodal: several modes of communication (synthesized speech, text, animated facial expressions, and the ordered food items) convey the system's state to the user. The aim is to emulate the multimodal interaction that occurs between humans and thereby achieve more natural and efficient human-computer interaction. The real-time dialogue system has been built on two general-purpose workstations and four DSP accelerators (520 MFLOPS). Experimental results have shown the effectiveness of the newly developed speech dialogue system.
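The processing chain summarized above (spotted keyword candidates, a lattice parse into semantic content, and a verbal confirmation when the result is uncertain) can be illustrated with a minimal Python sketch. It is not the authors' parser or dialogue manager: the `Keyword` record, its semantic categories, the `best_path` dynamic program over time-ordered, non-overlapping candidates, and the confirmation `threshold` in `interpret` are all hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Keyword:
    """One spotted keyword candidate (hypothetical record layout)."""
    word: str      # spotted keyword, e.g. "hamburgers"
    category: str  # hypothetical semantic class, e.g. "item" or "quantity"
    start: int     # start frame of the spotted segment
    end: int       # end frame (exclusive)
    score: float   # spotter confidence, assumed to lie in [0, 1]


def best_path(lattice: List[Keyword]) -> Tuple[float, List[Keyword]]:
    """Return the highest-scoring chain of time-ordered, non-overlapping
    candidates, found by dynamic programming over candidates sorted by end frame."""
    cands = sorted(lattice, key=lambda k: k.end)
    best: List[Tuple[float, List[Keyword]]] = []
    for i, k in enumerate(cands):
        score, path = k.score, [k]  # chain consisting of this keyword alone
        for j in range(i):
            # Extend a chain that ends before this keyword starts.
            if cands[j].end <= k.start and best[j][0] + k.score > score:
                score, path = best[j][0] + k.score, best[j][1] + [k]
        best.append((score, path))
    return max(best, key=lambda sp: sp[0], default=(0.0, []))


def interpret(lattice: List[Keyword],
              threshold: float = 0.6) -> Tuple[Dict[str, List[str]], str]:
    """Build a semantic frame from the best chain and pick a reply.
    The mean-score threshold for asking confirmation is a hypothetical choice."""
    total, path = best_path(lattice)
    frame: Dict[str, List[str]] = {}
    for k in path:
        frame.setdefault(k.category, []).append(k.word)
    if not path:
        return frame, "Sorry, could you repeat your order?"
    if total / len(path) < threshold:
        # Uncertain interpretation: ask the user to confirm verbally.
        return frame, "Did you say " + " ".join(k.word for k in path) + "?"
    return frame, "OK."
```

For a lattice containing "two" (score 0.9), overlapping "hamburger" (0.5) and "hamburgers" (0.8) candidates, and "cola" (0.4), the sketch chooses "two hamburgers cola". Summing spotter scores favours longer chains, so any non-overlapping candidate, even a low-confidence one like "cola", is appended; a practical parser would add semantic constraints on which keywords may combine and would penalize such insertions.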