A Real-Time Speech Dialogue System Using Spontaneous Speech Understanding
Yoichi TAKEBAYASHI Hiroyuki TSUBOI Hiroshi KANAZAWA Yoichi SADAMOTO Hideki HASHIMOTO Hideaki SHINCHI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 1993/01/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Speech and Discourse Processing in Dialogue Systems)
Keywords: dialogue, speech understanding, human-computer interaction, human interface, multimedia
This paper describes a task-oriented speech dialogue system based on spontaneous speech understanding and response generation (TOSBURG). The system has been developed for a fast-food ordering task using speaker-independent, keyword-based spontaneous speech understanding. To understand the user's intention from spontaneous speech, the system combines a noise-robust keyword spotter, a semantic keyword lattice parser, a user-initiated dialogue manager, and a multimodal response generator. After noise-immune keyword spotting is performed, the spotted keyword candidates are analyzed by the keyword lattice parser to extract the semantic content of the input speech. Then, referring to the dialogue history and context, the dialogue manager interprets that semantic content. In cases where the interpretation is ambiguous or uncertain, the dialogue manager invites the user to verbally confirm the system's understanding of the speech input. The system's response to the user throughout the dialogue is multimodal; that is, several modes of communication (synthesized speech, text, animated facial expressions, and displayed food items) are used to convey the system's state to the user. The aim is to emulate the multimodal interaction that occurs between humans, and so achieve more natural and efficient human-computer interaction. The real-time dialogue system has been constructed using two general-purpose workstations and four DSP accelerators (520 MFLOPS). Experimental results have shown the effectiveness of the newly developed speech dialogue system.
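The abstract describes a pipeline of keyword spotting, semantic lattice parsing, and confirmation-driven dialogue management. The following minimal Python sketch illustrates that flow for the fast-food ordering task; every function name, data structure, and threshold here is an illustrative assumption, not the paper's actual implementation.

```python
# Hypothetical sketch of a TOSBURG-style pipeline: keyword spotting ->
# semantic frame extraction -> dialogue management with confirmation.
# All names and thresholds are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Keyword:
    word: str      # spotted keyword candidate
    score: float   # spotting confidence in [0, 1]


def spot_keywords(utterance: str, lexicon: dict[str, float]) -> list[Keyword]:
    """Stand-in for the noise-robust keyword spotter: match lexicon words
    in the utterance and attach a (pretend) spotting score."""
    return [Keyword(w, lexicon[w]) for w in utterance.split() if w in lexicon]


def parse_lattice(keywords: list[Keyword], threshold: float = 0.5) -> dict:
    """Stand-in for the semantic keyword lattice parser: keep confident
    candidates and map them into a flat semantic frame (item, quantity)."""
    frame: dict = {}
    for kw in keywords:
        if kw.score < threshold:
            continue  # discard low-confidence candidates (e.g. fillers)
        if kw.word.isdigit():
            frame["quantity"] = int(kw.word)
        else:
            frame["item"] = kw.word
    return frame


def manage_dialogue(frame: dict, history: list[dict]) -> str:
    """Stand-in for the user-initiated dialogue manager: ask the user to
    repeat when the interpretation is uncertain, otherwise echo the order
    back for verbal confirmation."""
    if "item" not in frame:
        return "Could you repeat your order, please?"
    history.append(frame)  # record the interpreted order in dialogue history
    qty = frame.get("quantity", 1)
    return f"So that's {qty} {frame['item']} -- is that right?"
```

A usage example under the same assumptions: with a toy lexicon `{"hamburger": 0.9, "2": 0.8, "uh": 0.3}`, the utterance `"uh 2 hamburger please"` yields the frame `{"quantity": 2, "item": "hamburger"}` (the low-scoring filler "uh" is dropped), and the manager responds with a confirmation question, mirroring the confirmation step described in the abstract.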