Estimating Users' Subjective Impressions of Difficulty and Interest Level for a Speech Dialogue with an Embodied Agent from Their Nonverbal Behaviors

Kazuaki NAKAMURA  Koh KAKUSHO  Tetsuo SHOJI  Michihiko MINOH  Minako SAWAKI  Yasuhiro MINAMI  Eisaku MAEDA  

A - Abstracts of IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences (Japanese Edition)   Vol.J95-A   No.1   pp.85-96
Publication Date: 2012/01/01
Online ISSN: 1881-0195
Print ISSN: 0913-5707
Type of Manuscript: Special Section PAPER (Special Issue on Human-Agent Interaction)
embodied agents, speech dialogues, difficulty level, interest level, nonverbal behaviors, frequencies of behaviors


Embodied agents that can hold a dialogue with human users have become widely used as interfaces to information systems. In this paper, we aim to estimate users' subjective impressions of difficulty level (SID) and of interest level (SII) regarding the content of such human-agent dialogues from their nonverbal behaviors, such as facial expressions, eye gaze, gestures, and postures. Because it generally takes a certain length of time to decide on and convey a piece of dialogue content, a user's SID/SII for that content cannot be defined at each moment; it can only be defined over a time-span of a certain length. Within such a time-span, however, the state of mind that corresponds to a nonverbal behavior performed at a given moment varies with dialogue conditions such as the user's role (speaker or listener) and the dialogue context. The users' SID/SII can therefore hardly be estimated correctly from nonverbal behaviors observed at individual moments. We propose to estimate SID/SII using not moment-level features (such as the nonverbal behaviors themselves) but time-span-level features (the frequency of each nonverbal behavior during the time-span). In our experiment, this approach estimated the users' SID/SII with an average accuracy of 72% when the time-span length was set to 40 seconds.
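
As a rough illustration of what a time-span-level feature could look like, the following minimal sketch counts how often each nonverbal behavior occurs within consecutive 40-second time-spans. It is not the authors' implementation. The behavior labels, the event format (timestamp, label), the use of raw counts as the "frequency", and the function name `span_frequency_features` are all assumptions made for this example; the abstract does not specify them, nor the estimator that consumes the features.

```python
from collections import Counter

# Hypothetical behavior labels; the abstract does not list the actual set.
BEHAVIORS = ["nod", "gaze_away", "smile", "lean_forward"]


def span_frequency_features(events, span_length=40.0, behaviors=BEHAVIORS):
    """Count occurrences of each behavior in consecutive time-spans.

    events: iterable of (time_in_seconds, behavior_label) pairs (assumed format).
    Returns one count vector per time-span, ordered by time; spans with no
    observed behavior yield an all-zero vector.
    """
    counts = {}
    for t, label in events:
        if label not in behaviors or t < 0:
            continue  # ignore unknown labels and invalid timestamps
        span_index = int(t // span_length)
        counts.setdefault(span_index, Counter())[label] += 1
    if not counts:
        return []
    n_spans = max(counts) + 1
    return [
        [counts.get(i, Counter())[b] for b in behaviors]
        for i in range(n_spans)
    ]


# Example: moment-level observations collapsed into span-level count vectors.
events = [
    (1.0, "nod"), (5.0, "nod"), (12.0, "smile"),
    (45.0, "gaze_away"), (79.9, "nod"),
    (130.0, "smile"),
]
features = span_frequency_features(events)
```

Each resulting vector could then be passed to any standard classifier to predict an SID or SII label for the corresponding time-span; which classifier the paper uses is not stated in the abstract.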