Interactive Learning of Spoken Words and Their Meanings Through an Audio-Visual Interface
Naoto IWAHASHI
Publication: IEICE TRANSACTIONS on Information and Systems, Vol.E91-D, No.2, pp.312-321
Publication Date: 2008/02/01
Online ISSN: 1745-1361 / Print ISSN: 0916-8532
DOI: 10.1093/ietisy/e91-d.2.312
Type of Manuscript: PAPER
Category: Speech and Hearing
Keywords: multi-modal interface, word acquisition, multi-media processing, interactive learning, active learning, word meaning
Summary:
This paper presents a new interactive learning method for spoken word acquisition through human-machine audio-visual interfaces. During learning, the machine decides, using both speech and visual cues, whether an orally input word belongs to the lexicon it has already learned. Learning is carried out on-line and incrementally, based on a combination of active and unsupervised learning principles. If the machine judges with high confidence that its decision is correct, it learns the statistical model of the word and a corresponding image category as its meaning in an unsupervised way; otherwise, it actively asks the user a question. The function used to estimate the degree of confidence is itself learned adaptively on-line. Experimental results show that combining active and unsupervised learning principles enables the machine and the user to adapt to each other, making the learning process more efficient.
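The confidence-gated decision loop described in the summary can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the score combination, the function names, and the threshold value are all hypothetical, and the paper additionally learns the confidence estimator adaptively on-line.

```python
def confidence(speech_score: float, visual_score: float) -> float:
    # Toy combination of speech and visual cues into a single confidence value.
    # The paper learns this estimator adaptively on-line; here it is a fixed
    # average purely for illustration.
    return 0.5 * (speech_score + visual_score)


def learning_step(speech_score: float, visual_score: float,
                  threshold: float = 0.8) -> str:
    # High confidence: update the word's statistical model and its image
    # category (meaning) without supervision. Low confidence: fall back to
    # active learning and query the user.
    if confidence(speech_score, visual_score) >= threshold:
        return "unsupervised_update"
    return "ask_user"


print(learning_step(0.95, 0.9))  # high confidence -> unsupervised update
print(learning_step(0.6, 0.5))   # low confidence  -> ask the user
```

A run over many such steps, with the threshold or the confidence function updated from the user's answers, gives the mutual adaptation effect the experiments report.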