Speaker Diarization and Source Number Estimation Based on Audio-Visual Integration
Yukoh WAKABAYASHI, Koji INOUE, Masato NAKAYAMA, Takanobu NISHIURA, Yoichi YAMASHITA, Hiromasa YOSHIMOTO, Tatsuya KAWAHARA
D - Abstracts of IEICE TRANSACTIONS on Information and Systems (Japanese Edition)
Publication Date: 2016/03/01
Online ISSN: 1881-0225
Type of Manuscript: Special Section PAPER (Special Section on Student Research)
speaker diarization, sound source localization, multi-modal, source number estimation, multi-party conversation
Full Text(in Japanese): PDF(1.4MB)
We present a method for speaker diarization and source number estimation based on audio-visual integration in multi-party conversation. Speaker diarization is the task of estimating “who speaks when.” It plays an important role in understanding utterance content and in analyzing human-human interaction, such as turn-taking and the timing of backchannels. We integrate sound source localization obtained from audio with participants' head locations obtained from video. Moreover, we use audio-visual integration to estimate the number of sound sources, which is essential to improving sound source localization. Previous work has assumed that this number is known, but it is difficult to know in advance in natural conversation. Experimental results show that the proposed method improves both diarization accuracy and source number estimation accuracy compared with conventional methods.
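
The abstract does not describe how the audio and visual cues are combined, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that sound source localization yields direction-of-arrival (DOA) angles per frame and that video yields one azimuth per participant's head. Each DOA is assigned to the nearest head within a tolerance, and the number of distinct matched heads is taken as the source count for that frame. All names (`angular_distance`, `diarize_frame`, `tolerance_deg`) and the sample angles are hypothetical.

```python
from typing import Dict, List, Set


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def diarize_frame(
    doas: List[float],
    heads: Dict[str, float],
    tolerance_deg: float = 15.0,  # assumed gating threshold, not from the paper
) -> Set[str]:
    """Return the participants judged to be speaking in one frame.

    A DOA estimate is attributed to the participant whose head azimuth is
    closest, provided the gap is within tolerance_deg. DOAs with no nearby
    head are discarded as spurious.
    """
    active: Set[str] = set()
    if not heads:
        return active
    for doa in doas:
        name, head_angle = min(
            heads.items(), key=lambda kv: angular_distance(doa, kv[1])
        )
        if angular_distance(doa, head_angle) <= tolerance_deg:
            active.add(name)
    return active


def diarize(
    frames: List[List[float]],
    heads: Dict[str, float],
    tolerance_deg: float = 15.0,
) -> List[Set[str]]:
    """Apply diarize_frame to every frame of DOA estimates."""
    return [diarize_frame(doas, heads, tolerance_deg) for doas in frames]


# Hypothetical head azimuths (degrees) from a visual tracker.
heads = {"A": -40.0, "B": 5.0, "C": 50.0}
# Hypothetical DOA estimates (degrees) per frame from audio localization.
frames = [[-38.0], [3.0, 52.0], [120.0], []]

result = diarize(frames, heads)
source_counts = [len(speakers) for speakers in result]
```

In this sketch, `result` gives "who speaks when" per frame and `source_counts` gives the estimated number of active sources. The DOA at 120 degrees is rejected because no head lies within the tolerance, which illustrates how visual information can suppress spurious audio peaks.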