Development of a Lip-Sync Algorithm Based on an Audio-Visual Corpus
Jinyoung KIM, Joohun LEE, Katsuhiko SHIRAI
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2003/02/01
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Keywords: lip sync, AV corpus, corpus-based synthesis
In this paper, we propose a corpus-based lip-sync algorithm for natural face animation. For this purpose, we constructed a Korean audio-visual (AV) corpus. Based on this corpus, we propose a concatenation method for AV units that is similar to a corpus-based text-to-speech system. To build the AV corpus, lip-related parameters were extracted from video recordings of a speaker reading texts selected from newspapers. The spoken utterances were labeled with HTK, and prosodic information such as duration, pitch, and intensity was extracted as lip-sync parameters. From the constructed AV corpus, the basic synthetic units were set to CVC-syllable units. For the best concatenation performance, the optimal path is estimated by a general Viterbi search algorithm based on the phonetic-environment distance and the prosodic distance. Computer simulation results show that not only duration but also pitch and intensity information is useful for enhancing lip-sync performance. Moreover, the reconstructed lip parameters are almost equal to those of the original parameters.
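The unit-selection step described above (choosing one CVC unit per slot so that the sum of a prosodic target distance and a phonetic-environment join distance is minimized by a Viterbi search) can be sketched as a standard dynamic program. This is a minimal illustrative sketch, not the authors' implementation; the function names and cost functions are assumptions for demonstration.

```python
def viterbi_unit_selection(candidates, target_cost, join_cost):
    """Pick one unit per slot minimizing total target + join cost.

    candidates: list of lists; candidates[t] holds candidate units for slot t.
    target_cost(unit, t): per-unit cost, e.g. a prosodic distance over
        duration, pitch, and intensity (hypothetical cost for illustration).
    join_cost(prev, unit): transition cost, e.g. a phonetic-environment /
        concatenation distance between adjacent units (also hypothetical).
    """
    T = len(candidates)
    # best[t][i] = (accumulated cost, backpointer into slot t-1)
    best = [[(target_cost(u, 0), -1) for u in candidates[0]]]
    for t in range(1, T):
        row = []
        for u in candidates[t]:
            # cheapest way to reach unit u from any unit in the previous slot
            cost, bp = min(
                (best[t - 1][j][0] + join_cost(p, u), j)
                for j, p in enumerate(candidates[t - 1])
            )
            row.append((cost + target_cost(u, t), bp))
        best.append(row)
    # backtrack from the cheapest final state
    i = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for t in range(T - 1, -1, -1):
        path.append(candidates[t][i])
        i = best[t][i][1]
    return path[::-1]


# Toy usage: numeric "units", target values per slot, small join penalty.
targets = [1, 5, 3]
candidates = [[0, 1, 2], [4, 5, 6], [2, 3, 4]]
path = viterbi_unit_selection(
    candidates,
    lambda u, t: abs(u - targets[t]),      # stand-in prosodic distance
    lambda p, u: 0.1 * abs(p - u),         # stand-in join distance
)
print(path)  # selects the units closest to the targets
```

In a real system each candidate would be an AV unit carrying lip parameters plus its prosodic features, and the two cost functions would encode the paper's phonetic-environment and prosodic distances.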