Development of a Lip-Sync Algorithm Based on an Audio-Visual Corpus

Jinyoung KIM  Joohun LEE  Katsuhiko SHIRAI  

IEICE TRANSACTIONS on Information and Systems   Vol.E86-D    No.2    pp.334-339
Publication Date: 2003/02/01
Print ISSN: 0916-8532
Type of Manuscript: LETTER
Category: Databases
Keywords: lip sync, AV corpus, corpus-based synthesis


In this paper, we propose a corpus-based lip-sync algorithm for natural face animation. For this purpose, we constructed a Korean audio-visual (AV) corpus. Based on this AV corpus, we propose a concatenation method for AV units, similar to that of a corpus-based text-to-speech system. To build the AV corpus, lip-related parameters were extracted from video recordings of a speaker reading texts selected from newspapers. The spoken utterances were labeled with HTK, and prosodic information such as duration, pitch, and intensity was extracted as lip-sync parameters. From the constructed AV corpus, the basic synthesis units are set to CVC syllables. For the best concatenation performance, the optimal unit sequence is found by a general Viterbi search based on a phonetic-environment distance and a prosodic distance. Computer simulation results show that not only duration but also pitch and intensity information is useful for enhancing lip-sync performance, and the reconstructed lip parameters are nearly identical to the original ones.
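The unit-selection step described above can be sketched as a standard Viterbi search over candidate CVC units. The following is a minimal illustrative sketch, not the paper's implementation: the feature names (`duration`, `pitch`, `intensity`), the equal cost weights, and the lip-parameter boundary cost are all assumptions made for the example.

```python
# Hypothetical sketch of Viterbi-based AV unit selection.
# candidates[i] holds the corpus units matching target syllable i;
# feature names and unit costs are illustrative assumptions.

def target_cost(target, unit):
    """Phonetic-environment distance plus prosodic distance
    (duration, pitch, intensity), with illustrative equal weights."""
    phonetic = 0.0 if unit["context"] == target["context"] else 1.0
    prosodic = sum(abs(unit[f] - target[f])
                   for f in ("duration", "pitch", "intensity"))
    return phonetic + prosodic

def concat_cost(prev_unit, unit):
    """Penalize lip-parameter discontinuity at the unit boundary."""
    return abs(prev_unit["lip_end"] - unit["lip_start"])

def viterbi_select(targets, candidates):
    """Return the indices of the minimum-cost unit sequence."""
    # cost[j] = best cumulative cost ending at candidates[0][j]
    cost = [target_cost(targets[0], u) for u in candidates[0]]
    back = []
    for i in range(1, len(targets)):
        new_cost, ptr = [], []
        for u in candidates[i]:
            # best predecessor for this candidate unit
            best_j = min(range(len(candidates[i - 1])),
                         key=lambda j: cost[j] + concat_cost(candidates[i - 1][j], u))
            new_cost.append(cost[best_j]
                            + concat_cost(candidates[i - 1][best_j], u)
                            + target_cost(targets[i], u))
            ptr.append(best_j)
        cost = new_cost
        back.append(ptr)
    # backtrack from the cheapest final unit
    j = min(range(len(cost)), key=lambda j: cost[j])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return list(reversed(path))
```

The selected indices identify which corpus units to concatenate; their stored lip parameters then drive the face animation.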