Analysis and Synthesis of Emotional Voice Based on Time-Frequency Pitch Distributions

Mamoru KOBAYASHI  Shigeo WADA  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E89-A   No.8   pp.2100-2106
Publication Date: 2006/08/01
Online ISSN: 1745-1337
DOI: 10.1093/ietfec/e89-a.8.2100
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Papers Selected from the 20th Symposium on Signal Processing)
Keywords: emotional voice, speech processing, time-frequency analysis and synthesis

In this paper, analysis and synthesis methods for emotional voice are developed for natural man-machine interfaces. First, emotional voice (neutral, anger, sadness, joy, dislike) is analyzed using a time-frequency representation of speech and similarity analysis. Then, based on the results of this analysis, a voice with neutral emotion is transformed to synthesize a particular emotional voice using time-frequency modifications. In the simulations, five types of emotion are analyzed using 50 samples of speech signals. A high average discrimination rate is achieved in the similarity analysis. Further, the synthesized emotional voice is subjectively evaluated. It is confirmed that emotional voice is naturally generated by the proposed time-frequency based approach.
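
The abstract does not state which time-frequency representation or similarity measure the authors use. As a rough illustration of the general idea only, the following minimal sketch assumes a frame-wise autocorrelation pitch estimate as the time-frequency feature, a pitch histogram as the "pitch distribution", and cosine similarity as the comparison. All function names, frame sizes, and the 80-400 Hz pitch range are hypothetical choices for this sketch, not details taken from the paper.

```python
import math

def frame_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate one frame's pitch in Hz from its autocorrelation peak (assumed method)."""
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0

def pitch_contour(signal, sample_rate, frame_len=400, hop=200):
    """Pitch per frame over time: a crude time-frequency pitch track."""
    return [frame_pitch(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, hop)]

def pitch_histogram(contour, fmin=80.0, fmax=400.0, bins=16):
    """Count how often each pitch band occurs in the contour."""
    hist = [0.0] * bins
    width = (fmax - fmin) / bins
    for f in contour:
        if fmin <= f <= fmax:
            hist[min(int((f - fmin) / width), bins - 1)] += 1.0
    return hist

def cosine_similarity(a, b):
    """Cosine similarity between two histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def tone(freq, sample_rate=8000, duration=0.2):
    """Synthetic sine tone standing in for a voiced speech segment."""
    n = int(sample_rate * duration)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

if __name__ == "__main__":
    sr = 8000
    low = pitch_histogram(pitch_contour(tone(100.0, sr), sr))
    high = pitch_histogram(pitch_contour(tone(200.0, sr), sr))
    print(cosine_similarity(low, low), cosine_similarity(low, high))
```

In a setting like the one the abstract describes, an utterance could be assigned to the emotion whose reference distribution it is most similar to. Real speech would also need voicing detection and more robust pitch tracking than this sketch provides.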