Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

Tsang-Long PAO  Yu-Te CHEN  Jun-Heng YEH  

IEICE TRANSACTIONS on Information and Systems   Vol.E91-D   No.4   pp.1074-1081
Publication Date: 2008/04/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e91-d.4.1074
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Human-computer Interaction
Keywords: emotion detection, Mandarin speech, performance comparison

It is said that technology comes from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers can perceive and respond to human emotion, human-computer interaction will become more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on different emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compare several popular classification methods and evaluate their performance by applying them to a Mandarin speech corpus covering five basic emotions: anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for 5-class emotion recognition and outperforms the other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compare these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results again show that the proposed WD-MKNN outperforms the others.
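To make the kind of comparison described above concrete, the following is a minimal sketch of two of the simpler baselines: plain KNN with majority voting and a KNN variant with distance-weighted voting. It is not the proposed WD-MKNN, whose details are not given in this abstract. The inverse-distance weighting is one common choice and is an assumption here, as are the function names and the toy 2-D vectors, which stand in for real MFCC, LPCC, or LPC features.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3, weighted=False, eps=1e-9):
    """Classify `query` from its k nearest training samples.

    train    : list of (feature_vector, label) pairs
    weighted : if True, each neighbour votes with weight 1 / (distance + eps)
               (assumed weighting scheme); otherwise each neighbour gets one vote.
    """
    neighbours = sorted((euclidean(x, query), label) for x, label in train)[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        votes[label] += 1.0 / (dist + eps) if weighted else 1.0
    return max(votes, key=votes.get)


def accuracy(train, test, **kwargs):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, **kwargs) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy data: 2-D points standing in for acoustic feature vectors.
TOY_TRAIN = [
    ((0.0, 0.0), "anger"),
    ((1.0, 1.0), "sadness"),
    ((1.1, 1.0), "sadness"),
]
TOY_QUERY = (0.2, 0.1)

plain_label = knn_predict(TOY_TRAIN, TOY_QUERY, k=3)
weighted_label = knn_predict(TOY_TRAIN, TOY_QUERY, k=3, weighted=True)
```

On this toy set the two voting rules disagree: with k=3 the majority vote follows the two distant "sadness" samples, while the distance-weighted vote follows the single nearby "anger" sample. Running every classifier on the same corpus with the same features, as `accuracy` does for one train/test split, is what makes such differences directly comparable.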