Multi-Level Speech Emotion Recognition Based on Fisher Criterion and SVM
CHEN Li-Jiang1, MAO Xia1, Mitsuru ISHIZUKA2
1. School of Electronic and Information Engineering, Beihang University, Beijing 100191, China 2. Department of Information and Communication Engineering, University of Tokyo, Tokyo, Japan
Abstract: To address the speaker-independent emotion recognition problem, a multi-level speech emotion recognition system is proposed to classify six speech emotions (sadness, anger, surprise, fear, happiness and disgust) from coarse to fine. The key idea is that the emotion groups separated at each level are closely related to the corresponding emotional features of speech. At each level, appropriate features are selected from 288 candidate features by the Fisher discriminant ratio, which is also used as an input parameter for training the support vector machine (SVM). On the Beihang emotional speech database and the Berlin emotional speech database, principal component analysis (PCA) for dimension reduction and an artificial neural network (ANN) for classification are adopted to design four comparative experiments: Fisher+SVM, PCA+SVM, Fisher+ANN and PCA+ANN. The experimental results show that the Fisher criterion outperforms PCA for dimension reduction and that SVM generalizes better than ANN for speaker-independent speech emotion recognition. Good cross-cultural adaptability can be inferred from the similar results obtained on the two different databases.
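For illustration, one level of the coarse-to-fine pipeline (Fisher-ratio feature selection followed by binary SVM training) can be sketched as follows. This is a minimal sketch, not the paper's exact configuration: it assumes a NumPy feature matrix X with 288 candidate features and binary labels y for the two emotion groups separated at that level, and the SVM kernel, its parameters and the number k of retained features are assumptions introduced only for the example.

    # Minimal sketch: Fisher-ratio feature selection followed by SVM training
    # (illustrative only; kernel, parameters and k are assumptions, not the
    # configuration used in the paper).
    import numpy as np
    from sklearn.svm import SVC

    def fisher_ratio(X, y):
        """Fisher discriminant ratio of each feature for a two-class problem:
        (m1 - m2)^2 / (s1^2 + s2^2), where m and s^2 are the per-class mean
        and variance of that feature."""
        X1, X2 = X[y == 0], X[y == 1]
        num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
        den = X1.var(axis=0) + X2.var(axis=0) + 1e-12   # avoid division by zero
        return num / den

    def train_level(X, y, k=20):
        """One level of the hierarchy: keep the k features with the largest
        Fisher ratio, then train a binary SVM on the selected features."""
        ratios = fisher_ratio(X, y)
        selected = np.argsort(ratios)[::-1][:k]          # indices of the top-k features
        clf = SVC(kernel="rbf", C=1.0, gamma="scale")    # hypothetical kernel/parameters
        clf.fit(X[:, selected], y)
        return selected, clf

    # Example usage with random stand-in data (288 candidate features, as in the paper):
    X = np.random.randn(200, 288)
    y = np.random.randint(0, 2, 200)
    selected, clf = train_level(X, y)
    pred = clf.predict(X[:, selected])

In the full system, this step would be repeated at every level of the hierarchy, each time with the binary grouping of emotions appropriate to that level.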