Abstract: Since emotion states change continuously in daily conversations, an emotion reasoning algorithm based on emotional context is proposed for speech emotion recognition (SER). In this algorithm, contextual speech emotion features and widely used acoustic speech emotion features are each used to recognize the emotion state of continuous speech utterances. The recognition results of these two kinds of features are then fused by means of an emotional interaction matrix and a confidence coefficient. Finally, an emotion reasoning rule based on emotional context is proposed to adjust the fusion results according to the emotional context of the utterance to be analyzed, and the adjusted fusion results are taken as the emotion state of that utterance. Experimental results on a recorded emotional speech corpus covering 6 basic emotion states show that the proposed algorithm improves the emotion recognition accuracy of continuous speech: compared with the method using only widely used acoustic speech emotion features, the average recognition accuracy rises by 12.17%.
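The fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the uniform interaction matrix in the usage example, and the specific confidence weighting (a convex combination of the two recognizers' score vectors, followed by a row-wise context adjustment) are all assumptions made for the example.

```python
import numpy as np

# The 6 basic emotion states assumed for this sketch
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def fuse_and_adjust(p_acoustic, p_context, interaction, w_acoustic, prev_emotion):
    """Fuse the score vectors of the acoustic-feature and contextual-feature
    recognizers, then adjust the fused scores using the emotional context.

    p_acoustic, p_context : score vectors over the 6 emotions
    interaction           : 6x6 emotional interaction matrix; row i holds the
                            plausibility of each emotion following emotion i
    w_acoustic            : confidence coefficient of the acoustic recognizer
                            (the contextual recognizer gets 1 - w_acoustic)
    prev_emotion          : index of the preceding utterance's emotion state
    """
    p_acoustic = np.asarray(p_acoustic, dtype=float)
    p_context = np.asarray(p_context, dtype=float)
    # Confidence-weighted fusion of the two recognizers' outputs
    fused = w_acoustic * p_acoustic + (1.0 - w_acoustic) * p_context
    # Context adjustment: bias the fused scores toward emotions that
    # plausibly follow the previous utterance's emotion state
    adjusted = fused * interaction[prev_emotion]
    return adjusted / adjusted.sum()
```

For example, with a uniform interaction matrix the result reduces to the confidence-weighted fusion alone; a non-uniform row (e.g. low plausibility of jumping from sadness to joy) would re-rank the fused scores.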
MAO Qi-Rong, BAI Li-Juan, WANG Li, ZHAN Yong-Zhao. Emotion Reasoning Algorithm Based on Emotional Context of Speech [J]. Pattern Recognition and Artificial Intelligence, 2014, 27(9): 826-834.