Speech Emotion Recognition Based on Covariance Descriptor and Riemannian Manifold
LIU Jia, CHEN Chun, YE Cheng-Xi, LI Na, BU Jia-Jun
Key Laboratory of Service Robot Technique, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027
Abstract: A speech emotion recognition algorithm based on covariance descriptors and Riemannian manifolds is proposed. Covariance matrices computed from the extracted acoustic features serve as the emotion descriptors of sentences. Considering the high dimensionality of the space formed by non-singular covariance matrices, an affine-invariant metric is adopted so that this space satisfies the requirements of a Riemannian manifold. Using differential geometry, speech emotion recognition is then performed on the manifold. Experimental results show a significant improvement in recognition accuracy, especially in noisy environments.
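The abstract gives no implementation details, so the following is only a minimal sketch of the two ingredients it names: a covariance descriptor computed from per-frame acoustic features, and the affine-invariant distance between two non-singular covariance matrices, d(A, B) = ||log(A^(-1/2) B A^(-1/2))||_F. The function names, the choice of NumPy, and the small ridge term `eps` are assumptions made for illustration, not taken from the paper.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Covariance matrix of a (n_frames, n_features) feature array.

    The ridge term eps is an illustrative assumption that keeps the
    matrix non-singular; the paper does not specify such a step.
    """
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def affine_invariant_distance(a, b):
    """Affine-invariant Riemannian distance between SPD matrices a and b."""
    # Inverse square root of a via its eigendecomposition.
    w, v = np.linalg.eigh(a)
    a_inv_sqrt = (v / np.sqrt(w)) @ v.T
    # Whiten b by a, then symmetrize to suppress round-off asymmetry.
    m = a_inv_sqrt @ b @ a_inv_sqrt
    lam = np.linalg.eigvalsh((m + m.T) / 2.0)
    # Frobenius norm of the matrix logarithm equals the root of the
    # summed squared log-eigenvalues.
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

Under these assumptions, a simple classifier could label a sentence by the emotion class whose training descriptors lie nearest in this distance; whether the paper uses such a rule is not stated in the abstract.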
Received: 27 October 2008