1. School of Computer and Information, Hefei University of Technology, Hefei 230009
2. Computer Science and Technology Postdoctoral Research Station, Hefei University of Technology, Hefei 230009
3. Institute of Linguistics, Shanghai Normal University, Shanghai 200234
Abstract: Emotional speech synthesis is a focus and hotspot of research in affective computing and speech signal processing. Accurate analysis of speech emotion is a prerequisite for synthesizing high-quality emotional speech. In this paper, the PAD emotional model is used to build a three-dimensional emotional space in which an emotional speech corpus is analyzed and clustered to obtain a PAD parameter model for each emotion. Emotional speech is then generated by an HMM-based speech synthesis system, and the emotion-related parameters of the synthesized speech are modified according to the PAD model, thereby improving the quality of the synthesized emotional speech. Experimental results show that the proposed method improves the naturalness of the synthesized speech and the clarity of its emotion, and it also performs well across different male speakers.
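To make the two steps summarized in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: it (1) clusters an emotion-annotated corpus in the three-dimensional PAD space to obtain per-emotion PAD centres, and (2) maps a target PAD centre to multiplicative prosody modification factors for neutral HMM-synthesized speech. The PAD annotations, cluster count, and mapping coefficients are illustrative assumptions, not values reported in the paper.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical corpus: each utterance annotated with (Pleasure, Arousal, Dominance)
# on a [-1, 1] scale, following Mehrabian's PAD model.
pad_annotations = np.array([
    [ 0.76,  0.48,  0.35],   # e.g. "happy" utterances
    [ 0.81,  0.51,  0.29],
    [-0.64,  0.60, -0.43],   # e.g. "afraid"
    [-0.58,  0.55, -0.50],
    [-0.51,  0.59,  0.25],   # e.g. "angry"
    [-0.62,  0.67,  0.34],
    [ 0.05, -0.15,  0.10],   # e.g. "neutral"
    [ 0.02, -0.10,  0.05],
])

# Step 1: cluster the corpus in PAD space; each cluster centre serves as the
# PAD parameter model of one emotion category.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pad_annotations)
emotion_centres = kmeans.cluster_centers_

# Step 2 (assumed linear mapping): convert a PAD centre into scaling factors
# for the prosodic parameters of the neutral synthesized speech
# (mean F0, phone duration, energy). Coefficients are placeholders.
def pad_to_prosody_factors(pad):
    p, a, d = pad
    return {
        "f0_scale":       1.0 + 0.20 * a + 0.10 * p,  # higher arousal/pleasure -> higher pitch
        "duration_scale": 1.0 - 0.15 * a,              # higher arousal -> faster speech
        "energy_scale":   1.0 + 0.25 * a + 0.10 * d,   # higher arousal/dominance -> louder speech
    }

for centre in emotion_centres:
    print(np.round(centre, 2), pad_to_prosody_factors(centre))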