1. School of Computer and Information, Hefei University of Technology, Hefei 230009
2. Computer Science and Technology Postdoctoral Research Station, Hefei University of Technology, Hefei 230009
3. Institute of Linguistics, Shanghai Normal University, Shanghai 200234
Abstract: Emotional speech synthesis is a focus and hotspot of research in affective computing and speech signal processing. Accurate analysis of speech emotion is a prerequisite for synthesizing high-quality emotional speech. In this paper, the PAD emotional model is used to build a three-dimensional emotional space in which an emotional speech corpus is analyzed and clustered to obtain a PAD parameter model for each emotion. Emotional speech is then generated by an HMM-based speech synthesis system, and the emotion-related parameters of the synthesized speech are modified according to the PAD model, thereby improving the quality of the synthesized emotional speech. Experimental results show that the proposed method improves the naturalness of the synthesized speech and the clarity of its emotion, and it also performs well across different male speakers.
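To make the two steps summarized in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: it (1) clusters an emotion-annotated corpus in the three-dimensional PAD space to obtain per-emotion PAD centres, and (2) maps a target PAD centre to multiplicative prosody modification factors for neutral HMM-synthesized speech. The PAD annotations, cluster count, and mapping coefficients are illustrative assumptions, not values reported in the paper.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical corpus: each utterance annotated with (Pleasure, Arousal, Dominance)
# on a [-1, 1] scale, following Mehrabian's PAD model.
pad_annotations = np.array([
    [ 0.76,  0.48,  0.35],   # e.g. "happy" utterances
    [ 0.81,  0.51,  0.29],
    [-0.64,  0.60, -0.43],   # e.g. "afraid"
    [-0.58,  0.55, -0.50],
    [-0.51,  0.59,  0.25],   # e.g. "angry"
    [-0.62,  0.67,  0.34],
    [ 0.05, -0.15,  0.10],   # e.g. "neutral"
    [ 0.02, -0.10,  0.05],
])

# Step 1: cluster the corpus in PAD space; each cluster centre serves as the
# PAD parameter model of one emotion category.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pad_annotations)
emotion_centres = kmeans.cluster_centers_

# Step 2 (assumed linear mapping): convert a PAD centre into scaling factors
# for the prosodic parameters of the neutral synthesized speech
# (mean F0, phone duration, energy). Coefficients are placeholders.
def pad_to_prosody_factors(pad):
    p, a, d = pad
    return {
        "f0_scale":       1.0 + 0.20 * a + 0.10 * p,  # higher arousal/pleasure -> higher pitch
        "duration_scale": 1.0 - 0.15 * a,              # higher arousal -> faster speech
        "energy_scale":   1.0 + 0.25 * a + 0.10 * d,   # higher arousal/dominance -> louder speech
    }

for centre in emotion_centres:
    print(np.round(centre, 2), pad_to_prosody_factors(centre))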