An Improved Cross-Language Model Adaptation Method for Speech Synthesis

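Abstract: In statistical parametric speech synthesis, cross-language model adaptation is mainly used when the target speaker's language differs from that of the source model: a small amount of speech from the target speaker is used to quickly build a synthesis system in the source model's language that carries the target speaker's timbre. This paper improves the conventional cross-language adaptation method based on phone mapping and triphone models in two ways. First, phone mapping is combined with data selection to make the mapping more reliable. Second, cross-language mapping of prosodic information is introduced to compensate for the triphone model's weakness in representing prosody. Experiments on a Chinese-English cross-language model adaptation system show that the improved system produces synthesized speech with clearly better naturalness and speaker similarity than the conventional method.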

Keywords: Hidden Markov Model (HMM), speech synthesis, cross-language model adaptation, phone mapping

Abstract: Cross-language model adaptation in statistical parametric speech synthesis is used to rapidly construct a text-to-speech (TTS) system with the target speaker's characteristics when the target speaker's language differs from that of the source model. In this paper, the conventional cross-language adaptation method based on phone mapping and triphone models is improved in two ways. First, phone mapping combined with data selection is adopted to improve the reliability of the mapping. Second, cross-language prosodic information mapping is introduced to make use of prosodic information, which is ignored in the triphone model. Experiments on Chinese-to-English adaptation show that speech synthesized with the improved method has much better naturalness and speaker similarity than speech synthesized with the conventional method.

Key words: Hidden Markov Model (HMM), Speech Synthesis, Cross-Language Model Adaptation, Phone Mapping
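The abstract names phone mapping combined with data selection but gives no details, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that each adaptation segment from the target speaker carries a phone label and a per-segment score saying how well the mapped source-language phone model fits it. The phone table `PHONE_MAP`, the function `select_adaptation_data`, the scores, and the threshold are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical map from target-speaker phones to source-model phones.
# The entries are illustrative, not the mapping used in the paper.
PHONE_MAP = {"a": "aa", "i": "iy", "u": "uw", "sh": "sh", "m": "m"}


def select_adaptation_data(segments, phone_map, min_score):
    """Group adaptation segments by mapped source phone, dropping unreliable ones.

    segments: iterable of (target_phone, score, segment_id) tuples, where
    score is an assumed per-segment fit of the mapped source model
    (higher means a better fit).
    """
    selected = defaultdict(list)
    for target_phone, score, segment_id in segments:
        source_phone = phone_map.get(target_phone)
        if source_phone is None:
            continue  # no counterpart in the source phone set
        if score < min_score:
            continue  # data selection: the mapping is judged unreliable here
        selected[source_phone].append(segment_id)
    return dict(selected)


segments = [
    ("a", -3.2, "utt1_seg0"),
    ("i", -9.5, "utt1_seg1"),   # fits poorly, rejected by the threshold
    ("sh", -2.1, "utt2_seg0"),
    ("zh", -1.0, "utt2_seg1"),  # not in the map, skipped
    ("a", -4.0, "utt3_seg0"),
]
result = select_adaptation_data(segments, PHONE_MAP, min_score=-5.0)
```

In this sketch only the segments that survive both checks would be passed on to model adaptation; how the paper actually scores and selects data is not stated in the abstract.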

Received: 2010-06-02

CLC number (ZTFLH): TN912.34

Funding: Supported by the Fundamental Research Funds for the Central Universities

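About the authors: LIU Hang, male, born in 1983, master's student. His main research interests are speech synthesis and speaker adaptation. E-mail: lhang@mail.ustc.edu.cn. LING Zhen-Hua, male, born in 1979, postdoctoral researcher. His main research interest is speech synthesis. GUO Wu, male, born in 1973, Ph.D., lecturer. His main research interests are speaker recognition and language identification. DAI Li-Rong, male, born in 1962, professor and doctoral supervisor. His main research interests are speech synthesis, speech recognition, language identification, speaker recognition, and digital signal processing. E-mail: lrdai@ustc.edu.cn.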

Cite this article:

LIU Hang, LING Zhen-Hua, GUO Wu, DAI Li-Rong. An Improved Cross-Language Model Adaptation Method for Speech Synthesis [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2011, 24(4): 457-463.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2011/V24/I4/457
