Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Pattern Recognition and Artificial Intelligence, 2010, Vol. 23, Issue 4: 572-579
Research and Applications
Minimum Generation Error Training Based on Perceptually Weighted Line Spectral Pair Distance for Statistical Parametric Speech Synthesis
LEI Ming (雷鸣), LING Zhen-Hua (凌震华), DAI Li-Rong (戴礼荣)
iFly Speech Laboratory, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027

Abstract: A Minimum Generation Error (MGE) training method based on a perceptually weighted Line Spectral Pair (LSP) distance is proposed to improve the performance of Hidden Markov Model (HMM) based parametric speech synthesis systems. When LSP parameters are used to represent the speech spectrum, the Euclidean-distance generation error used in conventional MGE training does not adequately reflect the true distance between the generated spectrum and the natural spectrum. A generation error defined by the Log Spectral Distortion (LSD), which is independent of the spectral parameterization, alleviates this problem, but the subjective improvement is not obvious and the computational complexity is very high. This paper first proposes an MGE training method based on a weighted LSP distance, and then compares different weighting methods and LSD-based MGE training in subjective and objective experiments. Finally, a perceptual weighting method is identified that gives good subjective performance while adding almost no computational complexity compared with conventional MGE training.
Key words: Speech Synthesis, Hidden Markov Model (HMM), Minimum Generation Error (MGE), Perceptual Weighting, Line Spectral Pair Parameter
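The difference between a Euclidean and a weighted LSP generation error can be illustrated with a minimal sketch. The abstract does not state which weighting the authors adopted, so the code below assumes an inverse-gap weighting that is common in LSP quantization: each LSP is weighted by the reciprocal of its distance to its two neighbours, so closely spaced LSPs (which tend to mark spectral peaks) count more. The function names, the example vectors, and the choice to derive the weights from the natural frame are all hypothetical and are not taken from the paper.

```python
import math


def inverse_gap_weights(lsp):
    """Assumed weighting (not the paper's): 1/gap to each neighbouring LSP.

    `lsp` must be strictly increasing and lie inside (0, pi).
    """
    # Pad with the band edges so the first and last LSPs also have two neighbours.
    padded = [0.0] + list(lsp) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]


def weighted_lsp_error(natural, generated, weights=None):
    """Weighted squared distance between two LSP vectors for one frame.

    With weights=None every dimension counts equally, which reduces to the
    squared Euclidean distance.
    """
    if len(natural) != len(generated):
        raise ValueError("LSP vectors must have the same order")
    if weights is None:
        weights = [1.0] * len(natural)
    return sum(w * (n - g) ** 2 for w, n, g in zip(weights, natural, generated))


# Hypothetical 4th-order frame: the first two LSPs form a close pair.
natural = [0.30, 0.35, 1.20, 2.00]
# Same size of error (0.02 rad), applied to a close-pair LSP or an isolated LSP.
gen_close = [0.30, 0.37, 1.20, 2.00]
gen_isolated = [0.30, 0.35, 1.22, 2.00]

weights = inverse_gap_weights(natural)

euclid_close = weighted_lsp_error(natural, gen_close)
euclid_isolated = weighted_lsp_error(natural, gen_isolated)
weighted_close = weighted_lsp_error(natural, gen_close, weights)
weighted_isolated = weighted_lsp_error(natural, gen_isolated, weights)

print("Euclidean:", euclid_close, euclid_isolated)
print("Weighted: ", weighted_close, weighted_isolated)
```

Under this assumed weighting the two generated frames have essentially the same Euclidean error, while the weighted error penalizes the shift in the close pair more heavily. The weighting costs only one multiplication per dimension, which is in keeping with the abstract's remark that the weighted criterion adds almost no computation.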
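Received: 2009-02-07
CLC number: TN912.33
About the authors: LEI Ming, male, born in 1985, Ph.D. His main research interest is speech synthesis. E-mail: leiming@mail.ustc.edu.cn. LING Zhen-Hua, male, born in 1979, postdoctoral researcher. His main research interest is speech synthesis. DAI Li-Rong, male, born in 1962, professor and doctoral supervisor. His main research interests include speech synthesis, speech recognition, language identification, speaker recognition, and digital signal processing.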
Cite this article:
LEI Ming, LING Zhen-Hua, DAI Li-Rong. Minimum Generation Error Training Based on Perceptually Weighted Line Spectral Pair Distance for Statistical Parametric Speech Synthesis[J]. Pattern Recognition and Artificial Intelligence, 2010, 23(4): 572-579.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2010/V23/I4/572
Copyright © Editorial Office of Pattern Recognition and Artificial Intelligence
Address: 350 Shushanhu Road, Hefei, Anhui Province. Tel: 0551-65591176. Fax: 0551-65591176. Email: bjb@iim.ac.cn