Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
2010, Vol. 23, Issue 4: 572-579
Original Article
Minimum Generation Error Training Based on Perceptually Weighted Line Spectral Pair Distance for Statistical Parametric Speech Synthesis
LEI Ming, LING Zhen-Hua, DAI Li-Rong
iFly Speech Laboratory, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027

Abstract  A Minimum Generation Error (MGE) training method based on a perceptually weighted Line Spectral Pair (LSP) distance is proposed to improve the performance of Hidden Markov Model (HMM) based parametric speech synthesis systems. When the speech spectrum is represented by LSPs, the Euclidean-distance generation error used in traditional MGE training does not adequately measure the real gap between the generated spectrum and the natural spectrum. Defining the generation error by Log Spectral Distortion (LSD), which is independent of the choice of spectral parameters, addresses this problem, but the improvement is small relative to the higher computational complexity it incurs. In this paper, an MGE training criterion based on a weighted LSP distance is proposed, and several weighting methods are compared, subjectively and objectively, with one another and with LSD-based MGE training. The resulting perceptually weighted training method achieves the best performance among the methods compared and incurs no extra computational complexity relative to traditional MGE training.
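To illustrate why a weighted LSP distance can differ from a plain Euclidean one, the following is a minimal sketch, not the authors' formulation. The abstract does not give the weighting function, so this sketch assumes inverse-harmonic-mean style weights, a common choice in LSP quantization that emphasizes closely spaced LSPs (which tend to mark spectral peaks). The function names and the example vectors are hypothetical, and LSPs are assumed to be in radians, strictly increasing within (0, pi).

```python
import math


def euclidean_lsp_distance(natural, generated):
    """Squared Euclidean distance between two LSP vectors."""
    return sum((n - g) ** 2 for n, g in zip(natural, generated))


def ihm_weights(lsp):
    """Inverse-harmonic-mean style weights (an assumption, not the paper's formula).

    Each weight grows as the LSP gets closer to either neighbor; the
    boundaries 0 and pi act as neighbors for the first and last LSP.
    """
    padded = [0.0] + list(lsp) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]


def weighted_lsp_distance(natural, generated):
    """Weighted squared LSP distance, with weights taken from the natural LSPs."""
    weights = ihm_weights(natural)
    return sum(w * (n - g) ** 2 for w, n, g in zip(weights, natural, generated))


# Hypothetical 4th-order example: LSPs 0.30 and 0.35 form a close pair.
natural = [0.30, 0.35, 1.20, 2.00]
error_near_pair = [0.30, 0.37, 1.20, 2.00]  # perturb an LSP inside the close pair
error_elsewhere = [0.30, 0.35, 1.22, 2.00]  # same-size perturbation on an isolated LSP

print(euclidean_lsp_distance(natural, error_near_pair),
      euclidean_lsp_distance(natural, error_elsewhere))
print(weighted_lsp_distance(natural, error_near_pair),
      weighted_lsp_distance(natural, error_elsewhere))
```

In this sketch the two perturbed vectors have the same Euclidean error, while the weighted distance penalizes the error near the close pair more heavily. The weighted distance is still a per-dimension sum of squares, which is consistent with the abstract's claim that the weighted criterion adds no extra computational complexity over the Euclidean one.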
Key words: Speech Synthesis; Hidden Markov Model (HMM); Minimum Generation Error (MGE); Perceptual Weighting; Line Spectral Pair Parameter
Received: 07 February 2009     
ZTFLH: TN912.33  
Cite this article:
LEI Ming, LING Zhen-Hua, DAI Li-Rong. Minimum Generation Error Training Based on Perceptually Weighted Line Spectral Pair Distance for Statistical Parametric Speech Synthesis[J]. Pattern Recognition and Artificial Intelligence, 2010, 23(4): 572-579.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2010/V23/I4/572
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence