Abstract: This paper introduces the framework of a voice conversion system and then discusses the conventional codebook mapping method for voice conversion. It points out that the conventional method, which calculates the weighting coefficients over the whole codebook, tends to over-smooth the converted speech spectrum, and this greatly degrades the quality of the converted speech. To address this problem, a novel voice conversion method based on codebook mapping with phoneme-tied weighting is presented. A decision-tree-based prosody conversion method is also proposed. Experiments show that the proposed methods can effectively convert the speaker's individuality while maintaining high speech quality, using only a small amount of training data.
WANG ZiXiang, DAI LiRong, WANG YuPing, WANG RenHua. A Novel Voice Conversion Method Based on Codebook Mapping with Phoneme-Tied Weighting. Pattern Recognition and Artificial Intelligence, 2006, 19(3): 300-306.