Abstract: A voice conversion method based on a speaker-independent (SI) model is proposed. Because phoneme information is common to every speaker's speech, an SI space described only by phoneme information is assumed to exist. A Gaussian mixture model (GMM) is adopted to model the distribution of the SI space, and the mappings from each speaker-dependent (SD) space to the SI space are described by linear transformations. The SI model is trained with the speaker adaptive training (SAT) algorithm on a multi-speaker database. In the conversion phase, the conversion function from the source space to the target space is built quickly and flexibly by joining the transformation from the source space to the SI space with the transformation from the SI space to the target space. Listening tests comparing the proposed method with two representative conventional methods demonstrate its advantage.
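The joining of the two transformations can be illustrated with a minimal sketch. It assumes, purely for illustration, that each speaker has a single global affine transform into the SI space and that the target transform is invertible; the symbols $A_s$, $b_s$, $A_t$, $b_t$ are hypothetical and are not taken from the abstract, and the actual method may use several transforms per speaker (for example, one per GMM component).

```latex
% Assumed SD-to-SI mappings for source speaker s and target speaker t:
\hat{x} = A_s x + b_s, \qquad \hat{y} = A_t y + b_t
% Equating the SI-space images (\hat{x} = \hat{y}) and solving for y
% gives the joined source-to-target conversion function:
y = F(x) = A_t^{-1}\left(A_s x + b_s - b_t\right)
```

Under these assumptions, converting to a new target only requires that target's transform into the SI space, with no need for parallel training data between a specific source-target pair.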