Abstract: To address the significant drop in the recognition rate of the traditional Gaussian mixture model (GMM) under noisy conditions, a GMM based on variable-factor α-integration is presented. It adopts the α-integration mechanism for multiple stochastic models expressed as probability distributions. Introducing the variable factor readjusts the proportions of the individual components in the mixture model. After the parameters of the proposed model are re-estimated, experiments on the TIMIT and NTIMIT databases with different numbers of speakers show that the proposed model outperforms the traditional GMM. In particular, under noisy conditions with the optimal value of α, the recognition rate increases by 8%. Experiments on the NIST evaluation database show that the recognition rate also improves relative to the GMM-UBM system.
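To make the α-integration idea concrete, the following is a minimal sketch and not the authors' model. It assumes Amari's weighted α-mean (α = -1 gives the arithmetic mean, i.e. the ordinary mixture; α = 1 the geometric mean; α = 3 the harmonic mean) and applies it to one-dimensional Gaussian component densities. The function names `alpha_mean`, `gaussian_pdf`, and `alpha_integrated_score` are hypothetical, and the normalization constant and the parameter re-estimation described in the abstract are omitted.

```python
import math


def alpha_mean(values, weights, alpha):
    """Weighted alpha-mean of strictly positive numbers (Amari's definition).

    Assumes the weights sum to 1 and every value is > 0.
    """
    if abs(alpha - 1.0) < 1e-12:
        # alpha = 1: weighted geometric mean
        return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))
    k = (1.0 - alpha) / 2.0
    # alpha = -1 gives k = 1 (arithmetic mean); alpha = 3 gives k = -1 (harmonic mean)
    return sum(w * v ** k for v, w in zip(values, weights)) ** (1.0 / k)


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def alpha_integrated_score(x, weights, means, variances, alpha):
    """Unnormalized alpha-integration of Gaussian components at x.

    For alpha != -1 the result does not integrate to 1 over x; a proper
    density would need an extra normalization constant, left out here.
    """
    comps = [gaussian_pdf(x, m, v) for m, v in zip(means, variances)]
    return alpha_mean(comps, weights, alpha)


if __name__ == "__main__":
    w, mu, var = [0.3, 0.7], [0.0, 2.0], [1.0, 0.5]
    for a in (-1.0, 1.0, 3.0):
        print(a, alpha_integrated_score(1.0, w, mu, var, a))
```

With α = -1 the score reduces to the usual weighted sum of component densities, so the traditional GMM is the special case against which other values of α can be compared.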
LI Jie, LIU He-Ping. Gaussian Mixture Model Based on Variable Factor-Integration for Speaker Recognition[J]. Pattern Recognition and Artificial Intelligence, 2012, 25(6): 937-942.