Abstract: A serial loading-matrix training method is proposed for speaker recognition based on factor analysis. During loading-matrix training, the eigenvoice matrix, the diagonal (residual) matrix, and the channel matrix are estimated serially, one after another. During speaker enrollment, these three matrices are assembled, and the factors are then estimated through joint factor analysis. This resolves the saturation problem in factor analysis. On the NIST SRE 2006 core test corpus, the proposed system achieves an equal error rate of 3.65%.
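The abstract does not state the model equations. As a minimal sketch, assuming the commonly used joint factor analysis decomposition s = m + Vy + Ux + Dz (V: eigenvoice matrix, U: channel matrix, D: diagonal residual matrix), the assembly step could look like the following. The function names, dimensions, and random values are hypothetical and chosen only for illustration; the serial training procedure itself is not shown, and this is not the authors' implementation.

```python
import numpy as np


def assemble_loading_matrix(V, U, d):
    """Stack the eigenvoice matrix V, the channel matrix U, and the diagonal
    residual matrix diag(d) side by side into one joint loading matrix."""
    return np.hstack([V, U, np.diag(d)])


def synthesize_supervector(m, V, U, d, y, x, z):
    """Evaluate s = m + V y + U x + D z term by term (D = diag(d))."""
    return m + V @ y + U @ x + d * z


# Toy sizes (assumptions): supervector dimension 12, 3 speaker factors,
# 2 channel factors. Real systems use far larger dimensions.
rng = np.random.default_rng(0)
sv_dim, n_voice, n_chan = 12, 3, 2

m = rng.normal(size=sv_dim)               # speaker-independent mean supervector
V = rng.normal(size=(sv_dim, n_voice))    # eigenvoice loading matrix
U = rng.normal(size=(sv_dim, n_chan))     # channel loading matrix
d = rng.uniform(0.1, 1.0, size=sv_dim)    # diagonal of the residual matrix

y = rng.normal(size=n_voice)              # speaker factors
x = rng.normal(size=n_chan)               # channel factors
z = rng.normal(size=sv_dim)               # residual factors

T = assemble_loading_matrix(V, U, d)      # joint loading matrix [V U D]
factors = np.concatenate([y, x, z])       # joint factor vector [y; x; z]

s_joint = m + T @ factors
s_terms = synthesize_supervector(m, V, U, d, y, x, z)
```

Both routes give the same supervector, which is why the three matrices can be trained separately and still be used together in one joint factor estimate at enrollment time.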