A PCA Method Based on Speaker Session Variability
LONG Yan-Hua, GUO Wu, DAI Li-Rong
iFly Speech Laboratory, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027
Abstract: In text-independent speaker verification systems, mismatch and variability in channel and environment between training and testing, known as session variability, can greatly degrade speaker recognition performance. To deal with this problem more efficiently, a modified PCA method called session variation principal component analysis (SVPCA) is proposed, which can be integrated with within-class covariance normalization (WCCN). On the NIST 2006 verification task, the proposed method is compared with our previous baseline, a generalized linear discriminant sequence support vector machine (GLDS-SVM) system. The experimental results show a relative reduction of up to 24.2% in equal error rate (EER). Moreover, the proposed method has advantages in computational and memory costs compared with state-of-the-art systems.
Received: 08 October 2007