Abstract: In automatic spoken language identification of telephone conversational speech, interference caused by differences in signal path, speech content and speaker is a major factor limiting performance. To tackle this problem, an automatic language identification method based on supervector subspace analysis is proposed. In this method, each training utterance is first represented by a supervector. A discriminative classifier is then trained on these supervectors with a support vector machine (SVM). Next, subspace analysis is used to estimate the noise subspace, and finally the noise component is removed from the distance metric. Experiments on the 30-second and 10-second tasks of the NIST 2007 Language Recognition Evaluation (LRE07) show the advantage of this method: compared with the baseline system, performance improves markedly, with the EER (equal error rate) reduced by about 20% relative.
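The abstract does not say how the supervector is built. A common construction in SVM-based language and speaker recognition is to MAP-adapt the means of a universal background model (UBM) GMM to each utterance and stack them, scaled by the mixture weights and standard deviations as in the GMM supervector kernel. The sketch below assumes that construction, a diagonal-covariance UBM and a relevance factor of 16; the function names and the relevance value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def gmm_posteriors(frames, weights, means, variances):
    """Per-frame component posteriors of a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D). Returns (T, M).
    """
    diff = frames[:, None, :] - means[None, :, :]                # (T, M, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (T, M)
    log_det = np.sum(np.log(variances), axis=1)                  # (M,)
    dim = frames.shape[1]
    log_lik = np.log(weights) - 0.5 * (mahal + log_det + dim * np.log(2 * np.pi))
    log_lik -= log_lik.max(axis=1, keepdims=True)                # avoid underflow
    gamma = np.exp(log_lik)
    return gamma / gamma.sum(axis=1, keepdims=True)


def mean_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and stack them into a vector.

    The relevance factor of 16 is an assumed, commonly used value.
    """
    gamma = gmm_posteriors(frames, weights, means, variances)
    n = gamma.sum(axis=0)                           # zeroth-order statistics (M,)
    f = gamma.T @ frames                            # first-order statistics (M, D)
    alpha = (n / (n + relevance))[:, None]          # per-component adaptation weight
    data_mean = f / np.maximum(n, 1e-10)[:, None]   # posterior-weighted frame mean
    adapted = alpha * data_mean + (1.0 - alpha) * means
    # Scale each mean by sqrt(w_i) / sigma_i, as in the GMM supervector kernel
    scaled = np.sqrt(weights)[:, None] * adapted / np.sqrt(variances)
    return scaled.reshape(-1)                       # length M * D


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, D = 4, 3                                     # toy UBM size and feature dimension
    ubm_w = np.full(M, 1.0 / M)
    ubm_mu = rng.normal(size=(M, D))
    ubm_var = np.ones((M, D))
    sv = mean_supervector(rng.normal(size=(200, D)), ubm_w, ubm_mu, ubm_var)
    print(sv.shape)  # (12,)
```

Components that see few frames stay close to the UBM means, so utterances of different lengths still map to comparable fixed-length vectors that a linear SVM can consume.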
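The noise-subspace step can be sketched in the style of nuisance attribute projection (NAP): estimate the directions of largest within-language variability of the supervectors, then compare supervectors only after projecting those directions out, so that the linear kernel becomes a^T (I - U U^T) b. Here the subspace is taken from an SVD of the language-centred supervectors; the paper's actual estimator and subspace rank may differ, and all function names are hypothetical.

```python
import numpy as np


def nuisance_subspace(supervectors, labels, rank):
    """Orthonormal basis (dim, rank) of the main within-language variability."""
    X = np.asarray(supervectors, dtype=float)
    labels = np.asarray(labels)
    centred = np.empty_like(X)
    for lang in np.unique(labels):
        idx = labels == lang
        centred[idx] = X[idx] - X[idx].mean(axis=0)   # remove each language's mean
    # Left singular vectors of the centred data are the eigenvectors of the
    # within-language scatter matrix, ordered by decreasing variance
    u, _, _ = np.linalg.svd(centred.T, full_matrices=False)
    return u[:, :rank]


def project_out(x, U):
    """Remove the nuisance component of x: (I - U U^T) x."""
    return x - U @ (U.T @ x)


def nap_kernel(a, b, U):
    """Linear kernel with the nuisance subspace removed: a^T (I - U U^T) b."""
    return float(project_out(a, U) @ project_out(b, U))


if __name__ == "__main__":
    # Toy data: languages differ along axis 0, nuisance varies along axis 2
    X = np.array([[1, 0, 1], [1, 0, -1], [-1, 0, 2], [-1, 0, -2]], dtype=float)
    U = nuisance_subspace(X, labels=["a", "a", "b", "b"], rank=1)
    print(project_out(np.array([1.0, 2.0, 3.0]), U))   # approximately [1, 2, 0]
```

Because I - U U^T is symmetric and idempotent, projecting both vectors and taking their dot product gives the same value as a^T (I - U U^T) b, so the projection can be applied once per supervector before SVM training and scoring.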
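Results are reported as equal error rate (EER), the operating point at which the miss rate on target-language trials equals the false-alarm rate on non-target trials. A minimal threshold-sweep estimate is sketched below, assuming higher scores favour the target language; it is quadratic in the number of trials and only meant to make the definition concrete, not to reproduce the official LRE07 scoring tools.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over every observed score.

    A trial is accepted when score >= threshold. At each threshold the miss
    rate (rejected targets) and false-alarm rate (accepted non-targets) are
    computed; the EER is taken as their mean where the two are closest.
    """
    n_tar, n_non = len(target_scores), len(nontarget_scores)
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss = sum(s < t for s in target_scores) / n_tar
        p_fa = sum(s >= t for s in nontarget_scores) / n_non
        if abs(p_miss - p_fa) < best_gap:
            best_gap = abs(p_miss - p_fa)
            best_eer = (p_miss + p_fa) / 2
    return best_eer


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.2]       # scores of target-language trials
    nontargets = [0.1, 0.3, 0.6, 0.05]   # scores of non-target trials
    print(equal_error_rate(targets, nontargets))  # 0.25
```

For illustration only, a drop from a hypothetical 10% baseline EER to 8% would be a 20% relative reduction, since (10 - 8) / 10 = 0.2.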
SONG Yan, DAI Li-Rong, WANG Ren-Hua. An Automatic Language Identification Method Based on Supervector Subspace Analysis. Pattern Recognition and Artificial Intelligence, 2010, 23(2): 165-170.