Abstract: In language identification systems based on phoneme recognition, the key issue is whether the tokens, or the token sequence, reflect language-related information. However, for some utterances, channel, speaker, and background clutter introduce noise into the token sequence output by the phone recognizer. To address this problem, each utterance is represented as an n-gram vector, and factor analysis is applied in this vector space to model the noise subspace, whose contribution is then reduced in the final modeling stage. Experimental results on NIST LRE 2007 show that the proposed method outperforms existing language identification systems based on phone recognition. On the 30 s evaluation task, the equal error rate (EER) is reduced by about 14.4% relative to the baseline phone recognition followed by language modeling (PRLM) system, and by about 12.9% relative to the baseline phone recognition followed by support vector machine (PRSVM) system.
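The n-gram vector representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the n-gram order, the phone inventory, or the normalization, so the bigram default, the toy inventory, and the relative-frequency scaling below are all assumptions, and `ngram_vector` is a hypothetical helper name rather than anything taken from the paper. The factor analysis step is not shown.

```python
from collections import Counter
from itertools import product


def ngram_vector(tokens, inventory, n=2):
    """Map a phone token sequence to a fixed-length vector of relative
    n-gram frequencies (illustrative; n and the scaling are assumptions)."""
    # Fix an ordering over every possible n-gram of the phone inventory,
    # so that all utterances map into the same vector space.
    index = {gram: i for i, gram in enumerate(product(inventory, repeat=n))}

    # Count the n-grams that actually occur in this utterance.
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())

    # Fill in relative frequencies; n-grams that never occur stay at 0.
    vec = [0.0] * len(index)
    for gram, c in counts.items():
        vec[index[gram]] = c / total
    return vec


# Toy usage: a two-phone inventory and a four-token utterance.
inventory = ["a", "b"]
vec = ngram_vector(["a", "b", "a", "b"], inventory)
```

Every utterance, whatever its length, maps to a vector of the same dimension (the inventory size raised to the power n), which is what makes a subspace method such as factor analysis applicable to it.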
ZHONG Hai-Bing, SONG Yan, DAI Li-Rong. Factor Analysis for Language Identification Based on Phoneme Recognition [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2012, 25(1): 105-110.