Abstract: To address the problems of confusable dialects and short-duration utterances in automatic spoken language identification (LID), an improved utterance representation method is proposed that draws on different layers of a deep neural network (DNN). A deep bottleneck network (DBN), a DNN with an internal bottleneck layer, is employed as a front-end feature extractor. Utterance representations derived from the output layer and from the middle bottleneck layer of the DBN are obtained and fused for LID. Evaluations on the NIST LRE2009 dataset and the NIST LRE2011 Arabic dialect dataset demonstrate that the proposed DBN-based method achieves good performance.
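The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the random untrained weights, the linear bottleneck, mean pooling over frames as the utterance representation, and the weighted score fusion are all assumptions chosen for illustration, and the names `init_dbn`, `forward`, `utterance_representations`, and `fuse_scores` are hypothetical.

```python
import numpy as np

def init_dbn(layer_sizes, seed=0):
    """Random, untrained weights for a feed-forward net (illustration only)."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(frames, params, bottleneck_index):
    """Return per-frame bottleneck activations and output-layer posteriors."""
    h = frames
    bottleneck = None
    last = len(params) - 1
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        if i == last:
            h = softmax(z)                 # output layer: class posteriors
        elif i == bottleneck_index:
            h = z                          # assumed linear bottleneck layer
            bottleneck = h
        else:
            h = 1.0 / (1.0 + np.exp(-z))   # sigmoid hidden layer
    return bottleneck, h

def utterance_representations(frames, params, bottleneck_index):
    """Mean-pool frame-level features into two utterance-level vectors.

    Mean pooling is a stand-in assumption, not the paper's modeling step.
    """
    bottleneck, posteriors = forward(frames, params, bottleneck_index)
    return bottleneck.mean(axis=0), posteriors.mean(axis=0)

def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted linear fusion of two per-language score vectors (assumed)."""
    return weight * np.asarray(scores_a) + (1.0 - weight) * np.asarray(scores_b)

# Hypothetical sizes: 39-dim input frames, 16-dim bottleneck, 10 output classes.
params = init_dbn([39, 64, 16, 64, 10])
frames = np.random.default_rng(1).standard_normal((50, 39))  # one short utterance
bn_vec, post_vec = utterance_representations(frames, params, bottleneck_index=1)
```

Here `bn_vec` stands for the representation taken from the bottleneck layer and `post_vec` for the one taken from the output layer. In a full system, each would feed its own language classifier, and `fuse_scores` would combine the two resulting score vectors.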
CUI Rui-Lian, SONG Yan, JIANG Bing, DAI Li-Rong. Language Identification Based on Deep Neural Network[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2015, 28(12): 1093-1099.