Language Identification Based on Deep Neural Network<sup>*</sup>

doi:10.16451/j.cnki.issn1003-6059.201512005


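Abstract: Existing utterance representation methods yield low recognition rates on easily confused languages and on short-duration utterances. To meet the recognition requirements of different durations and dialects, an effective utterance representation method based on different layers of a deep neural network is proposed. A deep neural network with an intermediate bottleneck layer serves as the front-end feature extractor, and the outputs of its output layer and of its bottleneck layer are combined to obtain different forms of utterance representation for language identification. The effectiveness of the method is verified on the NIST Language Recognition Evaluation 2009 dataset and the 2011 Arabic dialect dataset.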


Keywords: language identification, deep neural network, utterance representation, deep bottleneck feature

Abstract: To address the problems of confusable dialects and short-duration utterances in automatic spoken language identification (LID), an improved utterance representation method based on different layers of a deep neural network (DNN) is proposed. A deep bottleneck network (DBN), a DNN with an internal bottleneck layer, is employed as the front-end feature extractor. Different representations based on the output layer and the middle bottleneck layer of the DBN are obtained and fused for LID. Evaluations on the NIST LRE2009 dataset and the NIST LRE2011 Arabic dialect dataset demonstrate that the proposed DBN-based method achieves good performance.
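The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weights are random and untrained, the layer sizes are arbitrary, and the names `init_network`, `forward`, `mean_pool`, and `utterance_representation` are hypothetical. The abstract does not say how frame-level outputs are turned into an utterance representation or how the two representations are fused, so mean pooling over frames and simple concatenation are assumed here purely for illustration.

```python
import math
import random

def init_network(layer_sizes, seed=0):
    """Random, untrained weights for a toy feed-forward network."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
        params.append((weights, [0.0] * n_out))
    return params

def affine(x, weights, bias):
    """Compute x @ weights + bias for a single frame."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(params, frame, bottleneck_index):
    """Return (bottleneck activations, output posteriors) for one frame."""
    h, bottleneck = frame, None
    last = len(params) - 1
    for i, (weights, bias) in enumerate(params):
        z = affine(h, weights, bias)
        if i == last:
            h = softmax(z)                # output layer: posteriors
        elif i == bottleneck_index:
            h = bottleneck = z            # linear bottleneck (assumption)
        else:
            h = [1.0 / (1.0 + math.exp(-v)) for v in z]  # sigmoid hidden layer
    return bottleneck, h

def mean_pool(vectors):
    """Average a list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def utterance_representation(params, frames, bottleneck_index):
    """Pool both layer outputs over frames and concatenate them (assumed fusion)."""
    outputs = [forward(params, f, bottleneck_index) for f in frames]
    bn_vec = mean_pool([bn for bn, _ in outputs])
    post_vec = mean_pool([post for _, post in outputs])
    return bn_vec + post_vec

# Hypothetical sizes: 39-dim input frames, 8-dim bottleneck, 10 output classes.
layer_sizes = [39, 32, 8, 32, 10]
params = init_network(layer_sizes)
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(39)] for _ in range(50)]
rep = utterance_representation(params, frames, bottleneck_index=1)
print(len(rep))
```

In this sketch the first 8 entries of `rep` come from the bottleneck layer and the remaining 10 from the output layer. A real system would use a trained network and would likely replace the mean pooling with a stronger utterance-level model.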

Key words: Language Identification, Deep Neural Network, Utterance Representation, Deep Bottleneck Feature

Received: 2014-11-17

Chinese Library Classification number (ZTFLH): TN 912.34

Funding: Supported by the National Natural Science Foundation of China (No. 61172158)

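About the authors: CUI Rui-Lian (corresponding author), female, born in 1990, master's student. Her main research interests are speech signal processing and language identification. E-mail: cuirl@mail.ustc.edu.cn. SONG Yan, male, born in 1972, Ph.D., lecturer. His main research interest is multimedia information processing. JIANG Bing, male, born in 1987, Ph.D. candidate. His main research interest is multimedia information processing. DAI Li-Rong, male, born in 1962, Ph.D., professor. His main research interests are digital signal processing and pattern recognition.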

Cite this article:

CUI Rui-Lian, SONG Yan, JIANG Bing, DAI Li-Rong. Language Identification Based on Deep Neural Network[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2015, 28(12): 1093-1099.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201512005 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2015/V28/I12/1093
