Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Pattern Recognition and Artificial Intelligence, 2015, Vol. 28, Issue 3: 209-213    DOI: 10.16451/j.cnki.issn1003-6059.201503003
Papers and Reports
Speech Recognition Based on Deep Neural Networks on Tibetan Corpus*
YUAN Sheng-Long (袁胜龙), GUO Wu (郭武), DAI Li-Rong (戴礼荣)
National Engineering Laboratory for Speech and Language Information Processing, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027

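Abstract (translated from the Chinese): This paper is the first to address large-vocabulary continuous speech recognition of natural, conversational-style telephone speech in Tibetan. As a minority language, the greatest difficulty facing Tibetan recognition is data sparsity. For acoustic modeling based on deep neural networks (DNN), the paper tackles data sparsity by proposing a strategy in which a DNN already trained on data from a major language serves as the initial network of the target model, which is then optimized. In addition, because research on Tibetan phonetics is far from complete, generating the decision-tree question set by hand is not feasible. To address this, the paper generates the decision-tree question set automatically in a data-driven manner and uses it to tie the states of the triphone hidden Markov models (HMM), thereby reducing the number of model parameters that must be estimated. On the test set, the Tibetan character recognition rate with Gaussian mixture model (GMM) acoustic modeling is 30.86%. For DNN-based acoustic modeling, DNNs trained on data from three major languages are used as initial networks, and the effectiveness of the method is verified on the test set, where the Tibetan character recognition accuracy reaches 43.26%.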
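The initialization strategy described in the abstract can be illustrated with a minimal sketch. It assumes, as is common in cross-lingual DNN acoustic modeling but is not stated in the abstract, that the hidden layers of the source-language network are copied and that the output layer is replaced by a freshly initialized one sized to the tied triphone states of the target language. The function name `init_target_dnn`, the layer representation, and all dimensions are hypothetical and chosen only for illustration.

```python
import copy
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases); weights has one row per output unit."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_target_dnn(source_layers, n_target_states, seed=0):
    """Build the initial target-language network from a source-language DNN.

    Assumption (not from the paper): hidden layers are reused as-is and only
    the output layer is re-created for the target language's tied states.
    """
    rng = random.Random(seed)
    # Copy every layer except the source output layer.
    hidden = copy.deepcopy(source_layers[:-1])
    # Width of the last hidden layer = number of rows in its weight matrix.
    last_hidden_width = len(hidden[-1][0])
    # New output layer with small random weights, one unit per tied state.
    out_w = [[rng.gauss(0.0, 0.01) for _ in range(last_hidden_width)]
             for _ in range(n_target_states)]
    out_b = [0.0] * n_target_states
    return hidden + [(out_w, out_b)]


# Toy source network: 4 inputs -> 3 -> 3 -> 5 source-language states.
_rng = random.Random(1)
source = [make_layer(4, 3, _rng), make_layer(3, 3, _rng), make_layer(3, 5, _rng)]

# Target network with 2 tied states (toy number); it would then be
# fine-tuned on the target-language training data.
target = init_target_dnn(source, n_target_states=2)
```

After this step the target network would be optimized on the Tibetan data in the usual way; the sketch covers only the initialization.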
Abstract: Large vocabulary continuous speech recognition of conversational telephone speech in Tibetan is addressed for the first time in this paper. For a minority language such as Tibetan, the major difficulty in speech recognition is data deficiency. In this paper, the Tibetan acoustic model is trained with deep neural networks (DNN). To address the data deficiency, DNN models trained on other majority languages are used as the initial networks of the target Tibetan DNN model. In addition, phonetic questions written by phonetic experts are unavailable because knowledge of Tibetan phonetics is lacking. To reduce the number of tri-phone hidden Markov models (HMM) in Tibetan speech recognition, phonetic questions generated automatically in a data-driven manner are used to tie the tri-phone HMMs. Different clusterings of tri-phone states are tested, and the word accuracy is about 30.86% on the test corpus with a Gaussian mixture model (GMM). When the acoustic model is trained with DNN, three DNN models trained on different large corpora are adopted. The experimental results show that the proposed methods improve recognition performance, and the word accuracy is about 43.26% on the test corpus.
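As an illustration of what generating phonetic questions "in a data-driven manner" can look like, the sketch below clusters phones bottom-up by the distance between their mean acoustic feature vectors and treats every intermediate cluster as one candidate question ("is the neighbouring phone in this set?"). This is one common approach and is an assumption here; the abstract does not specify the authors' procedure. The phone labels and feature values are toy data, not Tibetan phones, and `generate_questions` is a hypothetical name.

```python
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def generate_questions(phone_means):
    """Agglomeratively cluster phones; each merged cluster is a question.

    phone_means maps a phone label to its mean feature vector.
    """
    clusters = [frozenset([p]) for p in sorted(phone_means)]
    questions = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose centroids are closest.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: sq_dist(
                centroid([phone_means[p] for p in pair[0]]),
                centroid([phone_means[p] for p in pair[1]]),
            ),
        )
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        # The final cluster holds every phone and cannot split any data,
        # so it is not kept as a question.
        if len(clusters) > 1:
            questions.append(merged)
    return questions


# Toy 2-D "acoustic means" for four made-up phone labels.
phone_means = {
    "a": [1.0, 0.0],
    "i": [1.2, 0.1],
    "s": [-1.0, 2.0],
    "sh": [-1.3, 2.2],
}
questions = generate_questions(phone_means)
```

The resulting phone sets would then play the role of the expert-written question set when growing the decision trees that tie tri-phone HMM states.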
Key words: Tibetan    Continuous Speech Recognition    Data Driven    Deep Neural Networks (DNN)
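Received: 2013-10-21
CLC number (ZTFLH): TP18
Funding: Supported by the National Natural Science Foundation of China (No. 61273264)
About the authors: YUAN Sheng-Long (corresponding author), male, born in 1989, master's student. His main research interest is speech recognition. E-mail: slyuan@mail.ustc.edu.cn. GUO Wu, male, born in 1973, Ph.D., associate professor. His main research interests are speaker recognition and verification, and speech recognition. DAI Li-Rong, male, born in 1962, Ph.D., professor. His main research interests are speech recognition and speech signal processing.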
Cite this article:
YUAN Sheng-Long, GUO Wu, DAI Li-Rong. Speech Recognition Based on Deep Neural Networks on Tibetan Corpus[J]. Pattern Recognition and Artificial Intelligence, 2015, 28(3): 209-213.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201503003      or     http://manu46.magtech.com.cn/Jweb_prai/CN/Y2015/V28/I3/209