Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Pattern Recognition and Artificial Intelligence, 2017, Vol. 30, Issue 4: 359-364    DOI: 10.16451/j.cnki.issn1003-6059.201704008
Research and Applications
Towards End to End Speech Recognition System for Tibetan (基于端到端技术的藏语语音识别)*
WANG Qingnan (王庆楠), GUO Wu (郭武), XIE Chuandong (解传栋)
National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027

Abstract: End-to-end large-scale continuous speech recognition based on connectionist temporal classification (CTC) has become a research focus. In this paper it is applied to Tibetan automatic speech recognition (ASR), where it outperforms the mainstream bidirectional long short-term memory (BLSTM) approach. End-to-end speech recognition does not require linguistic knowledge such as a pronunciation lexicon, so its recognition performance cannot be guaranteed. To address this problem, a strategy is proposed that incorporates existing linguistic knowledge into CTC-based end-to-end acoustic modeling, with tied triphones as the modeling units. This addresses the sparsity of the modeling units and substantially improves the discrimination and robustness of the acoustic model. Experiments on a Tibetan test set show that the proposed method improves the recognition accuracy of the CTC-based acoustic model and verify the effectiveness of combining linguistic knowledge with end-to-end acoustic modeling.
Key words: End to End    Tibetan    Automatic Speech Recognition    Connectionist Temporal Classification
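The two ideas named in the abstract, CTC label sequences and context-dependent triphone units, can be illustrated with a minimal Python sketch. This is not the authors' system: the blank token, the phone symbols, the tying table, and all function names are hypothetical, and a plain lookup table stands in for the tying step, which is commonly derived from decision-tree clustering in practice.

```python
from itertools import groupby

BLANK = "<blk>"  # hypothetical CTC blank symbol


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into left-center+right context-dependent labels."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def tie_units(triphones, tying_table):
    """Map each triphone to a tied unit.

    Triphones missing from the (hypothetical) tying table fall back to
    their center phone, so rare contexts still get a modeling unit.
    """
    tied = []
    for tri in triphones:
        center = tri.split("-", 1)[1].split("+", 1)[0]
        tied.append(tying_table.get(tri, center))
    return tied


if __name__ == "__main__":
    # A blank between the two "a" runs keeps them as separate labels.
    frames = ["<blk>", "k", "k", "<blk>", "a", "a", "<blk>", "a", "ng"]
    print(ctc_collapse(frames))

    triphones = to_triphones(["k", "a", "ng"])
    print(triphones)
    print(tie_units(triphones, {"sil-k+a": "k_1", "k-a+ng": "a_3"}))
```

Sharing one unit across several triphone contexts is what reduces the sparsity the abstract mentions: many distinct contexts are rarely or never observed in training, while a tied unit pools their data.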
Received: 2016-09-30
*Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB1001300)
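About the authors: WANG Qingnan (corresponding author), male, born in 1992, master's student. His main research interest is speech recognition. E-mail: wqn628@mail.ustc.edu.cn.
GUO Wu, male, born in 1973, Ph.D., associate professor. His main research interests are speech recognition and speaker recognition. E-mail: guowu@ustc.edu.cn.
XIE Chuandong, male, born in 1990, master's student. His main research interests are speech recognition and keyword search. E-mail: xcdahu@mail.ustc.edu.cn.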
Cite this article:
WANG Qingnan, GUO Wu, XIE Chuandong. Towards End to End Speech Recognition System for Tibetan [J]. Pattern Recognition and Artificial Intelligence, 2017, 30(4): 359-364.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201704008      or     http://manu46.magtech.com.cn/Jweb_prai/CN/Y2017/V30/I4/359