Abstract: For low-resource speech recognition, a data selection strategy for semi-supervised learning with large amounts of unlabeled data is proposed and applied to both acoustic modeling and language modeling. A seed model is first trained on a small amount of labeled data and then used to decode the unlabeled data. High-confidence sentences are selected from the 1-best decoding results by combining a confidence measure with perplexity, and these sentences are used to train the acoustic model and the language model. Furthermore, the decoded lattices are converted into multiple candidate transcripts for language model training. On a Japanese recognition task, the proposed method achieves a better recognition rate than data selection based on the confidence measure alone.
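The selection step described above can be sketched as a simple filter over decoded 1-best hypotheses. This is a minimal illustration, not the paper's implementation: the field names, thresholds, and the assumption that a per-utterance confidence score and a language-model perplexity are already available are all hypothetical.

```python
def select_high_confidence(hypotheses, conf_threshold=0.9, ppl_threshold=200.0):
    """Keep decoded sentences whose confidence is high and perplexity is low.

    hypotheses: list of dicts with keys "text", "confidence", "perplexity"
    (hypothetical structure; a real pipeline would read these from the
    decoder's lattice confidence scores and an LM perplexity tool).
    """
    selected = []
    for hyp in hypotheses:
        # Combine both criteria: acoustic/decoding confidence AND LM perplexity.
        if hyp["confidence"] >= conf_threshold and hyp["perplexity"] <= ppl_threshold:
            selected.append(hyp["text"])
    return selected


# Example: only the first utterance passes both thresholds.
hypotheses = [
    {"text": "sentence a", "confidence": 0.95, "perplexity": 120.0},
    {"text": "sentence b", "confidence": 0.80, "perplexity": 90.0},   # low confidence
    {"text": "sentence c", "confidence": 0.97, "perplexity": 350.0},  # high perplexity
]
print(select_high_confidence(hypotheses))  # ['sentence a']
```

The selected sentences would then serve as pseudo-labels for retraining the acoustic and language models.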