Abstract: For low-resource speech recognition, a data selection strategy for semi-supervised learning with large amounts of unlabeled data is proposed and applied to both acoustic modeling and language modeling. A seed model is first trained on a small amount of labeled data and then used to decode the unlabeled data. High-confidence sentences are selected from the 1-best decoding results by combining a confidence measure with perplexity, and these sentences are used to train the acoustic model and the language model. Furthermore, the decoded lattices are converted into multiple candidate transcripts for language model training. On a Japanese recognition task, the proposed method achieves a better recognition rate than data selection based on the confidence measure alone.
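The selection step described above can be sketched as a simple filter over decoded 1-best hypotheses. This is a minimal illustration, not the paper's implementation: the field names, thresholds, and the assumption that a per-utterance confidence score and a language-model perplexity are already available are all hypothetical.

```python
def select_high_confidence(hypotheses, conf_threshold=0.9, ppl_threshold=200.0):
    """Keep decoded sentences whose confidence is high and perplexity is low.

    hypotheses: list of dicts with keys "text", "confidence", "perplexity"
    (hypothetical structure; a real pipeline would read these from the
    decoder's lattice confidence scores and an LM perplexity tool).
    """
    selected = []
    for hyp in hypotheses:
        # Combine both criteria: acoustic/decoding confidence AND LM perplexity.
        if hyp["confidence"] >= conf_threshold and hyp["perplexity"] <= ppl_threshold:
            selected.append(hyp["text"])
    return selected


# Example: only the first utterance passes both thresholds.
hypotheses = [
    {"text": "sentence a", "confidence": 0.95, "perplexity": 120.0},
    {"text": "sentence b", "confidence": 0.80, "perplexity": 90.0},   # low confidence
    {"text": "sentence c", "confidence": 0.97, "perplexity": 350.0},  # high perplexity
]
print(select_high_confidence(hypotheses))  # ['sentence a']
```

The selected sentences would then serve as pseudo-labels for retraining the acoustic and language models.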