Speech Recognition Based on Semi-supervised Data Selection via Decoding Multiple Candidate Results
WANG Xilou1, GUO Wu1, XIE Chuandong1
1. National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027
|
|
Abstract For low-resource speech recognition, a data selection strategy for semi-supervised learning with a large amount of unlabeled data is proposed and applied to both acoustic modeling and language modeling. A seed model is first trained on a small amount of transcribed data and then used to decode the unlabeled data. High-confidence sentences are selected from the decoded best candidate results by combining a confidence measure with perplexity, and these sentences are used to train the acoustic model and the language model. Furthermore, the decoded lattices are converted into multiple candidate texts for language model training. On a Japanese recognition task, the proposed method achieves a better recognition rate than data selection based on the confidence measure alone.
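
The selection step can be illustrated with the following minimal Python sketch: decoded 1-best hypotheses are kept only when the decoder confidence is high enough and the language-model perplexity is low enough. The data structure, field names, and threshold values are illustrative assumptions for exposition, not the paper's actual implementation.

# Sketch of combined confidence/perplexity data selection (assumed interface).
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    utt_id: str        # utterance identifier
    text: str          # 1-best decoded transcript
    confidence: float  # decoder confidence measure for the utterance
    perplexity: float  # language-model perplexity of the decoded transcript


def select_high_confidence(hyps: List[Hypothesis],
                           conf_threshold: float = 0.9,
                           ppl_threshold: float = 200.0) -> List[Hypothesis]:
    """Keep hypotheses that pass both the confidence and the perplexity test."""
    return [h for h in hyps
            if h.confidence >= conf_threshold and h.perplexity <= ppl_threshold]


if __name__ == "__main__":
    hyps = [
        Hypothesis("utt001", "decoded text a", confidence=0.95, perplexity=120.0),
        Hypothesis("utt002", "decoded text b", confidence=0.70, perplexity=300.0),
    ]
    selected = select_high_confidence(hyps)
    # The selected transcripts would then be added to the acoustic-model
    # and language-model training data.
    print([h.utt_id for h in selected])

In this sketch the two thresholds play the roles of the confidence and perplexity criteria described above; in practice their values would be tuned on held-out data.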
|
Received: 08 November 2017
|
|
Corresponding Author:
WANG Xilou (corresponding author), master student. His research interests include speech recognition.
|
About the authors: GUO Wu, Ph.D., associate professor. His research interests include speaker recognition and verification, and speech recognition. XIE Chuandong, master student. His research interests include speech recognition and keyword search.
|
|
|
|
|
|