Abstract: The recurrent neural network language model (RNNLM) alleviates the data sparseness and the curse of dimensionality that affect traditional N-gram models. However, the original RNNLM still fails to capture long-range dependencies because of the vanishing gradient problem. In this paper, an improved RNNLM based on contextual word vectors is proposed. To improve the model structure, a feature layer is added to the input layer, and contextual word vectors are fed into the model through this feature layer to strengthen its ability to learn long-distance information during training. Experimental results show that the proposed method effectively improves the performance of the RNNLM.
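As a rough illustration of the feature-layer idea described above, the following is a minimal sketch in the style of the standard feature-augmented RNNLM; the weight names U, W, F, V, G and the exact form of the contextual word-vector feature f(t) are assumptions for illustration, not taken from the paper body.

% Sketch: RNNLM with a feature layer carrying contextual word vectors.
% w(t): one-hot vector of the current word; s(t-1): previous hidden state;
% f(t): contextual word-vector feature; U, W, F, V, G: weight matrices (names assumed).
\begin{align}
  \mathbf{s}(t) &= \sigma\bigl(\mathbf{U}\,\mathbf{w}(t) + \mathbf{W}\,\mathbf{s}(t-1) + \mathbf{F}\,\mathbf{f}(t)\bigr),\\
  \mathbf{y}(t) &= \operatorname{softmax}\bigl(\mathbf{V}\,\mathbf{s}(t) + \mathbf{G}\,\mathbf{f}(t)\bigr).
\end{align}

Here sigma is the elementwise sigmoid and y(t) is the predicted distribution over the next word; f(t) would be built from the word vectors of the surrounding context words. Dropping the F and G terms recovers the original RNNLM.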
ZHANG Jian, QU Dan, LI Zhen. Recurrent Neural Network Language Model Based on Word Vector Features [J]. Pattern Recognition and Artificial Intelligence, 2015, 28(4): 299-305.