Recurrent Neural Network Language Model Based on Word Vector Features
ZHANG Jian, QU Dan, LI Zhen
Institute of Information System Engineering, The PLA Information Engineering University, Zhengzhou 450001 |
Abstract The recurrent neural network language model (RNNLM) alleviates the data sparseness and the curse of dimensionality that affect traditional N-gram models. However, the original RNNLM still fails to capture long-distance dependencies because of the vanishing gradient problem. In this paper, an improved RNNLM based on contextual word vectors is proposed. The model structure is extended with a feature layer attached to the input layer, and contextual word vectors are fed into the network through this feature layer to strengthen its ability to learn long-distance information during training. Experimental results show that the proposed method effectively improves the performance of the RNNLM.
Received: 27 February 2014
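As an illustration of the architecture described in the abstract, the sketch below (not taken from the paper) shows one forward step of an Elman-style RNNLM extended with a feature layer that receives contextual word vectors. All dimensions, variable names (U, W, G, Vout), and the choice to average the context word vectors into a single feature vector are assumptions made for this example only.

    # Hypothetical sketch of an RNNLM step with an extra feature layer.
    # Sizes, names, and the mean-pooled context feature are illustrative
    # assumptions, not the authors' implementation.
    import numpy as np

    V, H, F = 10000, 200, 100          # vocabulary, hidden, feature sizes

    rng = np.random.default_rng(0)
    U = rng.normal(0, 0.1, (H, V))     # input word  -> hidden
    W = rng.normal(0, 0.1, (H, H))     # hidden(t-1) -> hidden
    G = rng.normal(0, 0.1, (H, F))     # feature layer -> hidden
    Vout = rng.normal(0, 0.1, (V, H))  # hidden -> output

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def rnnlm_step(word_id, h_prev, context_ids, word_vectors):
        """One forward step: current word, previous hidden state, and a
        feature vector averaged from contextual word vectors."""
        f = word_vectors[context_ids].mean(axis=0)        # (F,) context feature
        h = np.tanh(U[:, word_id] + W @ h_prev + G @ f)   # new hidden state
        y = softmax(Vout @ h)                             # next-word distribution
        return y, h

    # Usage: the word vectors could come from a word2vec-style model.
    word_vectors = rng.normal(0, 0.1, (V, F))
    h = np.zeros(H)
    probs, h = rnnlm_step(word_id=42, h_prev=h,
                          context_ids=[10, 11, 12],
                          word_vectors=word_vectors)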