Learning Word Embeddings for Paraphrase Scoring in Knowledge Base Based Question Answering
ZHAN Chendi, LING Zhenhua, DAI Lirong
National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027

Abstract Conventional word embeddings are learned from the co-occurrence probabilities of words within the same sentence, and the learning algorithm is task-independent and unsupervised. To improve paraphrase scoring with word embeddings and a bag-of-words model in knowledge base (KB) based question answering (QA), a method for constructing word embeddings that exploits paraphrase constraints is proposed. In the proposed method, pairs of paraphrase questions and pairs of non-paraphrase questions are collected from a database of question paraphrases according to a set of designed rules. Inequalities describing the similarities between these question pairs are then used to represent semantic constraints at the sentence level, and these inequalities are integrated into the objective function for training word embeddings. Experimental results show that the proposed method improves the accuracy of both paraphrase scoring and KB-based question answering compared with conventional word embedding methods.
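The sentence-level constraint described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the inequalities or of the objective function, so everything below is an assumption made for illustration: sentence vectors are taken as the average of word vectors (one common bag-of-words choice), similarity is cosine similarity, and the inequality sim(q, q+) > sim(q, q-) is turned into a hinge penalty with a hypothetical `margin` parameter. The toy embeddings and the function names are likewise hypothetical.

```python
import math


def sentence_vector(tokens, embeddings):
    """Bag-of-words sentence vector: average of the known word vectors."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:  # skip out-of-vocabulary words
            continue
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [t / count for t in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def paraphrase_hinge(question, paraphrase, non_paraphrase, embeddings, margin=0.1):
    """Assumed hinge penalty for violating sim(q, q+) > sim(q, q-) + margin."""
    q = sentence_vector(question, embeddings)
    pos = sentence_vector(paraphrase, embeddings)
    neg = sentence_vector(non_paraphrase, embeddings)
    return max(0.0, margin + cosine(q, neg) - cosine(q, pos))


# Toy 2-dimensional embeddings, invented purely for this illustration.
toy_embeddings = {
    "who": [1.0, 0.0], "wrote": [0.9, 0.1], "author": [0.8, 0.2],
    "hamlet": [1.0, 0.2], "of": [0.5, 0.5],
    "capital": [0.0, 1.0], "france": [0.1, 1.0],
}
question = ["who", "wrote", "hamlet"]
paraphrase = ["author", "of", "hamlet"]
non_paraphrase = ["capital", "of", "france"]

# Penalty when the inequality holds, and when the two pairs are swapped.
loss_satisfied = paraphrase_hinge(question, paraphrase, non_paraphrase, toy_embeddings)
loss_violated = paraphrase_hinge(question, non_paraphrase, paraphrase, toy_embeddings)
print(loss_satisfied, loss_violated)
```

In a training setup of this kind, a penalty like `paraphrase_hinge` would be summed over all collected question triples and added to the usual co-occurrence-based embedding loss, so that the gradient only moves word vectors when a paraphrase inequality is violated. How the paper actually weights and combines the two terms is not stated in the abstract.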

Received: 29 March 2016

About the authors: ZHAN Chendi, born in 1992, master's student. His research interests include natural language processing. LING Zhenhua (corresponding author), born in 1979, Ph.D., associate professor. His research interests include speech synthesis and natural language processing. DAI Lirong, born in 1962, Ph.D., professor. His research interests include digital signal processing and man-machine speech communication.