Abstract: Conventional word embeddings are learned from the co-occurrence probabilities of words within the same sentence, using a learning algorithm that is task-independent and unsupervised. This paper proposes a method for constructing word embeddings under paraphrase constraints, with the aim of improving paraphrase scoring based on word embeddings and a bag-of-words model in knowledge base (KB) based question answering (QA). In the proposed method, pairs of paraphrase questions and pairs of non-paraphrase questions are collected from a database of question paraphrases according to a set of designed rules. Inequalities on the similarities between these question pairs are then used to represent semantic constraints at the sentence level, and are integrated into the objective function for training word embeddings. Experimental results show that the proposed method improves the accuracy of both paraphrase scoring and KB-based question answering compared with conventional word embedding methods.
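The sentence-level constraint described in the abstract can be illustrated with a small sketch. The abstract does not give the exact objective, so everything below is an assumption made for illustration: sentence vectors are taken as the mean of word vectors (one common bag-of-words choice), similarity is cosine, and the inequality sim(q, q+) > sim(q, q-) is turned into a hinge-style penalty. The names `sentence_vector`, `cosine`, `paraphrase_hinge`, the `margin` parameter, and the toy vectors are all hypothetical.

```python
import numpy as np

def sentence_vector(tokens, emb):
    """Bag-of-words sentence vector: the mean of the word vectors (assumed)."""
    return np.mean([emb[t] for t in tokens], axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def paraphrase_hinge(question, paraphrase, non_paraphrase, emb, margin=0.0):
    """Penalty for violating sim(q, q+) > sim(q, q-) (hinge form is assumed)."""
    q = sentence_vector(question, emb)
    s_pos = cosine(q, sentence_vector(paraphrase, emb))
    s_neg = cosine(q, sentence_vector(non_paraphrase, emb))
    # Zero when the paraphrase is already scored higher than the non-paraphrase.
    return max(0.0, margin + s_neg - s_pos)

# Toy 3-dimensional embeddings, invented purely for illustration.
emb = {
    "who": np.array([1.0, 0.0, 0.0]),
    "wrote": np.array([0.0, 1.0, 0.0]),
    "authored": np.array([0.0, 0.9, 0.1]),
    "hamlet": np.array([0.0, 0.0, 1.0]),
    "where": np.array([-1.0, 0.0, 0.2]),
    "is": np.array([0.1, -1.0, 0.0]),
    "paris": np.array([0.5, 0.5, -1.0]),
}

q = ["who", "wrote", "hamlet"]
q_pos = ["who", "authored", "hamlet"]  # paraphrase of q
q_neg = ["where", "is", "paris"]       # non-paraphrase of q

satisfied = paraphrase_hinge(q, q_pos, q_neg, emb)  # constraint holds
violated = paraphrase_hinge(q, q_neg, q_pos, emb)   # roles swapped on purpose
print(satisfied, violated)
```

Under these assumptions, a training objective could add a weighted sum of such penalties over all collected question triples to a conventional embedding loss; how the paper actually weights and combines the terms is not stated in the abstract.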
ZHAN Chendi, LING Zhenhua, DAI Lirong. Learning Word Embeddings for Paraphrase Scoring in Knowledge Base Based Question Answering. Pattern Recognition and Artificial Intelligence, 2016, 29(9): 825-831.