Entity Disambiguation in Specific Domains Combining Word Vector and Topic Models
MA Xiaojun1, GUO Jianyi1,2, WANG Hongbin1,2, ZHANG Zhikun1,2, XIAN Yantuan1,2, YU Zhengtao1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500
Abstract When the Skip-gram word vector model handles a polysemous word, it computes only a single word vector in which the word's multiple senses are mixed, so the different meanings of the word cannot be distinguished. In this paper, an entity disambiguation method for specific domains is proposed that combines word vectors with a topic model. The word vector method is used to obtain vector representations of the reference term from the background text and of the candidate entities from the knowledge base. The context similarity and the category reference similarity are then calculated, and the LDA topic model and the Skip-gram word vector model are used together to obtain word vector representations of the different meanings of polysemous words. Meanwhile, domain keywords are extracted and the domain topic keyword similarity is calculated. Finally, the three types of features are combined, and the candidate entity with the highest similarity is selected as the final target entity. Experiments show that the proposed method achieves better disambiguation results than existing disambiguation methods.
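The final selection step described in the abstract can be illustrated with a minimal sketch. It assumes that each of the three features (context, category reference, domain topic keywords) is scored by cosine similarity between a vector for the reference term and a vector for the candidate entity, and that the three scores are combined by a weighted sum. The abstract does not state the combination rule, so the equal weights, the dictionary layout, and the names `cosine`, `combined_score` and `select_entity` are hypothetical and only serve the illustration.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical feature names; the abstract only says "three types of features".
FEATURES = ("context", "category", "topic")


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(
    mention: Dict[str, Sequence[float]],
    candidate: Dict[str, Sequence[float]],
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),  # assumed equal weights
) -> float:
    """Weighted sum of the per-feature cosine similarities."""
    return sum(
        w * cosine(mention[name], candidate[name])
        for name, w in zip(FEATURES, weights)
    )


def select_entity(
    mention: Dict[str, Sequence[float]],
    candidates: Dict[str, Dict[str, Sequence[float]]],
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> Tuple[str, float]:
    """Return the candidate entity with the highest combined similarity."""
    scored = {
        name: combined_score(mention, vecs, weights)
        for name, vecs in candidates.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


if __name__ == "__main__":
    # Toy vectors, invented purely to exercise the functions.
    mention = {"context": [1.0, 0.0], "category": [0.0, 1.0], "topic": [1.0, 1.0]}
    candidates = {
        "A": {"context": [1.0, 0.0], "category": [0.0, 1.0], "topic": [1.0, 1.0]},
        "B": {"context": [0.0, 1.0], "category": [1.0, 0.0], "topic": [1.0, -1.0]},
    }
    print(select_entity(mention, candidates))
```

In practice the vectors would come from the trained word vector and topic models rather than from hand-written lists, and the weights would be tuned on held-out data. The sketch only shows how three similarity features can be reduced to a single ranking score.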
Received: 15 September 2017
Fund: Supported by the National Natural Science Foundation of China (No. 61562052, 61462054, 61363044)
About the authors:
MA Xiaojun, born in 1991, master student. His research interests include natural language processing and knowledge representation.
GUO Jianyi (corresponding author), born in 1964, master, professor. Her research interests include pattern recognition, natural language processing, information extraction and knowledge acquisition.
WANG Hongbin, born in 1983, Ph.D., lecturer. His research interests include intelligent information systems, natural language processing and information retrieval.
ZHANG Zhikun, born in 1977, master, lecturer. His research interests include machine translation, information retrieval and information extraction.
XIAN Yantuan, born in 1981, Ph.D. candidate, lecturer. His research interests include machine translation, information retrieval and information extraction.
YU Zhengtao, born in 1970, Ph.D., professor. His research interests include machine translation, natural language processing and information retrieval.