Entity Disambiguation in Specific Domains Combining Word Vector and Topic Models
MA Xiaojun (1), GUO Jianyi (1, 2), WANG Hongbin (1, 2), ZHANG Zhikun (1, 2), XIAN Yantuan (1, 2), YU Zhengtao (1, 2)
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500
Abstract: When the Skip-gram word vector model handles a polysemous word, it computes only a single vector that mixes all of the word's senses, so the different meanings of the word cannot be distinguished. In this paper, an entity disambiguation method for specific domains that combines word vectors with a topic model is proposed. Word vectors are used to represent the entity mention, drawn from the background text, and each candidate entity, drawn from the knowledge base. The context similarity and the category similarity are then calculated, and the LDA topic model and the Skip-gram word vector model are used together to obtain separate word vector representations for the different meanings of polysemous words. Meanwhile, domain keywords are extracted and the domain topic keyword similarity is calculated. Finally, the three types of features are combined, and the candidate entity with the highest similarity is selected as the target entity. Experiments show that the proposed method achieves better disambiguation results than existing disambiguation methods.
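The abstract names the stages of the method but not their formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that (a) each token carries a topic ID already assigned by an LDA model, so a polysemous word is looked up by a hypothetical `(word, topic)` key and can have one vector per sense; (b) all three features (context, category, domain keyword) are cosine similarities between averaged sense vectors; and (c) the features are combined by a weighted sum with illustrative equal weights. The names `sense_vectors`, `text_vector`, `score`, `select_entity` and the field labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, int]  # (word, LDA topic id); topic assignment assumed given
Vector = List[float]
FIELDS = ("context", "category", "domain")  # the three feature types (assumed labels)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def text_vector(tokens: List[Token], sense_vectors: Dict[Token, Vector], dim: int) -> Vector:
    """Average the sense-specific vectors of the tokens.

    A polysemous word has one vector per topic, so ("apple", 0) and
    ("apple", 1) can point in different directions. Unknown tokens are
    skipped; if none are known, the zero vector is returned.
    """
    found = [sense_vectors[t] for t in tokens if t in sense_vectors]
    if not found:
        return [0.0] * dim
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def score(mention: Dict[str, List[Token]], candidate: Dict[str, List[Token]],
          sense_vectors: Dict[Token, Vector], dim: int,
          weights: Sequence[float]) -> float:
    """Hypothetical weighted sum of the three per-field cosine similarities."""
    return sum(
        w * cosine(text_vector(mention[f], sense_vectors, dim),
                   text_vector(candidate[f], sense_vectors, dim))
        for f, w in zip(FIELDS, weights)
    )


def select_entity(mention: Dict[str, List[Token]],
                  candidates: Dict[str, Dict[str, List[Token]]],
                  sense_vectors: Dict[Token, Vector], dim: int,
                  weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> str:
    """Return the candidate name with the highest combined similarity."""
    return max(candidates,
               key=lambda name: score(mention, candidates[name], sense_vectors, dim, weights))


if __name__ == "__main__":
    # Toy 2-d sense vectors: axis 0 ~ "fruit" topic, axis 1 ~ "company" topic.
    sense = {
        ("apple", 0): [1.0, 0.0], ("apple", 1): [0.0, 1.0],
        ("fruit", 0): [1.0, 0.1], ("phone", 1): [0.1, 1.0],
        ("company", 1): [0.0, 1.0],
    }
    mention = {"context": [("apple", 1), ("phone", 1)],
               "category": [("company", 1)], "domain": [("phone", 1)]}
    candidates = {
        "Apple Inc.": {"context": [("phone", 1)], "category": [("company", 1)],
                       "domain": [("phone", 1)]},
        "Apple (fruit)": {"context": [("fruit", 0)], "category": [("fruit", 0)],
                          "domain": [("fruit", 0)]},
    }
    print(select_entity(mention, candidates, sense, 2))  # prints Apple Inc.
```

Keying vectors by `(word, topic)` is just one simple way to give a polysemous word distinct sense vectors; the paper's actual construction from LDA and Skip-gram may differ, and in practice the feature weights would likely be tuned on held-out data rather than fixed.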
Received: 15 September 2017
Fund: Supported by the National Natural Science Foundation of China (No. 61562052, 61462054, 61363044)
About the authors:
MA Xiaojun, born in 1991, master's student. His research interests include natural language processing and knowledge representation.
GUO Jianyi (corresponding author), born in 1964, master's degree, professor. Her research interests include pattern recognition, natural language processing, information extraction and knowledge acquisition.
WANG Hongbin, born in 1983, Ph.D., lecturer. His research interests include intelligent information systems, natural language processing and information retrieval.
ZHANG Zhikun, born in 1977, master's degree, lecturer. His research interests include machine translation, information retrieval and information extraction.
XIAN Yantuan, born in 1981, Ph.D. candidate, lecturer. His research interests include machine translation, information retrieval and information extraction.
YU Zhengtao, born in 1970, Ph.D., professor. His research interests include machine translation, natural language processing and information retrieval.
