Pattern Recognition and Artificial Intelligence
  2017, Vol. 30 Issue (12): 1130-1137    DOI: 10.16451/j.cnki.issn1003-6059.201712009
Original Article
Entity Disambiguation in Specific Domains Combining Word Vector and Topic Models
MA Xiaojun1, GUO Jianyi1,2, WANG Hongbin1,2, ZHANG Zhikun1,2, XIAN Yantuan1,2, YU Zhengtao1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500

Abstract  The Skip-gram word vector model computes only one vector per word, so a polysemous word receives a single vector that mixes its multiple senses, and its different meanings cannot be distinguished. This paper proposes an entity disambiguation method for specific domains that combines the word vector model with a topic model. Word vectors are used to obtain the vector form of the reference term from the background text and of the candidate entities from the knowledge base. The context similarity and the category reference similarity are then calculated. The LDA topic model and the Skip-gram word vector model are used together to obtain word vector representations of the different meanings of polysemous words; meanwhile, domain keywords are extracted and the domain topic keyword similarity is calculated. Finally, the three types of features are combined, and the candidate entity with the highest similarity is selected as the final target entity. Experiments show that the proposed method achieves better disambiguation results than existing disambiguation methods.
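The final selection step described in the abstract can be illustrated with a minimal sketch. It assumes that each of the three features is a cosine similarity between precomputed vectors and that the features are combined by a weighted sum; the abstract does not state the actual combination rule, so the equal weights, the function names (`cosine`, `disambiguate`), and the toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(mention, candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Score every candidate entity and return (best_name, scores).

    `mention` and each candidate hold three vectors keyed by feature name:
    "context", "category", and "topic". The weighted sum is an assumed
    combination rule, not the one published in the paper.
    """
    scores = {}
    for name, cand in candidates.items():
        features = (
            cosine(mention["context"], cand["context"]),    # context similarity
            cosine(mention["category"], cand["category"]),  # category reference similarity
            cosine(mention["topic"], cand["topic"]),        # domain topic keyword similarity
        )
        scores[name] = sum(w * f for w, f in zip(weights, features))
    # The candidate with the highest combined similarity is the target entity.
    best = max(scores, key=scores.get)
    return best, scores


# Toy two-dimensional vectors, invented purely for illustration.
mention = {"context": [1.0, 0.0], "category": [1.0, 0.0], "topic": [0.0, 1.0]}
candidates = {
    "candidate_a": {"context": [1.0, 0.0], "category": [1.0, 0.0], "topic": [1.0, 0.0]},
    "candidate_b": {"context": [0.0, 1.0], "category": [0.0, 1.0], "topic": [0.0, 1.0]},
}

best, scores = disambiguate(mention, candidates)
print(best, scores)
```

In this toy case `candidate_a` matches the mention on two of the three features and `candidate_b` on only one, so `candidate_a` is selected under equal weights. In practice the vectors would come from trained Skip-gram and LDA models rather than being written by hand.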
Key words: Entity Disambiguation; Word Vector Model; Domain Knowledge Base; Latent Dirichlet Allocation (LDA) Topic Model
Received: 15 September 2017     
Chinese Library Classification (ZTFLH): TP 391
Fund: Supported by National Natural Science Foundation of China (No. 61562052, 61462054, 61363044)
About the authors:
MA Xiaojun, born in 1991, master's student. His research interests include natural language processing and knowledge representation.
GUO Jianyi (corresponding author), born in 1964, master's degree, professor. Her research interests include pattern recognition, natural language processing, information extraction and knowledge acquisition.
WANG Hongbin, born in 1983, Ph.D., lecturer. His research interests include intelligent information systems, natural language processing and information retrieval.
ZHANG Zhikun, born in 1977, master's degree, lecturer. His research interests include machine translation, information retrieval and information extraction.
XIAN Yantuan, born in 1981, Ph.D. candidate, lecturer. His research interests include machine translation, information retrieval and information extraction.
YU Zhengtao, born in 1970, Ph.D., professor. His research interests include machine translation, natural language processing and information retrieval.
Cite this article:
MA Xiaojun, GUO Jianyi, WANG Hongbin, et al. Entity Disambiguation in Specific Domains Combining Word Vector and Topic Models[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(12): 1130-1137.
URL:
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201712009
or
http://manu46.magtech.com.cn/Jweb_prai/EN/Y2017/V30/I12/1130
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax:0551-65591176 Email: bjb@iim.ac.cn