An Automatic Keyword Extraction Algorithm for Chinese Documents Based on Complex Network Features
ZHAO Peng1,2, CAI QingSheng1, WANG QingYi1, GENG HuanTong1
1. Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027
2. Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230039
Abstract: Automatic keyword extraction is one of the most important techniques in natural language processing. This paper studies the features of complex networks built from Chinese text. Drawing on the small-world structure observed in language networks and on theoretical results from complex network research, it proposes a novel automatic keyword extraction algorithm for Chinese documents. The algorithm extracts keywords according to the feature values of the word nodes in a document's language network. Experimental results show that the proposed algorithm achieves higher average precision than a keyword extraction algorithm based on TF-IDF.
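The abstract does not say which node features are used or how they are combined, so the following is only a minimal sketch of the general idea, under stated assumptions: the document is already segmented into words, words that occur within a small window of each other are linked, and each word node is scored by a weighted mix of its normalized degree and its clustering coefficient. The function names, the `window` and `alpha` parameters, and the scoring formula are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(tokens, window=2):
    """Link two distinct words when they occur within `window` positions of each other."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    return dict(neighbors)


def clustering_coefficient(neighbors, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    adjacent = neighbors[node]
    k = len(adjacent)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adjacent, 2) if b in neighbors[a])
    return 2.0 * links / (k * (k - 1))


def rank_keywords(tokens, top_n=3, alpha=0.5, window=2):
    """Rank words by an assumed score: alpha * normalized degree + (1 - alpha) * clustering."""
    neighbors = build_cooccurrence_network(tokens, window)
    max_degree = max((len(adj) for adj in neighbors.values()), default=1)
    scores = {}
    for word in neighbors:
        degree = len(neighbors[word]) / max_degree
        clustering = clustering_coefficient(neighbors, word)
        scores[word] = alpha * degree + (1 - alpha) * clustering
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

Calling `rank_keywords(tokens)` on a list of segmented words returns the highest-scoring word nodes. Whether high or low clustering should favor a keyword, and whether other features such as path length matter, depends on the paper's actual definition, which the abstract leaves open.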