A Keyword Extraction Algorithm for Chinese Documents Based on Complex Network Features<sup>*</sup>

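Abstract (translated from the Chinese): Keyword extraction is one of the important techniques in natural language understanding. This paper studies the complex-network properties of natural language networks formed from Chinese text. Drawing on the "small world" property of language networks and on some new theoretical results from the past two years of complex network research, it proposes a keyword extraction algorithm for Chinese documents based on complex network features. The algorithm extracts keywords according to the complex-network feature values of the word nodes in a document's language network. Experimental results show that the average precision of the keywords extracted by the proposed algorithm is higher than that of the TFIDF keyword extraction algorithm.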

Keywords: complex network, language network, keyword extraction

Abstract: Automatic keyword extraction is one of the most important techniques in natural language processing. In this paper, the complex network features of language networks built from Chinese text are studied. Based on the small-world structure of language networks and on recent theoretical results in complex networks, a novel automatic keyword extraction algorithm for Chinese documents is proposed. It extracts keywords according to the complex network feature values of the word nodes in a document's language network. Experimental results show that the proposed algorithm achieves a higher average precision than the TFIDF-based keyword extraction algorithm.
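
The abstract describes the approach only at a high level: build a language network from the document's words, compute complex-network feature values for each word node, and rank the words by those values. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes the document has already been segmented into words, links words that appear next to each other, and scores each node with a hypothetical combination of degree and local clustering coefficient. The function names, the `window` parameter, and the scoring formula are all illustrative assumptions; the abstract does not say which features or weights the paper uses.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(tokens, window=2):
    """Build an undirected co-occurrence network.

    Two distinct words are linked when they occur within `window`
    consecutive tokens (window=2 means directly adjacent words).
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    return dict(neighbors)


def clustering_coefficient(neighbors, node):
    """Fraction of pairs of `node`'s neighbours that are linked to each other."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return 2.0 * links / (k * (k - 1))


def extract_keywords(tokens, top_n=3, window=2):
    """Rank word nodes by a hypothetical network-feature score.

    Assumed score: degree * (1 - clustering coefficient), which favours
    well-connected words that bridge otherwise unlinked neighbours.
    This formula is an illustration only, not taken from the paper.
    """
    neighbors = build_word_network(tokens, window)
    scores = {}
    for word in neighbors:
        degree = len(neighbors[word])
        cc = clustering_coefficient(neighbors, word)
        scores[word] = degree * (1.0 - cc)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


if __name__ == "__main__":
    # A toy, already-segmented token sequence (hypothetical input).
    sample = ["network", "keyword", "network", "feature",
              "network", "document", "keyword", "extraction"]
    print(extract_keywords(sample, top_n=2))
```

For Chinese text, the token list would come from a word segmenter run beforehand, and stop words would normally be filtered out before the network is built; both steps are outside the scope of this sketch.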

Key words: Complex Network, Language Networks, Keywords Extraction

Received: 2006-05-08

CLC number (ZTFLH): TP181

Funding: Supported by the National Natural Science Foundation of China (No.70171052) and the Natural Science Foundation of Anhui Province (No.2004kj011)

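About the authors: ZHAO Peng, female, born in 1976, Ph.D. Main research interests: artificial intelligence, machine learning, and complex networks. Email: zhp2004@mail.ustc.edu.cn. CAI QingSheng, male, born in 1938, professor and doctoral supervisor. Main research interests: artificial intelligence, machine learning, and complex systems. WANG QingYi, male, born in 1962, Ph.D. Main research interests: artificial intelligence, machine learning, and knowledge discovery. GENG HuanTong, male, born in 1973, Ph.D. candidate. Main research interests: artificial intelligence and knowledge discovery.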

Cite this article:

ZHAO Peng, CAI QingSheng, WANG QingYi, GENG HuanTong. An Automatic Keyword Extraction of Chinese Document Algorithm Based on Complex Network Features[J]. 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), 2007, 20(6): 827-831.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2007/V20/I6/827