Web Information Extraction Based on Genetic Algorithm


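Abstract: WHISK is a semi-automatic information extraction (IE) system that applies the extraction rules it generates to both structured and semi-structured Web text. During rule learning, however, it cannot guarantee that rules are extended in the optimal way, and generating the rule set takes a long time. To address these problems, this paper proposes using a genetic algorithm to improve WHISK's supervised learning algorithm and adopts a removal method to generate the rule set. Experimental results show that the method improves both efficiency and recall.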

Keywords: information extraction, WHISK system, genetic algorithm, rule learning

Abstract: The WHISK system is a semi-automatic information extraction (IE) system. It works well in extracting information from structured or semi-structured web texts. However, there is no guarantee that its rule learning algorithm extends rules in an optimal way, and generating the rule set is time-consuming. To solve these problems, a genetic algorithm is introduced to improve WHISK's supervised machine learning algorithm through heuristic rule expansion, and a removing method is used to generate the rule set. The experimental results show that the proposed algorithm performs well in terms of efficiency and recall rate.

Key words: Information Extraction, WHISK System, Genetic Algorithm, Rule Learning
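
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a genetic search over rule expansions could be combined with a removal step. Everything in it is an assumption made for illustration: the bit-mask rule encoding (each token of a seed instance is either kept as a literal or relaxed to a wildcard), the regex-based matching, the function names `build_rule`, `laplacian`, `evolve_rule` and `learn_rule_set`, the GA parameters, and the toy data `TRAIN`. The fitness used is the Laplacian expected error (e + 1) / (n + 1) that WHISK is usually described as using, and "removal" is read here as dropping training instances already covered by a learned rule.

```python
import random
import re

# Toy training set (made up for illustration): (text, target slot value or None).
TRAIN = [
    ("<li> Rent : $675 per month </li>", "$675"),
    ("<li> Rent : $820 per month </li>", "$820"),
    ("<li> Rent : $990 per week </li>", "$990"),
    ("<li> Deposit : $300 per month </li>", None),  # nothing should be extracted
]

def build_rule(tokens, slot, mask):
    """Compile a rule from a seed: kept tokens are literals, dropped runs are wildcards."""
    parts = []
    for i, tok in enumerate(tokens):
        if i == slot:
            parts.append("(.+?)")            # the field to extract
        elif mask[i]:
            parts.append(re.escape(tok))
        elif not parts or parts[-1] != ".*?":
            parts.append(".*?")              # one wildcard per run of dropped tokens
    return re.compile(r"\s*".join(parts))

def extract(rule, text):
    """Return the extracted field, or None if the rule does not match the text."""
    m = rule.fullmatch(text)
    return m.group(1) if m else None

def laplacian(rule, data):
    """Laplacian expected error (e + 1) / (n + 1); lower is better."""
    n = e = 0
    for text, target in data:
        got = extract(rule, text)
        if got is not None:
            n += 1
            e += got != target
    return (e + 1) / (n + 1)

def evolve_rule(seed_text, target, data, rng, pop_size=20, generations=30, p_mut=0.1):
    """Genetic search over keep/drop masks of the seed's tokens.

    Assumes the target is a single whitespace-delimited token of the seed.
    """
    tokens = seed_text.split()
    slot = tokens.index(target)
    n = len(tokens)

    def fitness(mask):
        return laplacian(build_rule(tokens, slot, mask), data)

    # Start from the fully specific rule plus random generalisations of it.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [min(pop, key=fitness)]              # elitism: keep the best mask
        while len(children) < pop_size:
            a = min(rng.sample(pop, 3), key=fitness)    # tournament selection
            b = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)                   # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])  # bit-flip mutation
        pop = children
    return build_rule(tokens, slot, min(pop, key=fitness))

def learn_rule_set(data, seed=0):
    """Learn rules one at a time, removing the instances each new rule already covers."""
    rng = random.Random(seed)
    remaining = [(t, y) for t, y in data if y is not None]
    rules = []
    while remaining:
        text, target = remaining[0]
        rule = evolve_rule(text, target, data, rng)
        rules.append(rule)
        # Removal step: the seed is always dropped, so the loop terminates.
        remaining = [(t, y) for t, y in remaining[1:] if extract(rule, t) != y]
    return rules

if __name__ == "__main__":
    for r in learn_rule_set(TRAIN):
        print(r.pattern)
```

Because the fully specific seed rule is always in the initial population and the best mask is carried over each generation, the rule returned for a seed never scores worse than that seed rule. Which generalisation the search settles on depends on the random seed, so the printed patterns can vary.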

Chinese Library Classification (ZTFLH): TP 181

Cite this article:

GUO Yin-Rui (郭银蕊), CHEN Rong (陈荣). Web Information Extraction Based on Genetic Algorithm [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2011, 24(3): 385-390.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2011/V24/I3/385