Pattern Recognition and Artificial Intelligence
Pattern Recognition and Artificial Intelligence, 2011, Vol. 24, Issue 1: 130-137
Research and Applications
Data Extraction from Limited Deep Web Based on Latticial Space
ZHANG Zhuo1, LI Shi-Jun1, ZHANG Nai-Zhou1,2, TIAN Jian-Wei1
1. School of Computer, Wuhan University, Wuhan 430079
2. Department of Computer Science, Zhixing College of Hubei University, Wuhan 430072

Abstract: When crawling a Deep Web database that limits the number of results returned per query, the problem of predicting query result sizes and extracting the data can be modeled as a set covering problem with a constraint on set size. Here it is recast as a concept covering problem. First, the relation among the set partitions induced by attributes and attribute combinations, that is, among pairs of a query and its result, is proved to be a tolerance relation. Second, the set of these pairs is proved to form a complete lattice that is homomorphic to the concept lattice built from the same source. The partial order between concepts can therefore be used to describe the correlation between queries: the intent of a concept serves as the query, and the cardinality of its extent predicts the result size. A lattice-based algorithm, called Ladeldew, is proposed for data extraction from limited Deep Web databases. Ladeldew uses as its search space the semi-lattice obtained by pruning on extent cardinality, and it iteratively regenerates this search space from newly extracted data until nothing more can be extracted. Both controlled and real-world experiments are carried out to evaluate Ladeldew, and the results verify its theoretical correctness and its feasibility and effectiveness in practice.
Key words: Data Extraction; Tolerance Relation; Formal Concept Analysis; Concept Lattice
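The roles of intent and extent described in the abstract can be illustrated with a small sketch. This is not the Ladeldew algorithm itself: it assumes a toy local sample (`RECORDS`, hypothetical data) that stands in for the remote source, enumerates formal concepts by brute force, prunes them by extent cardinality against a hypothetical result limit, and picks queries with a plain greedy set cover. The function names are likewise illustrative.

```python
from itertools import combinations

# Hypothetical sample: record id -> attribute values of that record.
RECORDS = {
    1: {"brand=A", "color=red"},
    2: {"brand=A", "color=blue"},
    3: {"brand=B", "color=red"},
    4: {"brand=B", "color=blue"},
    5: {"brand=A", "color=red"},
}

def extent(attrs, records):
    """Records matching every attribute in attrs (the result of that query)."""
    return frozenset(r for r, have in records.items() if attrs <= have)

def intent_of(ext, records, all_attrs):
    """Attributes shared by every record in ext (all attributes if ext is empty)."""
    shared = set(all_attrs)
    for r in ext:
        shared &= records[r]
    return frozenset(shared)

def concepts(records):
    """Map extent -> intent for all formal concepts (brute force, toy sizes only)."""
    all_attrs = sorted(set().union(*records.values()))
    found = {}
    for k in range(len(all_attrs) + 1):
        for combo in combinations(all_attrs, k):
            ext = extent(frozenset(combo), records)
            found[ext] = intent_of(ext, records, all_attrs)
    return found

def plan_queries(records, limit):
    """Greedy cover using only concepts whose extent size is within the limit."""
    # Pruning step: the extent cardinality is the predicted result size.
    usable = {e: i for e, i in concepts(records).items() if 0 < len(e) <= limit}
    uncovered = set(records)
    plan = []
    while uncovered:
        best = max(usable, key=lambda e: (len(e & uncovered), sorted(e)), default=None)
        if best is None or not (best & uncovered):
            break  # remaining records cannot be isolated within the limit
        plan.append((sorted(usable[best]), sorted(best)))
        uncovered -= best
    return plan, uncovered

if __name__ == "__main__":
    plan, missed = plan_queries(RECORDS, limit=2)
    for query, predicted in plan:
        print(query, "->", predicted)
    print("not reachable:", sorted(missed))
```

In this sketch each chosen intent is a query whose predicted result fits under the limit, so no result would be truncated. A real crawler does not know the records in advance, which is why the abstract describes rebuilding the search space from newly extracted data on each iteration.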
Received: 2009-12-16
CLC number (ZTFLH): TP311.1
Funding: Supported by the National Natural Science Foundation of China (No. 60970018)
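About the authors: ZHANG Zhuo, male, born in 1978, Ph.D. candidate. His research interests include concept lattices, Web data mining, and data extraction. E-mail: charles.zz@gmail.com. LI Shi-Jun, male, born in 1964, professor and Ph.D. supervisor. His main research interest is Web data management. E-mail: shjLi@whu.edu.cn. ZHANG Nai-Zhou, male, born in 1970, Ph.D. candidate and lecturer. His research interests include Web information retrieval and unstructured data management. TIAN Jian-Wei, male, born in 1982, Ph.D. candidate. His research interests include Web mining and information integration.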
Cite this article:
ZHANG Zhuo, LI Shi-Jun, ZHANG Nai-Zhou, TIAN Jian-Wei. Data Extraction from Limited Deep Web Based on Latticial Space[J]. Pattern Recognition and Artificial Intelligence, 2011, 24(1): 130-137.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2011/V24/I1/130