Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
2017, Vol. 30, Issue (1): 43-53. DOI: 10.16451/j.cnki.issn1003-6059.201701005
Researches and Applications
Incremental Deep Web Crawling with Top-k Query Constraint
JIANG Junyan1,2, PENG Zhiyong1,2, WU Xiaoying1
1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072
2. School of Computer, Wuhan University, Wuhan 430072

Abstract: Crawling all deep web data is difficult for third-party applications because deep web data sources are dynamic, autonomous, and numerous. To address deep web crawling under a query-type restriction (only top-k queries are allowed) and limited query resources, an incremental deep web crawling approach with a top-k query constraint is proposed. Historical data and domain knowledge are combined to maximize the total data quality of the repository. First, valid queries are generated using a query tree, and the changes and corresponding cost of each query are estimated from historical data and domain knowledge. Next, based on the estimated query costs and data quality, an approximately optimal subset of queries is selected to globally maximize total data quality under limited query resources. Experimental results on real datasets show that the proposed approach improves the efficiency of crawling dynamic web databases.
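The abstract does not spell out how the approximately optimal subset is chosen. As an illustration only, the selection step can be viewed as a 0/1 knapsack problem: each candidate query has an estimated cost (query resources consumed) and an estimated data-quality gain, and the crawler picks a subset whose total cost fits the budget. The sketch below uses a standard greedy heuristic (gain-per-cost ordering plus a best-single-query check), which is a substitute technique and not necessarily the authors' method. `CandidateQuery`, `select_queries`, the integer request-count cost model, and the used-car query strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CandidateQuery:
    """Hypothetical candidate query with estimates from history/domain knowledge."""
    name: str          # illustrative query string, e.g. "make=Honda"
    est_cost: int      # assumed cost model: number of query requests consumed
    est_gain: float    # assumed: estimated increase in repository data quality


def select_queries(candidates: List[CandidateQuery], budget: int) -> List[CandidateQuery]:
    """Greedily pick queries whose total estimated cost fits within `budget`.

    Queries with non-positive cost or cost above the budget are ignored.
    The rest are taken in descending gain-per-cost order while they still fit;
    the result is then compared with the single highest-gain affordable query,
    and whichever has the larger total estimated gain is returned.
    """
    affordable = [q for q in candidates if 0 < q.est_cost <= budget]
    ranked = sorted(affordable, key=lambda q: q.est_gain / q.est_cost, reverse=True)

    chosen: List[CandidateQuery] = []
    remaining = budget
    for q in ranked:
        if q.est_cost <= remaining:
            chosen.append(q)
            remaining -= q.est_cost

    best_single = max(affordable, key=lambda q: q.est_gain, default=None)
    if best_single is not None and best_single.est_gain > sum(q.est_gain for q in chosen):
        return [best_single]
    return chosen


# Usage with made-up estimates for a hypothetical used-car web database.
queries = [
    CandidateQuery("make=Toyota", est_cost=4, est_gain=10.0),
    CandidateQuery("make=Honda", est_cost=3, est_gain=9.0),
    CandidateQuery("make=Ford", est_cost=5, est_gain=8.0),
    CandidateQuery("price<5000", est_cost=10, est_gain=12.0),
]
print([q.name for q in select_queries(queries, budget=8)])
# → ['make=Honda', 'make=Toyota']
```

Ordering by gain per cost alone can miss a single expensive but valuable query; the best-single-query check guards against that case, and together the two give the well-known factor-1/2 guarantee for 0/1 knapsack. In a real crawler the fixed numbers would be replaced by estimates derived from historical data and domain knowledge.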
Key words: Top-k Query, Web Database Crawling, Data Quality, Query Cost, Query Selection
Received: 10 September 2016     
CLC number: TP 311
About the authors:
JIANG Junyan, born in 1987, Ph.D. candidate. His research interests include Web data management.
PENG Zhiyong, born in 1963, Ph.D., professor. His research interests include complex data management, trusted data management, and Web data management.
WU Xiaoying (corresponding author), born in 1973, Ph.D., associate professor. Her research interests include data management, query processing and optimization, keyword queries, pattern mining, the semantic web, and data integration.
Cite this article: JIANG Junyan, PENG Zhiyong, WU Xiaoying. Incremental Deep Web Crawling with Top-k Query Constraint[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(1): 43-53.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201701005 or http://manu46.magtech.com.cn/Jweb_prai/EN/Y2017/V30/I1/43
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence