Pattern Recognition and Artificial Intelligence
2010, Vol. 23, Issue 6: 847-855
Original Article
Web Information Extraction Based on Probabilistic Model
WANG Jing, LIU Zhi-Jing
School of Computer Science and Engineering, Xidian University, Xi'an 710071

Abstract  A tree-structured hierarchical conditional random fields (TH-CRFs) model is proposed based on the structural and content features of web pages. First, a multi-feature vector space model represents web pages in terms of both page structure and content. Second, a Boolean model and multiple rules are introduced to encode these features so that web objects are represented more accurately. Third, TH-CRFs-based web object information extraction is performed to extract recruitment knowledge while improving training efficiency. Finally, the proposed model is compared with existing approaches to web object information extraction. The experimental results show that TH-CRFs significantly improve extraction accuracy and reduce time complexity.
Key words: Web Object; Conditional Random Fields (CRFs); Information Extraction (IE)
Received: 17 August 2009     
ZTFLH: TP391  
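The Boolean feature representation mentioned in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Node` class, the rule names in `RULES`, and the sample page are hypothetical and are not taken from the paper. The sketch only shows how structural and content rules might be turned into 0/1 feature vectors for each node of a page tree; it does not implement TH-CRFs training or inference.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """A simplified DOM node (hypothetical structure, not from the paper)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Hypothetical Boolean rules: each maps (node, depth) to True/False.
# The first three look at page structure, the last two at content.
RULES: Dict[str, Callable[[Node, int], bool]] = {
    "is_heading": lambda n, depth: n.tag in {"h1", "h2", "h3"},
    "is_leaf": lambda n, depth: not n.children,
    "is_shallow": lambda n, depth: depth <= 1,
    "has_digits": lambda n, depth: any(c.isdigit() for c in n.text),
    "mentions_salary": lambda n, depth: "salary" in n.text.lower(),
}


def boolean_features(node: Node, depth: int = 0) -> List[int]:
    """Evaluate every rule on one node and return a 0/1 vector."""
    return [int(rule(node, depth)) for rule in RULES.values()]


def featurize_tree(root: Node) -> List[Tuple[str, List[int]]]:
    """Visit the tree in pre-order and featurize each node."""
    rows = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        rows.append((node.tag, boolean_features(node, depth)))
        # Push children in reverse so they are popped left to right.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return rows


# A toy job-posting fragment (invented for this example).
page = Node("div", children=[
    Node("h2", "Software Engineer"),
    Node("p", "Salary: 8000 per month"),
])

for tag, vector in featurize_tree(page):
    print(tag, vector)
```

In a CRF-based extractor, vectors of this kind would typically serve as the observation features attached to each node of the tree, with the label dependencies between parent and child nodes modeled separately.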
Cite this article:   
WANG Jing, LIU Zhi-Jing. Web Information Extraction Based on Probabilistic Model[J]. Pattern Recognition and Artificial Intelligence, 2010, 23(6): 847-855.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2010/V23/I6/847
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax:0551-65591176 Email: bjb@iim.ac.cn