Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
  2016, Vol. 29 Issue (8): 673-681    DOI: 10.16451/j.cnki.issn1003-6059.201608001
Papers and Reports
A Multi-record Webpage Attribute Extraction Method Combining Active Learning
WEI Jingjing1,2, LIAO Xiangwen3,4, CHEN Qiaoling3,4, MA Feixiang3,4, CHEN Guolong3,4
1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116
2. College of Electronics and Information Science, Fujian Jiangxia University, Fuzhou 350108
3. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116
4. Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou 350116

Abstract  Attribute extraction can be separated into two phases: alignment and annotation. Existing alignment methods mistakenly align attributes with different semantics into the same group. Furthermore, to improve the accuracy of semantic annotation, time-consuming manual annotation is often needed to construct the training set. To address these problems, a multi-record webpage attribute extraction method combining active learning is presented. To reduce wrong attribute alignment, shallow semantics are integrated into the alignment approach to mitigate the influence of identical tags that carry different semantics. In the semantic annotation phase, textual, visual, and global features are extracted for semantic classification, and an active learning based SVM classifier is applied to extract structured data. Moreover, a new sample selection strategy is proposed that introduces global sample information, so that more informative samples with lower confidence are selected for labeling. Experimental results on BBS and microblog datasets confirm the superiority of the proposed method.
Key words: Attribute Extraction, Semantic Classification, Active Learning
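The abstract's idea of labeling "more informative samples with lower confidences" can be illustrated with plain margin-based uncertainty sampling. The sketch below is not the authors' strategy: the abstract does not specify how global sample information enters the selection, so that part is omitted. It assumes a linear classifier whose absolute decision score serves as a confidence proxy; the function names, weights, and feature vectors are all hypothetical.

```python
from typing import List, Sequence


def decision_value(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed score of a linear classifier; |score| is used as a confidence proxy."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def select_least_confident(
    weights: Sequence[float],
    bias: float,
    unlabeled: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k unlabeled samples closest to the decision boundary."""
    scored = [(abs(decision_value(weights, bias, x)), i) for i, x in enumerate(unlabeled)]
    scored.sort()  # smallest |score| first; ties are broken by index
    return [i for _, i in scored[:k]]


# Hypothetical 2-D feature vectors and a hypothetical trained linear model.
pool = [[2.0, 1.0], [0.1, 0.0], [-3.0, 0.5], [0.0, -0.2]]
query = select_least_confident(weights=[1.0, 1.0], bias=0.0, unlabeled=pool, k=2)
```

In an active learning loop, the samples at the returned indices would be sent to an annotator, added to the training set, and the classifier retrained before the next round of selection.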
Received: 02 February 2015     
ZTFLH: TP 391  
Fund: Supported by Young Scientists Fund of National Natural Science Foundation of China (No. 61300105), Joint Ph.D. Programs Foundation of Ministry of Education of China (No. 2012351410010), Key Project of Science and Technology of Fujian Province (No. 2013H6012), Project of Science and Technology of Fuzhou (No. 2013-PT-45, 2012-G-113)
About authors: (WEI Jingjing, born in 1984, Ph.D. candidate. Her research interests include intelligent information processing.) (LIAO Xiangwen (corresponding author), born in 1980, Ph.D., associate professor. His research interests include opinion mining and sentiment analysis.) (CHEN Qiaoling, born in 1989, master's student. Her research interests include Web mining.) (MA Feixiang, born in 1991, master's student. His research interests include sentiment analysis.) (CHEN Guolong, born in 1965, Ph.D., professor. His research interests include intelligent information processing.)
Cite this article:   
WEI Jingjing, LIAO Xiangwen, CHEN Qiaoling, et al. A Multi-record Webpage Attribute Extraction Method Combining Active Learning[J]. Pattern Recognition and Artificial Intelligence, 2016, 29(8): 673-681.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201608001      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2016/V29/I8/673