Pattern Recognition and Artificial Intelligence
  2017, Vol. 30 Issue (6): 559-568    DOI: 10.16451/j.cnki.issn1003-6059.201706009
Researches and Applications
Multilingual Documents Clustering Algorithm Based on Parallel Information Bottleneck
YAN Xiaoqiang, LU Yaoen, LOU Zhengzheng, YE Yangdong
School of Information Engineering, Zhengzhou University, Zhengzhou 450052

Abstract  Traditional clustering algorithms ignore the potential complementarity between different languages when discovering the hidden structure of a document collection, so the patterns they obtain cannot reflect the latent information in the collection. To address this problem, a multilingual document clustering algorithm based on the parallel information bottleneck (ML-IB) is proposed. Firstly, the relevant variables for the information in each language are constructed according to the bag-of-words model. Then, the multiple relevant variables are incorporated into the parallel information bottleneck, so that the relevant information between the data patterns and the multiple relevant variables is maximally preserved. Finally, to optimize the objective function of ML-IB, a draw-and-merge method based on information theory is proposed, which guarantees that ML-IB converges to a locally optimal solution. Extensive experimental results on multilingual document datasets show that the proposed algorithm significantly outperforms state-of-the-art single-language and multilingual clustering methods.
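The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact objective, so the code assumes a weighted sum of mutual informations I(T; Y_k) between the cluster variable T and one bag-of-words variable Y_k per language, a fixed number of hard clusters, and a brute-force draw-and-merge sweep. All function names, the `weights` parameter, and the toy count matrices are hypothetical.

```python
import numpy as np

def joint_from_counts(counts):
    """Normalize a documents-by-terms count matrix into a joint p(x, y)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def mutual_information(p_ty):
    """I(T; Y) in nats for a joint distribution stored as a 2-D array."""
    p_t = p_ty.sum(axis=1, keepdims=True)
    p_y = p_ty.sum(axis=0, keepdims=True)
    mask = p_ty > 0
    return float((p_ty[mask] * np.log(p_ty[mask] / (p_t @ p_y)[mask])).sum())

def objective(joints, labels, n_clusters, weights):
    """Assumed objective: weighted sum over languages of I(T; Y_k)."""
    total = 0.0
    for p_xy, w in zip(joints, weights):
        p_ty = np.zeros((n_clusters, p_xy.shape[1]))
        np.add.at(p_ty, labels, p_xy)  # add each document row to its cluster row
        total += w * mutual_information(p_ty)
    return total

def draw_and_merge(joints, n_clusters, weights=None, max_sweeps=20, seed=0):
    """Draw each document out of its cluster and merge it into the cluster
    that gives the highest objective; stop when a full sweep changes nothing."""
    rng = np.random.default_rng(seed)
    n_docs = joints[0].shape[0]
    weights = [1.0] * len(joints) if weights is None else weights
    # Random start with every cluster non-empty (needs n_docs >= n_clusters).
    labels = rng.permutation(n_docs) % n_clusters
    for _ in range(max_sweeps):
        changed = False
        for x in rng.permutation(n_docs):
            old = int(labels[x])
            if np.sum(labels == old) == 1:
                continue  # never empty a cluster, so the cluster count stays fixed
            scores = []
            for t in range(n_clusters):
                labels[x] = t
                scores.append(objective(joints, labels, n_clusters, weights))
            best = int(np.argmax(scores))
            if scores[best] > scores[old] + 1e-12:
                labels[x] = best  # strict improvement only
                changed = True
            else:
                labels[x] = old
        if not changed:
            break
    return labels

# Toy data (hypothetical): six documents, each with counts in two languages.
en_counts = [[4, 4, 0, 0], [5, 4, 0, 0], [4, 5, 0, 0],
             [0, 0, 4, 4], [0, 0, 5, 4], [0, 0, 4, 5]]
zh_counts = [[6, 0, 0], [5, 0, 0], [6, 0, 0],
             [0, 0, 6], [0, 0, 5], [0, 0, 6]]
joints = [joint_from_counts(en_counts), joint_from_counts(zh_counts)]
labels = draw_and_merge(joints, n_clusters=2)
```

Because a document is moved only when the objective strictly increases and the objective is bounded above, the sweep stops at a local optimum of this assumed objective. Recomputing the full objective for every candidate cluster keeps the sketch short; a practical version would use an incremental merge cost instead.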
Key words: Parallel Information Bottleneck; Multilingual; Document Clustering; Information
Received: 26 September 2016     
ZTFLH: TP 391.4  
Fund: Supported by National Natural Science Foundation of China (No. 61502434, 61502432, 61170223)
About authors:
YAN Xiaoqiang, born in 1989, Ph.D. candidate. His research interests include machine learning, pattern recognition and computer vision.
LU Yaoen, born in 1989, master's student. His research interests include pattern recognition and data mining.
LOU Zhengzheng, born in 1984, Ph.D., associate professor. His research interests include machine learning, pattern recognition and data mining.
YE Yangdong (corresponding author), born in 1962, Ph.D., professor. His research interests include intelligent systems, databases and machine learning.
Cite this article:   
YAN Xiaoqiang, LU Yaoen, LOU Zhengzheng, et al. Multilingual Documents Clustering Algorithm Based on Parallel Information Bottleneck[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(6): 559-568.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201706009      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2017/V30/I6/559
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence