HE Yulin (1, 2), ZHU Penghui (2), HUANG Zhexue (1, 2), PHILIPPE Fournier-Viger (2)
1. Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen 518107; 2. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060
Abstract: Existing semi-supervised ensemble learning (SSEL) algorithms commonly suffer from information confusion when predicting the labels of unlabeled samples. To address this issue, a classification risk-based semi-supervised ensemble learning (CR-SSEL) algorithm is proposed. Classification risk is used as the criterion for evaluating the confidence of unlabeled samples, because it effectively measures the degree of sample uncertainty. By iteratively training classifiers and reinforcing the high-confidence samples, CR-SSEL reduces the uncertainty of sample labeling and thereby improves the classification performance of SSEL. The impact of the learning parameters, the convergence of the training process, and the improvement in generalization capability of CR-SSEL are verified on multiple benchmark datasets. The experimental results show that the training process of CR-SSEL tends to converge as the number of base classifiers increases, and that CR-SSEL achieves better classification accuracy.
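The abstract states that classification risk serves as the confidence criterion for unlabeled samples but does not give its formula. The sketch below illustrates only the selection step of a risk-based self-training ensemble, under an explicit assumption: the risk of an unlabeled sample is taken to be the expected 0-1 loss of predicting the most probable class under the ensemble's averaged posterior, i.e. 1 - max_c P(c | x). This is a common textbook choice, not necessarily the definition used by CR-SSEL, and all function names and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

# Probability estimates from one ensemble member:
# probs[i][c] is that member's estimate of P(class c | unlabeled sample i).
MemberProbs = Sequence[Sequence[float]]


def average_posteriors(members: Sequence[MemberProbs]) -> List[List[float]]:
    """Average the class-probability vectors of all ensemble members per sample."""
    n_members = len(members)
    n_samples = len(members[0])
    n_classes = len(members[0][0])
    return [
        [sum(m[i][c] for m in members) / n_members for c in range(n_classes)]
        for i in range(n_samples)
    ]


def classification_risk(posterior: Sequence[float]) -> float:
    """ASSUMED risk measure: expected 0-1 loss of predicting the most probable
    class, i.e. 1 - max_c P(c | x). Not necessarily the paper's definition."""
    return 1.0 - max(posterior)


def select_low_risk(
    members: Sequence[MemberProbs], threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (sample index, pseudo-label, risk) for every unlabeled sample whose
    risk is below `threshold` (a hypothetical parameter), lowest risk first."""
    selected = []
    for i, posterior in enumerate(average_posteriors(members)):
        risk = classification_risk(posterior)
        if risk < threshold:
            # Pseudo-label = class with the largest averaged probability.
            pseudo_label = max(range(len(posterior)), key=lambda c: posterior[c])
            selected.append((i, pseudo_label, risk))
    selected.sort(key=lambda item: item[2])
    return selected


if __name__ == "__main__":
    # Two hypothetical ensemble members, three unlabeled samples, two classes.
    member_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
    member_b = [[0.8, 0.2], [0.3, 0.7], [0.05, 0.95]]
    for idx, label, risk in select_low_risk([member_a, member_b], threshold=0.3):
        print(f"sample {idx}: pseudo-label {label}, risk {risk:.3f}")
```

In a full self-training loop, the selected samples would be added to the labeled set with their pseudo-labels and the base classifiers retrained, repeating until no sample falls below the threshold; sample 1 in the usage example (averaged posterior 0.45/0.55) stays unlabeled because its risk is too high.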
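About the authors: ZHU Penghui, master's student. Research interests include data mining and machine learning. E-mail: 1007435023@qq.com. HUANG Zhexue, Ph.D., distinguished professor. Research interests include data mining, machine learning, and big data system computing technology. E-mail: zx.huang@szu.edu.cn. PHILIPPE Fournier-Viger, Ph.D., distinguished professor. Research interests include data mining, artificial intelligence, knowledge representation and reasoning, and cognitive modeling. E-mail: philfv@szu.edu.cn.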
Cite this article:
HE Yulin, ZHU Penghui, HUANG Zhexue, PHILIPPE Fournier-Viger. Classification Risk-Based Semi-supervised Ensemble Learning Algorithm. Pattern Recognition and Artificial Intelligence, 2024, 37(4): 339-351.