Classification Risk-Based Semi-supervised Ensemble Learning Algorithm

doi:10.16451/j.cnki.issn1003-6059.202404005


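Abstract: Existing semi-supervised ensemble learning algorithms are prone to labeling confusion when predicting unlabeled samples. To address this problem, this paper proposes a classification risk-based semi-supervised ensemble learning algorithm (CR-SSEL). Classification risk serves as the criterion for judging the confidence of unlabeled samples and effectively measures the degree of uncertainty in sample labeling. Classifiers are trained iteratively and high-confidence samples are reinforced, so that the uncertainty of sample labeling gradually decreases and the classification performance of the semi-supervised ensemble learning algorithm improves. The influence of the learning parameters, the convergence of the training process, and the gain in generalization performance of CR-SSEL are verified on multiple standard datasets. The experiments show that, as the number of base classifiers increases, the training process of CR-SSEL tends to converge and the algorithm attains good classification accuracy.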

Authors: HE Yulin, ZHU Penghui, HUANG Zhexue, PHILIPPE Fournier-Viger

Keywords: semi-supervised ensemble learning, ensemble learning, semi-supervised learning, classification risk, uncertainty, confidence

Abstract: Existing semi-supervised ensemble learning algorithms commonly encounter information confusion when predicting unlabeled samples. To address this issue, a classification risk-based semi-supervised ensemble learning (CR-SSEL) algorithm is proposed. Classification risk is used as the criterion for evaluating the confidence of unlabeled samples, and it measures the degree of sample uncertainty effectively. By iteratively training classifiers and reinforcing high-confidence samples, the uncertainty of sample labeling is reduced and the classification performance of semi-supervised ensemble learning is enhanced. The impact of the learning parameters, the convergence of the training process, and the improvement in generalization capability of CR-SSEL are verified on multiple standard datasets. The experimental results demonstrate that the training process of CR-SSEL tends to converge as the number of base classifiers increases and that the algorithm achieves better classification accuracy.
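The loop described in the abstract (score unlabeled samples by risk, add the low-risk ones with their predicted labels, retrain) can be illustrated with a small sketch. The abstract does not say how classification risk is computed, so the code below substitutes an assumed proxy: one minus the share of ensemble votes received by the majority label. The base learner (nearest centroid on bootstrap resamples), the threshold `max_risk`, and all function names are hypothetical choices made for illustration, not the authors' CR-SSEL implementation.

```python
import random
from collections import Counter


def train_centroids(X, y):
    """Fit a nearest-centroid base classifier: map each label to its mean vector."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {c: [s / counts[c] for s in vec] for c, vec in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def fit_ensemble(X, y, n_models, rng):
    """Train n_models base classifiers, each on a bootstrap resample of (X, y)."""
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        ensemble.append(train_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def vote_risk(ensemble, x):
    """Return (majority label, risk). ASSUMED proxy: risk = 1 - majority vote share."""
    votes = Counter(predict(model, x) for model in ensemble)
    label, n = votes.most_common(1)[0]
    return label, 1.0 - n / len(ensemble)


def risk_based_self_training(X_l, y_l, X_u, n_models=11, max_risk=0.1,
                             max_rounds=10, seed=0):
    """Iteratively pseudo-label low-risk unlabeled samples and retrain the ensemble."""
    rng = random.Random(seed)
    X_l, y_l, pool = list(X_l), list(y_l), list(X_u)
    ensemble = fit_ensemble(X_l, y_l, n_models, rng)
    for _ in range(max_rounds):
        if not pool:
            break
        scored = [(x, *vote_risk(ensemble, x)) for x in pool]
        confident = [(x, lab) for x, lab, risk in scored if risk <= max_risk]
        if not confident:
            break  # nothing is trusted enough to add, so stop early
        for x, lab in confident:
            X_l.append(x)
            y_l.append(lab)
        pool = [x for x, _, risk in scored if risk > max_risk]
        ensemble = fit_ensemble(X_l, y_l, n_models, rng)
    return ensemble, len(pool)


# Toy data (synthetic, for illustration only): two well-separated 2-D clusters.
data_rng = random.Random(1)


def blob(cx, cy, n):
    return [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
            for _ in range(n)]


X_labeled = blob(0, 0, 5) + blob(5, 5, 5)
y_labeled = [0] * 5 + [1] * 5
X_unlabeled = blob(0, 0, 30) + blob(5, 5, 30)

ensemble, n_left = risk_based_self_training(X_labeled, y_labeled, X_unlabeled)
print("unlabeled samples left in the pool:", n_left)
print("prediction and risk at (5, 5):", vote_risk(ensemble, (5.0, 5.0)))
```

Lowering `max_risk` admits fewer pseudo-labeled samples per round, trading slower growth of the training set for a smaller chance of reinforcing a wrong label.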

Key words: Semi-supervised Ensemble Learning, Ensemble Learning, Semi-supervised Learning, Classification Risk, Uncertainty, Confidence Degree

Received: 2024-01-22

CLC number: TN 911.73

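Funding: General Program of the Natural Science Foundation of Guangdong Province (No. 2023A1515011667), Key Program of the Guangdong-Shenzhen Joint Fund of the Guangdong Basic and Applied Basic Research Foundation (No. 2023B1515120020), and General Program of Shenzhen Basic Research (No. JCYJ20210324093609026)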

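Corresponding author: HE Yulin, Ph.D., research fellow. His research interests include data mining, machine learning, and big data system computing technologies. E-mail: yulinhe@gml.ac.cn.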

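About the authors: ZHU Penghui, master's student. His research interests include data mining and machine learning. E-mail: 1007435023@qq.com. HUANG Zhexue, Ph.D., distinguished professor. His research interests include data mining, machine learning, and big data system computing technologies. E-mail: zx.huang@szu.edu.cn. PHILIPPE Fournier-Viger, Ph.D., distinguished professor. His research interests include data mining, artificial intelligence, knowledge representation and reasoning, and cognitive model construction. E-mail: philfv@szu.edu.cn.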

Cite this article:

HE Yulin, ZHU Penghui, HUANG Zhexue, PHILIPPE Fournier-Viger. Classification Risk-Based Semi-supervised Ensemble Learning Algorithm. Pattern Recognition and Artificial Intelligence, 2024, 37(4): 339-351.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202404005 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2024/V37/I4/339
