Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Pattern Recognition and Artificial Intelligence, 2024, Vol. 37, Issue 8: 741-754. DOI: 10.16451/j.cnki.issn1003-6059.202408007
Research and Applications
Semi-supervised Online Classification Method for Multi-label Data Stream Based on Kernel Extreme Learning Machine
WANG Yuchen1,2, QIU Shiyuan1,2, LI Peipei1,2,3, HU Xuegang1,2,4
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601;
2. Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education of China, Hefei University of Technology, Hefei 230009;
3. Institute of Health Big Data and Population Medicine, Institute of Health and Medicine, Hefei Comprehensive National Science Center, Hefei 230032;
4. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230009

Abstract: In practical applications, large volumes of streaming data arrive at high speed and change dynamically. These data streams often carry multiple labels, yet only a small fraction of the instances are labeled, which gives rise to concept drift and missing labels in multi-label data. To address these problems, this paper proposes a semi-supervised online classification method for multi-label data streams based on the kernel extreme learning machine. First, to handle missing labels, the data stream is divided into k blocks by a sliding window; a feature similarity matrix and a label similarity matrix are constructed for each block and incorporated into the training of the kernel extreme learning machine. To suit the characteristics of streaming data, an incremental update mechanism is designed, yielding a semi-supervised online kernel extreme learning machine. Second, to adapt to concept drift, a timestamp-based discard-and-update mechanism is adopted: a data size is preset, and once the stored data reaches that size, the oldest unlabeled data are discarded and the new data are added for updating. Finally, experiments on 10 multi-label datasets show that the proposed method adapts well to missing labels and concept drift while maintaining good classification performance.
Keywords: Data Stream Classification; Semi-supervised Classification; Multi-label Classification; Kernel Extreme Learning Machine; Concept Drift
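The abstract names the main ingredients (block-wise processing, a similarity-based semi-supervised term in kernel extreme learning machine training, and timestamp-based discarding of the oldest unlabeled data) but gives no formulas. The following is therefore only a minimal sketch under stated assumptions, not the authors' algorithm: it uses a standard Laplacian-regularized kernel least-squares solution built from a feature similarity graph, omits the label similarity matrix, and refits on the buffer after each block instead of using the paper's incremental update. The class name `WindowedSemiSupervisedKELM` and the parameters `max_size`, `c`, `lam`, and `gamma` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class WindowedSemiSupervisedKELM:
    """Illustrative sketch only (assumed formulation, not the paper's)."""

    def __init__(self, max_size=50, c=10.0, lam=0.1, gamma=1.0):
        self.max_size, self.c, self.lam, self.gamma = max_size, c, lam, gamma
        self.X = []        # buffered feature vectors, oldest first
        self.Y = []        # label vectors in {0,1}^q, or None if unlabeled
        self.alpha = None  # kernel expansion coefficients

    def _evict(self):
        # Arrival order acts as the timestamp: drop the oldest unlabeled
        # sample; if every sample is labeled, drop the oldest one overall.
        idx = next((i for i, y in enumerate(self.Y) if y is None), 0)
        del self.X[idx]
        del self.Y[idx]

    def partial_fit(self, X_block, Y_block):
        """Add one block of the stream, evicting when the buffer is full."""
        for x, y in zip(X_block, Y_block):
            if len(self.X) >= self.max_size:
                self._evict()
            self.X.append(np.asarray(x, dtype=float))
            self.Y.append(None if y is None else np.asarray(y, dtype=float))
        self._refit()
        return self

    def _refit(self):
        labeled = np.array([y is not None for y in self.Y])
        if not labeled.any():
            self.alpha = None
            return
        X = np.vstack(self.X)
        n = X.shape[0]
        Y_lab = np.vstack([y for y in self.Y if y is not None])
        T = np.zeros((n, Y_lab.shape[1]))
        T[labeled] = 2.0 * Y_lab - 1.0            # map {0,1} to {-1,+1}
        K = rbf_kernel(X, X, self.gamma)
        W = K.copy()
        np.fill_diagonal(W, 0.0)                  # feature similarity graph
        L = np.diag(W.sum(1)) - W                 # graph Laplacian
        C = np.diag(self.c * labeled.astype(float))
        # Minimizer of: labeled squared loss + RKHS norm + Laplacian penalty.
        self.alpha = np.linalg.solve(np.eye(n) + (C + self.lam * L) @ K, C @ T)
        self._X_fit = X

    def predict(self, X_new):
        if self.alpha is None:
            raise RuntimeError("no labeled data seen yet")
        X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
        scores = rbf_kernel(X_new, self._X_fit, self.gamma) @ self.alpha
        return (scores > 0).astype(int)


# Toy stream: two well-separated clusters, one label per cluster,
# and only every third pair of samples carries labels.
offsets = np.array([[dx, dy] for dx in (-0.3, 0.0, 0.3) for dy in (-0.3, 0.0, 0.3)])
X_stream = np.empty((18, 2))
X_stream[0::2] = offsets          # cluster A around (0, 0)
X_stream[1::2] = offsets + 3.0    # cluster B around (3, 3)
Y_stream = [([1, 0] if i % 2 == 0 else [0, 1]) if i % 6 in (0, 1) else None
            for i in range(18)]

model = WindowedSemiSupervisedKELM(max_size=12)
for start in range(0, 18, 6):     # blocks of 6 samples
    model.partial_fit(X_stream[start:start + 6], Y_stream[start:start + 6])
pred = model.predict([[0.0, 0.0], [3.0, 3.0]])
```

In this sketch the buffer never exceeds `max_size`, and labeled samples are kept in preference to unlabeled ones. That is one plausible reading of "discard the oldest unlabeled data", but the paper itself should be consulted for the actual update rule.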
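Received: 2024-06-15
Chinese Library Classification (ZTFLH): TP181
Funding: Supported by the National Natural Science Foundation of China (No. 62376085, 62076085, 62120106008) and the Special Fund of the Institute of Health Big Data and Population Medicine, Institute of Health and Medicine, Hefei Comprehensive National Science Center (No. JKS20230030)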
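Corresponding author: LI Peipei, Ph.D., professor. Her research interests include data mining. E-mail: peipeili@hfut.edu.cn.
About the authors: WANG Yuchen, Master's student. Research interests: semi-supervised multi-label data stream classification. E-mail: 2023110549@mail.hfut.edu.cn. QIU Shiyuan, Master's degree. Research interests: semi-supervised multi-label data stream classification. E-mail: 2020171138@mail.hfut.edu.cn. HU Xuegang, Ph.D., professor. Research interests: data mining and knowledge engineering. E-mail: jsjxhuxg@hfut.edu.cn.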
Cite this article:
WANG Yuchen, QIU Shiyuan, LI Peipei, HU Xuegang. Semi-supervised Online Classification Method for Multi-label Data Stream Based on Kernel Extreme Learning Machine. Pattern Recognition and Artificial Intelligence, 2024, 37(8): 741-754.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202408007 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2024/V37/I8/741