Semi-supervised Online Classification Method for Multi-label Data Stream Based on Kernel Extreme Learning Machine
WANG Yuchen1,2, QIU Shiyuan1,2, LI Peipei1,2,3, HU Xuegang1,2,4
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; 2. Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education of China, Hefei University of Technology, Hefei 230009; 3. Institute of Health Big Data and Population Medicine, Institute of Health and Medicine, Hefei Comprehensive National Science Center, Hefei 230032; 4. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230009
Abstract In practical applications, large amounts of streaming data emerge, characterized by high arrival speed, massive volume and dynamic variation. Moreover, data streams often carry multiple labels while only a small fraction of the data is labeled, giving rise to the problems of concept drift and label missing in multi-label data. To solve these problems, a semi-supervised online classification method for multi-label data streams based on kernel extreme learning machine is proposed in this paper. Firstly, to tackle the label missing problem in multi-label data streams, the data stream is divided into k blocks by a sliding window. A feature similarity matrix and a label similarity matrix are constructed for each piece of data and incorporated into the training of the kernel extreme learning machine model. An incremental update mechanism is then designed to construct a semi-supervised online kernel extreme learning machine that adapts to the characteristics of streaming data. Secondly, to address concept drift in the data stream, a timestamp mechanism is adopted for discarding and updating data. The data size is preset; once the stored data reaches that size, the oldest unlabeled data is discarded and new data is added for updating. Finally, experiments on 10 multi-label datasets demonstrate that the proposed method adapts well to label missing and concept drift while maintaining good classification performance.
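The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. The fixed-size buffer that drops the oldest unlabeled sample follows the description above. The classifier is a stand-in: a Laplacian-regularized kernel least-squares solve on the buffered window, which uses a feature similarity graph only and is not the paper's incremental kernel extreme learning machine. The names `WindowBuffer`, `fit_semi_supervised` and `predict`, the RBF kernel, and the parameters `reg`, `lam`, `gamma` and `threshold` are all hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class WindowBuffer:
    """Fixed-capacity store; when full, the oldest unlabeled sample is dropped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # (timestamp, x, y) in arrival order; y is None if unlabeled
        self.clock = 0

    def add(self, x, y=None):
        if len(self.items) >= self.capacity:
            # Arrival order is preserved, so the first unlabeled item is the oldest.
            # Assumption: if every stored sample is labeled, drop the oldest one.
            idx = next((i for i, it in enumerate(self.items) if it[2] is None), 0)
            self.items.pop(idx)
        x = np.asarray(x, dtype=float)
        y = None if y is None else np.asarray(y, dtype=float)
        self.items.append((self.clock, x, y))
        self.clock += 1


def fit_semi_supervised(buffer, n_labels, reg=1e-2, lam=1e-1, gamma=1.0):
    """Solve (J K + reg I + lam L K) alpha = J Y on the buffered window."""
    X = np.stack([x for _, x, _ in buffer.items])
    labeled = np.array([y is not None for _, _, y in buffer.items])
    Y = np.zeros((len(X), n_labels))
    if labeled.any():
        Y[labeled] = np.stack([y for _, _, y in buffer.items if y is not None])
    K = rbf_kernel(X, X, gamma)
    W = K.copy()
    np.fill_diagonal(W, 0.0)          # feature similarity graph (kernel reused)
    L = np.diag(W.sum(1)) - W         # unnormalized graph Laplacian
    J = np.diag(labeled.astype(float))  # selects the labeled rows
    alpha = np.linalg.solve(J @ K + reg * np.eye(len(X)) + lam * L @ K, J @ Y)
    return X, alpha


def predict(X_train, alpha, X_new, gamma=1.0, threshold=0.5):
    """Threshold the kernel scores to obtain a 0/1 label matrix."""
    scores = rbf_kernel(np.asarray(X_new, dtype=float), X_train, gamma) @ alpha
    return (scores >= threshold).astype(int)


if __name__ == "__main__":
    buf = WindowBuffer(capacity=4)
    buf.add([0.0, 0.0], [1, 0])    # labeled
    buf.add([1.0, 0.0])            # unlabeled
    buf.add([10.0, 10.0], [0, 1])  # labeled
    buf.add([11.0, 10.0])          # unlabeled
    X, alpha = fit_semi_supervised(buf, n_labels=2)
    print(predict(X, alpha, [[0.5, 0.0], [10.5, 10.0]]))
```

Refitting from scratch on every window keeps the sketch short; an incremental update of the kind the abstract describes would instead reuse the previous solution when samples enter or leave the buffer.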
Received: 15 June 2024
Fund: National Natural Science Foundation of China (No. 62376085, 62076085, 62120106008), Research Funds of Center for Big Data and Population Health of Institute of Health and Medicine of Hefei Comprehensive National Science Center (No. JKS2023003)
Corresponding Authors:
LI Peipei, Ph.D., professor. Her research interests include data mining.
About authors: WANG Yuchen, Master student. His research interests include semi-supervised multi-label data stream classification. QIU Shiyuan, Master. Her research interests include semi-supervised multi-label data stream classification. HU Xuegang, Ph.D., professor. His research interests include data mining and knowledge engineering.