Abstract: Most existing concept drift detection methods target single-label data streams and therefore cannot meet the requirements of concept drift detection in multi-label data streams. This paper proposes a concept drift detection algorithm for multi-label data streams based on hierarchical verification. The algorithm consists of a test layer and a verification layer. The test layer monitors changes in the data distribution to judge whether concept drift may have occurred. The verification layer then measures the degree of change in the label confusion matrix to confirm whether the drift has actually occurred. Experiments on real and synthetic multi-label datasets indicate that the proposed algorithm detects concept drift effectively and improves classification performance.
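The two-layer idea described in the abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper: the abstract does not specify the statistics or thresholds, so the standardized mean shift used for the test layer, the total-variation distance between per-label confusion rates used for the verification layer, the function names, and both threshold values are assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def distribution_shift(ref_rows, cur_rows):
    """Largest per-feature shift in mean, in units of the reference std.

    Assumed stand-in for the test-layer statistic.
    """
    shifts = []
    for ref_col, cur_col in zip(zip(*ref_rows), zip(*cur_rows)):
        spread = pstdev(ref_col) or 1.0  # avoid dividing by zero
        shifts.append(abs(mean(cur_col) - mean(ref_col)) / spread)
    return max(shifts)


def label_confusion(y_true, y_pred):
    """Per-label [TN, FP, FN, TP] rates for 0/1 label vectors."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0, 0] for _ in range(n_labels)]
    for truth, pred in zip(y_true, y_pred):
        for j in range(n_labels):
            # Index 0=TN, 1=FP, 2=FN, 3=TP.
            counts[j][2 * truth[j] + pred[j]] += 1
    n = len(y_true)
    return [[c / n for c in row] for row in counts]


def confusion_change(ref_cm, cur_cm):
    """Mean total-variation distance between per-label confusion rates.

    Assumed stand-in for the verification-layer statistic.
    """
    per_label = [
        0.5 * sum(abs(a - b) for a, b in zip(ref_row, cur_row))
        for ref_row, cur_row in zip(ref_cm, cur_cm)
    ]
    return mean(per_label)


def detect_drift(ref, cur, shift_threshold=1.0, change_threshold=0.2):
    """ref and cur are (features, true_labels, predicted_labels) windows.

    Both thresholds are hypothetical defaults, not values from the paper.
    """
    ref_x, ref_y, ref_p = ref
    cur_x, cur_y, cur_p = cur
    # Test layer: flag a candidate drift from the input distribution alone.
    if distribution_shift(ref_x, cur_x) < shift_threshold:
        return False
    # Verification layer: confirm only if the confusion matrices also moved.
    change = confusion_change(label_confusion(ref_y, ref_p),
                              label_confusion(cur_y, cur_p))
    return change >= change_threshold
```

In this sketch a drift is reported only when both layers agree: a shift in the inputs that leaves the classifier's per-label confusion rates unchanged is treated as a false alarm, which is the role the abstract assigns to the verification layer.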