Classification Algorithm Combined with Unsupervised Learning for Data Stream
XU Shuliang1, WANG Junhong1,2
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
Abstract: An algorithm based on ensemble learning and combined with unsupervised learning is proposed to address the concept drift problem in data streams. An attribute reduction mechanism is introduced into the classification process, and a clustering algorithm is then applied to the data. The classification accuracy and the clustering accuracy are compared to decide whether concept drift has occurred. The experimental results show that the proposed algorithm effectively reduces time consumption and improves precision.
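The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the two accuracies are compared, so the rule below (flag drift when the classifier trails the clustering by more than a threshold) is an assumption, as are the function names, the use of cluster purity as "clustering accuracy", and the default threshold of 0.1. Cluster assignments are taken as given, so any clustering algorithm could supply them.

```python
from collections import Counter, defaultdict


def classification_accuracy(predicted, actual):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def clustering_accuracy(cluster_ids, actual):
    """Cluster purity (assumed measure): each cluster is labelled with its
    majority class, and the instances matching that class count as correct."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, actual):
        members[cid].append(label)
    correct = sum(Counter(labels).most_common(1)[0][1]
                  for labels in members.values())
    return correct / len(actual)


def drift_suspected(predicted, cluster_ids, actual, threshold=0.1):
    """Hypothetical decision rule: report drift when the clustering accuracy
    exceeds the classification accuracy by more than `threshold`."""
    gap = (clustering_accuracy(cluster_ids, actual)
           - classification_accuracy(predicted, actual))
    return gap > threshold


# One labelled data chunk with two well-separated clusters.
actual = [0, 0, 0, 1, 1, 1]
cluster_ids = [0, 0, 0, 1, 1, 1]

stale_predictions = [1, 1, 0, 1, 0, 0]   # classifier trained on an old concept
fresh_predictions = [0, 0, 0, 1, 1, 1]   # classifier that fits the chunk

stale_flag = drift_suspected(stale_predictions, cluster_ids, actual)
fresh_flag = drift_suspected(fresh_predictions, cluster_ids, actual)
```

In this toy chunk the clustering separates the classes perfectly while the stale classifier gets only two of six instances right, so the assumed rule flags drift for the stale predictions and not for the fresh ones.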