Abstract: Multiple data stream clustering, an active research direction in data stream mining, tracks the evolution of multiple streams and partitions them according to their similarities. This paper proposes a multiple data stream clustering approach that combines grey relational analysis with affinity propagation clustering. A grey relational degree is defined so that the raw data can be compressed into an incrementally updatable grey relational synopsis. The similarity between two data streams is then measured by the grey relational degree computed from their synopses. Finally, the affinity propagation algorithm clusters the streams. Experiments on real data sets demonstrate the effectiveness of the proposed method.
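To make the similarity step concrete, the following is a minimal sketch of a grey relational similarity between streams. It uses Deng's classical grey relational grade, not the specific degree or synopsis developed in this paper, which the abstract does not spell out. The function names, the distinguishing coefficient `rho = 0.5`, and the choice to take the minimum and maximum differences per pair of sequences are all assumptions made for illustration.

```python
from typing import List, Sequence


def grey_relational_grade(x: Sequence[float], y: Sequence[float],
                          rho: float = 0.5) -> float:
    """Deng's classical grey relational grade of two equal-length sequences.

    Assumption: the extreme differences are taken over this pair only,
    which makes the measure symmetric in x and y.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # Point-wise absolute differences between the two sequences.
    deltas = [abs(a - b) for a, b in zip(x, y)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0.0:
        return 1.0  # identical sequences: maximal relation
    # Grey relational coefficient at each point, then the average (grade).
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in deltas]
    return sum(coeffs) / len(coeffs)


def similarity_matrix(streams: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise grey relational grades."""
    n = len(streams)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            g = grey_relational_grade(streams[i], streams[j])
            sim[i][j] = sim[j][i] = g
    return sim


if __name__ == "__main__":
    # Hypothetical toy windows from three streams.
    windows = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [3.0, 2.0, 1.0]]
    for row in similarity_matrix(windows):
        print([round(v, 4) for v in row])
```

A matrix of this kind could serve as the precomputed similarity input to an affinity propagation implementation. Because the extremes are taken per pair, two sequences that differ by a constant offset receive a grade of 1, so in practice the sequences would likely be normalized first.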