Abstract: An algorithm based on immune principles, named IUMicro, is proposed for clustering uncertain data streams. IUMicro uses a dynamically updated immune model to adapt to evolving data streams. Through an effective B-cell feature vector and an updating strategy, the model collects statistical information about the data stream online. To choose the optimal candidate cluster for each incoming tuple, IUMicro defines a probability radius for a B-cell's recognition zone that accounts for both the uncertainty of the data and the distance metric. The offline phase performs unsupervised clustering of arbitrary shape, based on the spatial relationships between the regions covered by immune B-cells. Experimental results show that IUMicro effectively suppresses noise and achieves better clustering quality at a high processing speed.
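The abstract does not spell out the B-cell feature vector or the probability radius, so the following is only a minimal sketch of the online assignment step under stated assumptions, not IUMicro itself. It assumes that each tuple carries per-dimension means `x` and error standard deviations `psi`; that a B-cell keeps an additive summary (linear sum, squared sum, summed error variance, count, last update time), modelled on the error-based micro-cluster feature of UMicro by Aggarwal and Yu; and that the "probability radius" can be approximated by a hypothetical rule: a multiple `factor` of the cell's root-mean-square spread (error variance included), floored at `min_radius`. The names `BCell`, `insert`, `factor` and `min_radius` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BCell:
    """Hypothetical B-cell summary: additive statistics of absorbed tuples."""
    cf1: list[float]  # per-dimension linear sum of tuple means
    cf2: list[float]  # per-dimension sum of squared tuple means
    ef2: list[float]  # per-dimension sum of error variances (psi^2)
    n: int = 0
    last_time: int = 0

    @classmethod
    def from_tuple(cls, x, psi, t):
        return cls(list(x), [v * v for v in x], [p * p for p in psi], 1, t)

    def absorb(self, x, psi, t):
        # Additive update: O(d) per tuple, raw tuples are never stored.
        for j, (v, p) in enumerate(zip(x, psi)):
            self.cf1[j] += v
            self.cf2[j] += v * v
            self.ef2[j] += p * p
        self.n += 1
        self.last_time = t

    def centroid(self):
        return [s / self.n for s in self.cf1]

    def expected_sq_dist(self, x, psi):
        # E[||X - c||^2] for X with mean x and independent per-dimension
        # error std psi, treating the centroid c as fixed.
        c = self.centroid()
        return sum((v - cj) ** 2 + p * p for v, cj, p in zip(x, c, psi))

    def probability_radius(self, factor, min_radius):
        # Hypothetical radius: factor * RMS spread of the absorbed means
        # plus their mean error variance, never smaller than min_radius.
        spread = sum(s2 / self.n - (s1 / self.n) ** 2 + e / self.n
                     for s1, s2, e in zip(self.cf1, self.cf2, self.ef2))
        return max(factor * math.sqrt(max(spread, 0.0)), min_radius)


def insert(cells, x, psi, t, factor=2.0, min_radius=1.0):
    """Assign one uncertain tuple to the closest B-cell or start a new one."""
    best, best_d = None, math.inf
    for cell in cells:
        d = math.sqrt(cell.expected_sq_dist(x, psi))
        if d < best_d:
            best, best_d = cell, d
    if best is not None and best_d <= best.probability_radius(factor, min_radius):
        best.absorb(x, psi, t)
        return best
    new_cell = BCell.from_tuple(x, psi, t)
    cells.append(new_cell)
    return new_cell


cells = []
insert(cells, [0.0, 0.0], [0.1, 0.1], t=1)
insert(cells, [0.2, 0.1], [0.1, 0.1], t=2)  # inside the radius: absorbed
insert(cells, [9.0, 9.0], [0.1, 0.1], t=3)  # far away: new B-cell
print(len(cells), [c.n for c in cells])     # 2 [2, 1]
```

Because the summary is additive, each tuple is processed in one pass with constant memory per B-cell. The noise suppression and the offline arbitrary-shape clustering described in the abstract are not modelled here.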