WANG Shu-Yun1,2, ZHANG Cheng-Hong3, HAO Xiu-Lan1, HU Yun-Fa1
1. Department of Computing Information and Technology, Fudan University, Shanghai 200433
2. School of Public Administration, Fuzhou University, Fuzhou 350002
3. Department of Information Management and Information System, Fudan University, Shanghai 200433
Abstract: Immune-based learning adapts well to dynamic environments, which makes it suitable for data streams that change continuously and must be processed at high speed. On this basis, a data stream clustering algorithm based on the immune principle, AIN-STREAM, is proposed. The algorithm adapts dynamically to changes in the data stream, tracks evolving clusters, and effectively suppresses noise. By creating and maintaining B-cell feature vectors, AIN-STREAM automatically adjusts the recognition zone of each B-cell according to user requirements, which keeps the clustering result stable. Theoretical analysis and experimental results show that, for comparable clustering results, AIN-STREAM is more efficient in time and space than other immune-principle-based clustering algorithms while achieving high clustering accuracy.
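The abstract does not define the B-cell feature vector or the rule for adjusting the recognition zone, so the following is only an illustrative sketch under stated assumptions, not the AIN-STREAM algorithm itself. It assumes a summary in the style of BIRCH clustering features (count, per-dimension linear sum and squared sum) and a hypothetical user-supplied bound `max_radius` on each cell's recognition zone. The names `BCell`, `cluster_stream`, and `max_radius` are invented for illustration, and noise suppression and aging of old cells are left out.

```python
import math


class BCell:
    """Hypothetical B-cell summary: count, linear sum, squared sum (assumed layout)."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim  # per-dimension sum of absorbed points
        self.ss = [0.0] * dim  # per-dimension sum of squares

    def absorb(self, x):
        """Update the summary with one point in constant time."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def center(self):
        return [s / self.n for s in self.ls]

    def radius_if_absorbed(self, x):
        """RMS deviation from the center that the cell would have after absorbing x."""
        n = self.n + 1
        var = 0.0
        for i, v in enumerate(x):
            mean = (self.ls[i] + v) / n
            var += (self.ss[i] + v * v) / n - mean * mean
        return math.sqrt(max(var, 0.0))


def cluster_stream(stream, max_radius):
    """Single pass: absorb each point into the nearest cell if the cell's
    recognition zone stays within max_radius, otherwise start a new cell."""
    cells = []
    for x in stream:
        nearest = None
        nearest_dist = math.inf
        for cell in cells:
            d = math.dist(cell.center(), x)
            if d < nearest_dist:
                nearest, nearest_dist = cell, d
        if nearest is not None and nearest.radius_if_absorbed(x) <= max_radius:
            nearest.absorb(x)
        else:
            new_cell = BCell(len(x))
            new_cell.absorb(x)
            cells.append(new_cell)
    return cells


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
cells = cluster_stream(points, max_radius=0.5)
print(len(cells), [c.n for c in cells])
```

Because each cell keeps only fixed-size sums, memory grows with the number of cells and not with the length of the stream, which is the kind of time and space saving the abstract claims. How the published algorithm achieves it may differ.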
WANG Shu-Yun, ZHANG Cheng-Hong, HAO Xiu-Lan, HU Yun-Fa. Data Stream Clustering Based on Immune Principle. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2009, 22(2): 246-255.