WANG Shu-Yun1,2, ZHANG Cheng-Hong3, HAO Xiu-Lan1, HU Yun-Fa1
1. Department of Computing Information and Technology, Fudan University, Shanghai 200433
2. School of Public Administration, Fuzhou University, Fuzhou 350002
3. Department of Information Management and Information System, Fudan University, Shanghai 200433
Abstract Learning based on immune principles adapts well to dynamic environments, which makes it suitable for data streams, where the data change over time and must be processed at high speed. This paper therefore proposes AIN-STREAM, an algorithm for clustering data streams based on immune principles. The algorithm can track evolving clusters in noisy data sets. By creating and maintaining B-cell feature vectors, AIN-STREAM automatically adjusts the recognition zone of each B-cell according to user requirements, which ensures the stability of the clustering result. Theoretical analysis and comprehensive experimental results show that AIN-STREAM outperforms other immune-principle-based clustering algorithms when the clustering results are comparable, and that it achieves high clustering quality.
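The abstract does not specify how a B-cell feature vector is stored or updated, so the following Python fragment is only an illustrative sketch of the general idea, not the AIN-STREAM procedure itself. It assumes a BIRCH-style summary (point count, per-dimension linear sum and squared sum) and lets a user-supplied threshold, here called `max_radius`, stand in for the user requirement that bounds each recognition zone. The names `BCell`, `process_stream`, and `max_radius` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BCell:
    """Hypothetical B-cell summary: count, linear sum, squared sum (assumed layout)."""
    n: int
    linear_sum: list
    square_sum: list

    @classmethod
    def from_point(cls, x):
        return cls(1, list(x), [v * v for v in x])

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        # Root-mean-square distance of the absorbed points from the center.
        var = sum(q / self.n - (s / self.n) ** 2
                  for s, q in zip(self.linear_sum, self.square_sum))
        return math.sqrt(max(var, 0.0))

    def absorb(self, x):
        # Incremental update: the summary changes, the raw point is not stored.
        self.n += 1
        for i, v in enumerate(x):
            self.linear_sum[i] += v
            self.square_sum[i] += v * v


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def process_stream(points, max_radius):
    """Single pass over the stream; max_radius is an assumed user-set bound."""
    cells = []
    for x in points:
        nearest = min(cells, key=lambda c: distance(c.center(), x), default=None)
        if nearest is not None and distance(nearest.center(), x) <= max_radius:
            nearest.absorb(x)                  # point falls inside the zone
        else:
            cells.append(BCell.from_point(x))  # otherwise start a new B-cell
    return cells


cells = process_stream([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], max_radius=1.0)
print(len(cells), [c.n for c in cells])
```

Because each cell keeps only sums, the memory cost in this sketch is independent of the number of points absorbed, which is the property that makes such summaries attractive for stream processing.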