Data Stream Clustering Based on Immune Principle
WANG Shu-Yun1,2, ZHANG Cheng-Hong3, HAO Xiu-Lan1, HU Yun-Fa1
1. Department of Computing Information and Technology, Fudan University, Shanghai 200433; 2. School of Public Administration, Fuzhou University, Fuzhou 350002; 3. Department of Information Management and Information System, Fudan University, Shanghai 200433
Abstract Learning based on immune principles adapts well to dynamic environments, which makes it suitable for data stream processing, where the data change over time and must be handled at high speed. An immune-principle-based algorithm for clustering data streams, AIN-STREAM, is therefore proposed. The algorithm can track evolving clusters in noisy data sets. By creating and maintaining B-cell feature vectors, AIN-STREAM automatically adjusts the recognition zone of B-cells according to user requirements, which keeps the clustering result stable. Theoretical analysis and comprehensive experimental results demonstrate that AIN-STREAM is superior to other immune-principle-based clustering algorithms when their clustering results are similar, and that it achieves high clustering quality.
Received: 21 December 2007