Abstract:An online local adaptive fuzzy C-means (OLAFCM) algorithm for high dimensional data is proposed based on fuzzy C-means (FCM) and local adaptive clustering (LAC). Through assigning corresponding weights to its attributes,OLAFCM can make each cluster distribute in a subspace spanned by the combination of different attributes. Thus,the proposed algorithm not only avoids the risk of loss of information encountered in global dimensionality reduction techniques,but also is suitable for clustering data streams. Compared to state-of-the-art partition-based online clustering algorithms using global dimensionality reduction methods,the proposed algorithm has better performance on artificial and real datasets.
[1] Aggarwal C C, Han J, Wang J, et al. A Framework for Clustering Evolving Data Streams // Proc of the 29th International Conference on Very Large Data Bases. Berlin, Germany, 2003: 81-92 [2] Park N H, Lee W S. Statistical Grid-Based Clustering over Data Streams. ACM SIGMOD Record, 2004, 33(1): 32-37 [3] Jia Chen, Tan Chengyu, Yong Ai. A Grid and Density-Based Clustering Algorithm for Processing Data Streams // Proc of the 2nd International Conference on Genetic and Evolutionary Computing. Washington DC, USA,, 2008: 517-521 [4] Tu Li, Chen Yixin. Stream Data Clustering Based on Grid Density and Attraction. ACM Trans on Knowledge Discovery from Data, 2009, 3(3): 1-26 [5] Cao F, Estery M, Qian W, et al. Density-Based Clustering over An Evolving Data Streams with Noise // Proc of the 6th SIAM International Conference on Data Mining. Bethesda, USA: SIAM Press, 2006: 328-339 [6] Hore P, Hall L O, Goldgof D B, et al. Online Fuzzy C Means // Proc of the Annual Meeting of the North American Fuzzy Information Processing Society. New York, USA, 2008: 1-5 [7] Labroche N. New Incremental Fuzzy C Medoids Clustering Algorithms // Proc of the Annual Meeting of the North American Fuzzy Information Processing Society. Toronto, Canada, 2010: 145-150 [8] Agrawal R, Gehrke J, Gunopulos D, et al. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. ACM SIGMOD, 1998, 27(2): 94-105 [9] Aggarwal C C, Wolf J L, Yu P S, et al. Fast Algorithms for Projected Clustering. ACM SIGMOD, 1999, 28(2): 61-72 [10] Arfaoui O, Sassi M. Fuzzy Clustering of Large-Scale Data Sets Using Principal Component Analysis // Proc of the IEEE International Conference on Fuzzy Systems. Taipei, China, 2011: 683-690 [11] Turk M, Pentland A. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 1991, 3(1): 71-86 [12] Bezdek J C. Pattern Recognition with Fuzzy Objective Function Algorithms. New York, USA: Springer-Verlag, 1981 [13] Domeniconi C, Gunopulos D, Ma S, et al. Locally Adaptive Metrics for Clustering High Dimensional Data. Data Mining and Knowledge Discovery, 2007, 14(1): 63-97 [14] Frigui H, Krishnapuram R. Clustering by Competitive Agglomeration. Pattern Recognition, 1997, 30(1): 1109-1119 [15] Zhu Lin, Cao Longbing, Yang Jie. Soft Subspace Clustering with Competitive Agglomeration // Proc of the IEEE International Conference on Fuzzy Systems. Taipei, China, 2011: 691-698 [16] UC Irvine. UCI Machine Learning Repository [DB/OL]. [2012-06-15]. http://archive. ics. uci.edu/ml/ [17] Yann L C, Corinna C, Christopher J C B. THE MNIST DATABASE of handwritten digits [EB/OL]. [2012-06-15]. http://yann.lecun.com/exdb/mnist/ [18] Wagstaff K, Cardie C, Rogers S, et al. Constrained K-Means Clustering with Background Knowledge // Proc of the 18th International Conference on Machine Learning. San Francisco, USA: Morgan Kanufman Press, 2001: 577-584 [19] He Xiaofei, Niyogi P. Locality Preserving Projections // Thrun S, Saul L K, Sch lkopf B, eds. Advances in Neural Information Processing Systems 16. Cambridge, USA: MIT Press, 2004: 37-44