k-Nearest-Neighbor Network Based Data Clustering Algorithm
JIN Di1,2,LIU Jie1,2,3,JIA Zheng-Xue4,LIU Da-You1,2
1.College of Computer Science and Technology,Jilin University,Changchun 130012 2.Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education,Jilin University,Changchun 130012 3.Shanghai Key Laboratory of Intelligent Information Processing,Fudan University,Shanghai 200433 4.FAW VW Automobile Co.,Ltd.,Changchun 130012
Abstract:Data clustering is a hotspot in data mining area. Though there have been lots of data clustering algorithms now, the clustering accuracy of them is far from perfect. A structural similarity based network clustering algorithm (SSNCA) is proposed in this paper, which attempt to further improve the data clustering accuracy from the view of network clustering. The concrete solution scheme is that vector dataset for clustering is converted to a k-Nearest-Neigborhood network and SSNCA is used to cluster this network. Comparing SSNCA with the algorithms of c-Means and affinity propagation (AP), experimental result shows that the fitness value got by the proposed algorithm is a little worse than AP, but its clustering accuracy is obviously better than that of the other two algorithms.
[1] Sun Jigui, Liu Jie, Zhao Lianyu. Clustering Algorithms Research. Journal of Software, 2009, 19(1): 48-61 (in Chinese) (孙吉贵,刘 杰,赵连宇.聚类算法研究.软件学报, 2009, 19(1): 48-61) [2] Zhang Tian, Ramakrishnan R, Livny M. Birch: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery, 1997, 1(2): 141-182 [3] MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations // Proc of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA, 1967, Ⅰ: 281-297 [4] Ester M, Riegel H P, Sander J, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise // Proc of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland, USA, 1996: 291-316 [5] Ding C, He Xiaofei. k-Nearest-Neighbor Consistency in Data Clustering: Incorporating Local Information into Global Optimization // Proc of the ACM Symposium on Applied Computing. Nicosia, Cyprus, 2004: 584-589 [6] Frnti P, Virmajoki O, Hautamki V. Fast Agglomerative Clustering Using a k-Nearest Neighbor Graph. IEEE Trans on Pattern Analysis and Machine Intelligence, 2006, 28(11): 1875-1881 [7] Zhu Qiaoming, Li Junhui, Zhou Guodong, et al. A Novel Hierarchical Document Clustering Algorithm Based on a kNN Connection Graph // Proc of the 21st International Conference on the Computer Processing of Oriental Languages. Singapore, Singapore, 2006: 120-130 [8] Teng S H, Yao F F. k-Nearest-Neighbor Clustering and Percolation Theory. Algorithmica, 2007, 49(3): 192-211 [9] Liu Dayou, Liu Jie, Jin Di. Study on k-Nearest-Neighbor Partition Based Clustering Algorithm // Proc of the National Conference on Artificial Intelligence. Harbin, China, 2007: 169-174 (in Chinese) (刘大有,刘 杰,金 弟.基于k最近邻划分的聚类算法研究//中国人工智能进展2007. 哈尔滨, 2007: 169-174) [10] Luxburg U V. A Tutorial on Spectral Clustering. Statistics and Computing, 2007, 17(4): 395-416 [11] Newman M E, Girvan M. Finding and Evaluating Community Structure in Networks [EB/OL]. [2004-02-26]. http://www.ncibi.org/gateway/pdf/newman%20community%20struct%20networks %20phys%20rev.pdf [12] Xu Xiaowei, Yuruk N, Feng Zhidan, et al. SCAN: A Structural Clustering Algorithm for Networks // Proc of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, USA, 2007: 824-833 [13] Girvan M, Newman M E. Community Structure in Social and Biological Networks. Proc of National Academy of Science of USA, 2002, 99(12): 7821-7826 [14] Newman M E. Fast Algorithm for Detecting Community Structure in Networks [EB/OL]. [2004-02-26]. http://www-stat.stanford.edu/~owen/courses/315c/readings/PhysRevE_69_066133.pdf [15] Palla G, Derenyi I, Farkas I, et al. Uncovering the Overlapping Community Structures of Complex Networks in Nature and Society. Nature, 2005, 435(7043): 814-818 [16] Yang Bo, Cheung W K, Liu Jiming. Community Mining from Signed Social Networks. IEEE Trans on Knowledge and Data Engineering, 2007, 19(10): 1333-1348 [17] Frey B J, Dueck D. Clustering by Passing Messages between Data Points. Science, 2007, 315(5814): 972-976 [18] Vlasblom J, Wodak S J. Markov Clustering Versus Affinity Propagation for the Partitioning of Protein Interaction Graphs. BMC Bioinformatics, 2009, 10: 99