Rough Fuzzy K-means Clustering Algorithm Based on Mixed Metrics and Cluster Adaptive Adjustment
ZHANG Xintao1, MA Fumin1, CAO Jie1, ZHANG Tengfei2
1. College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023; 2. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210003
Abstract: Rough K-means clustering and its derivative algorithms require the number of clusters to be specified in advance, and random selection of the initial cluster centers leads to low partition accuracy for data in the overlapping regions between clusters. To address these problems, a rough fuzzy K-means clustering algorithm with adaptive adjustment of the number of clusters is proposed. When computing the membership degrees of data objects in the overlapping boundary regions with respect to different clusters, the proposed algorithm uses a mixed metric that combines local density and distance. The optimal number of clusters is obtained by adjusting the cluster number adaptively. The midpoint of the two closest samples in a dense region of the data is selected as an initial cluster center; objects whose local density is higher than the average density are assigned to that cluster, after which the remaining initial cluster centers can be selected. The selection of the initial cluster centers is thus more reasonable. Experiments on synthetic datasets and UCI datasets demonstrate the advantages of the proposed algorithm in adaptability and clustering accuracy when dealing with spherical clusters with blurred boundaries.
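The initialization idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so several details here are assumptions rather than the authors' method: local density is taken as the number of neighbours within a cutoff `radius`, dense objects within `radius` of a new center are treated as already assigned to it, and the names `initial_centers`, `radius`, and `max_centers` are hypothetical.

```python
import numpy as np

def initial_centers(X, radius, max_centers):
    """Pick initial centers as midpoints of the closest pairs of dense points.

    Assumptions (not stated in the abstract): local density is the count of
    neighbours within `radius`; dense points within `radius` of a new center
    are considered assigned to it and are skipped in later rounds.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all objects.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density: neighbours within `radius`, excluding the point itself.
    density = (dist < radius).sum(axis=1) - 1
    # Candidates are objects whose density exceeds the average density.
    available = density > density.mean()
    centers = []
    while len(centers) < max_centers and available.sum() >= 2:
        idx = np.flatnonzero(available)
        sub = dist[np.ix_(idx, idx)].copy()
        np.fill_diagonal(sub, np.inf)  # ignore zero self-distances
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        center = (X[idx[i]] + X[idx[j]]) / 2.0  # midpoint of the closest pair
        centers.append(center)
        # Remove the pair and all dense points near the new center.
        available[[idx[i], idx[j]]] = False
        available &= np.linalg.norm(X - center, axis=1) >= radius
    return np.array(centers)
```

In this sketch `max_centers` is only an upper bound: the loop stops early once fewer than two dense candidates remain, so the number of centers returned can serve as a starting value for the cluster count that the adaptive adjustment then refines.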