Abstract: The decision function of the density peak clustering algorithm cannot determine cluster centers automatically and effectively. To address this, a density peak clustering algorithm that selects centers automatically, automatic clustering by fast search and find of density peaks (AUTO-CFSFDP), is proposed. First, the variables in the decision function are normalized so that their uneven distributions become uniform. Second, a selection strategy based on positive-sequence iteration searches for the elbow point according to the trend in the number of cluster core points during cluster center determination. The points ranked before the elbow point are taken as the cluster centers, and clustering is completed from them. Finally, the performance of AUTO-CFSFDP is evaluated on UCI datasets. AUTO-CFSFDP can cluster datasets of arbitrary distribution without additional time cost, and both its adaptability and its clustering results are effectively improved.
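The pipeline summarized in the abstract (normalize the decision-function variables, rank points by the decision value, cut the ranking at an elbow) can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: local density rho and distance delta are computed as in the original CFSFDP with a cutoff kernel, the normalization is taken to be min-max scaling, and the elbow is located with a simple largest-drop heuristic that stands in for the positive-sequence iteration strategy, whose details are not given here. The names `density_peak_centers` and `dc` are hypothetical.

```python
import numpy as np

def density_peak_centers(X, dc):
    """Illustrative sketch only, not the AUTO-CFSFDP procedure itself."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # rho: number of neighbours closer than the cutoff distance dc
    # (subtract 1 so a point does not count itself).
    rho = (d < dc).sum(axis=1) - 1
    # delta: distance to the nearest point that comes earlier in the
    # density ranking (ties in rho are broken by index, an assumption).
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    delta[order[0]] = d[order[0]].max()
    for k in range(1, n):
        i = order[k]
        delta[i] = d[i, order[:k]].min()

    # Assumed normalization: min-max scaling of each variable to [0, 1].
    def minmax(v):
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros(len(v))

    # Decision value: product of the normalized density and distance.
    gamma = minmax(rho) * minmax(delta)
    ranked = np.argsort(-gamma, kind="stable")
    # Stand-in elbow rule: cut at the largest drop between consecutive
    # ranked decision values; points before the cut are the centers.
    drops = gamma[ranked][:-1] - gamma[ranked][1:]
    elbow = int(np.argmax(drops)) + 1
    return ranked[:elbow], gamma
```

Called as `centers, gamma = density_peak_centers(X, dc=0.12)` on an `(n, 2)` array, the function returns the indices of the points ranked before the elbow. The largest-drop rule is chosen only because it is short; any other elbow criterion could replace the last three lines without touching the rest.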