Automatic Selection Method of Cluster Center Based on Positive Sequence Iterative Selection Strategy
WANG Wanliang¹, LÜ Chuang¹, ZHAO Yanwei¹, GAO Nan¹, YANG Xiaohan¹, ZHANG Zhaojuan¹
¹College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023
Abstract: The decision function of the density peak clustering algorithm cannot determine cluster centers automatically and effectively. Therefore, a density peak clustering algorithm, automatically clustering by fast search and find of density peaks (AUTO-CFSFDP), is proposed. First, normalization is applied so that the unevenly distributed variables in the decision function become uniformly distributed. Second, a selection strategy based on positive-sequence iteration is presented to search for the elbow point according to the variation trend of the number of cluster core points while the cluster centers are being determined. The set of points before the elbow point is taken as the cluster centers to complete the clustering. Finally, the performance of AUTO-CFSFDP is evaluated on UCI datasets. AUTO-CFSFDP can cluster datasets of arbitrary distribution without extra time consumption, and both its adaptability and its clustering results are effectively improved.
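The abstract names the ingredients of the method but not their exact formulas, so the following is only a minimal sketch under stated assumptions. It uses the standard CFSFDP quantities of Rodriguez and Laio (Science, 2014): a cutoff-kernel local density ρ, the distance δ to the nearest denser point, and the decision value γ = ρ·δ. Min-max scaling of ρ and δ is assumed as the normalization step. The positive-sequence iterative elbow search is NOT reproduced here; a hypothetical largest-drop rule on the sorted γ values stands in for it. The names `auto_cluster`, `_minmax`, `_pick_centers` and the way the cutoff `dc` is passed in are illustrative only.

```python
import numpy as np


def _minmax(v):
    """Scale a vector to [0, 1]; assumed form of the normalization step."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def _pick_centers(gamma):
    """HYPOTHETICAL stand-in for the elbow search: cut the sorted gamma
    sequence at its largest drop and keep every point before the cut.
    This is not the paper's positive-sequence iterative strategy."""
    ranked = np.argsort(-gamma, kind="stable")
    g = gamma[ranked]
    gaps = g[:-1] - g[1:]
    k = int(np.argmax(gaps)) + 1
    return ranked[:k]


def auto_cluster(X, dc):
    """Density-peak clustering sketch: returns (labels, normalized gamma)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two points")
    # Pairwise Euclidean distance matrix.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Cutoff-kernel local density: neighbours closer than dc, excluding self.
    rho = (d < dc).sum(axis=1) - 1
    # Visit points by decreasing density; ties broken by index (assumption).
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    nn = np.full(n, -1)  # nearest denser neighbour, -1 for the top peak
    delta[order[0]] = d[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        denser = order[:pos]
        j = denser[np.argmin(d[i, denser])]
        delta[i], nn[i] = d[i, j], j
    # Normalize both variables before forming the decision value.
    gamma = _minmax(rho.astype(float)) * _minmax(delta)
    centers = _pick_centers(gamma)
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Non-center points inherit the label of their nearest denser neighbour,
    # which has already been labeled because points are visited densest first.
    for i in order:
        if labels[i] == -1:
            if nn[i] == -1:  # top peak not selected: join nearest center
                labels[i] = labels[centers[np.argmin(d[i, centers])]]
            else:
                labels[i] = labels[nn[i]]
    return labels, gamma


pts = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
       (10, 10), (10.1, 10), (10, 10.1), (9.9, 10), (10, 9.9)]
labels, gamma = auto_cluster(pts, dc=0.15)
print(labels.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Because δ of the global density peak equals its largest pairwise distance, that point receives γ = 1 after normalization unless all densities are equal. The largest-drop rule is only a placeholder: on data whose γ curve has no sharp elbow it can pick too few centers, which is the kind of case an iterative elbow search is meant to handle.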
Received: 13 August 2018
|
|
Fund: Supported by the National Natural Science Foundation of China (No. 61572438, 61702456, 61873240)
About the authors:
WANG Wanliang (corresponding author), Ph.D., professor. His research interests include deep learning, artificial intelligence and big data.
LÜ Chuang, master student. His research interests include big data and data mining.
ZHAO Yanwei, Ph.D., professor. Her research interests include intelligent design and intelligent control.
GAO Nan, Ph.D., lecturer. Her research interests include data mining, optimization analysis and bioinformatics.
YANG Xiaohan, master student. Her research interests include big data and deep learning.
ZHANG Zhaojuan, Ph.D. candidate. Her research interests include big data analysis, data-driven optimization and deep learning.
|
|
|
|
|
|