Automatic Determination of Clustering Center for Clustering by Fast Search and Find of Density Peaks

doi:10.16451/j.cnki.issn1003-6059.201911008


Abstract: The algorithm of clustering by fast search and find of density peaks (CFSFDP) cannot select cluster centers automatically. To solve this problem, a CFSFDP variant that determines cluster centers automatically is proposed. First, density and distance are normalized to handle the uneven distribution of the two variables. Then the upper limit of the normalized density threshold is determined by Chebyshev's inequality, and the upper limit of the normalized distance threshold is determined from the standard deviation. Finally, the upper limit of the decision threshold is determined according to the decision function, so that both determinants are considered together, no center point is omitted, and the cluster centers are determined automatically. Experiments show that the proposed algorithm selects cluster centers adaptively and effectively, with good robustness and validity.

Key words: Density Peak, Clustering Algorithm, Clustering Center, Chebyshev Inequality
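
The abstract names the steps of the method but not its formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes a Gaussian-kernel density with a hypothetical cutoff distance `d_c`, min-max normalization, the usual CFSFDP decision value (density times distance), and cutoffs of the form mean plus `k` standard deviations, a form motivated by Chebyshev's inequality. The rule that combines the cutoffs and all function names are likewise hypothetical.

```python
import numpy as np


def density_and_distance(points, d_c):
    """Return CFSFDP density rho and distance delta for each point."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Gaussian-kernel density (assumed); subtract 1 to drop the self term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(points))
    for i in range(len(points)):
        denser = rho > rho[i]
        # Distance to the nearest denser point; the densest point
        # gets its largest distance to any other point instead.
        delta[i] = dist[i, denser].min() if denser.any() else dist[i].max()
    return rho, delta


def min_max(x):
    """Scale values to [0, 1] (assumed normalization)."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def cutoff(x, k):
    """Assumed cutoff: mean + k * std. By Chebyshev's inequality at most
    1 / k**2 of the values lie k or more standard deviations from the mean."""
    return x.mean() + k * x.std()


def select_centers(points, d_c=1.0, k=2.0):
    """Return indices of points flagged as cluster centers."""
    rho, delta = density_and_distance(points, d_c)
    rho_n, delta_n = min_max(rho), min_max(delta)
    gamma = rho_n * delta_n  # decision value
    # Hypothetical combination rule: pass both single-factor cutoffs,
    # or pass the decision-value cutoff.
    by_both = (rho_n >= cutoff(rho_n, k)) & (delta_n >= cutoff(delta_n, k))
    by_gamma = gamma >= cutoff(gamma, k)
    return np.flatnonzero(by_both | by_gamma)


def ring_cluster(center, radii, n=8):
    """Toy cluster: a center point plus n points on each concentric ring."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    unit = np.column_stack([np.cos(angles), np.sin(angles)])
    return np.vstack([center[None, :]] + [center + r * unit for r in radii])


# Two well-separated toy clusters; their middle points are rows 0 and 25.
points = np.vstack([
    ring_cluster(np.array([0.0, 0.0]), [0.5, 1.0, 1.5]),
    ring_cluster(np.array([10.0, 0.0]), [0.5, 1.0]),
])
centers = select_centers(points)
print(centers)
```

On this toy data the middle point of each cluster is the only one that combines high density with a large distance to any denser point, so only those two rows are flagged. The value of `k` trades missed centers against spurious ones and would need tuning on real data.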

Received: 2019-03-28

CLC number (ZTFLH): TP 391

Funding: Supported by the National Natural Science Foundation of China (No. 61873240)

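Corresponding author: WANG Wanliang, Ph.D., professor. His research interests include deep learning, artificial intelligence and big data. E-mail: wwl@zjut.edu.cn.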

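About the authors: WU Fei, master's student. Research interests include big data and data mining. E-mail: WFMOOK@163.com. LÜ Chuang, master's student. Research interests include big data and data mining. E-mail: lvchuang29@163.com.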

Cite this article:

WANG Wanliang, WU Fei, LÜ Chuang. Automatic Determination of Clustering Center for Clustering by Fast Search and Find of Density Peaks[J]. Pattern Recognition and Artificial Intelligence, 2019, 32(11): 1032-1041.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201911008 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2019/V32/I11/1032
