Automatic Selection Method of Cluster Center Based on Positive Sequence Iterative Selection Strategy

doi:10.16451/j.cnki.issn1003-6059.201902007


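Abstract: The decision function of the density peak clustering algorithm cannot determine cluster centers automatically and effectively. To address this problem, a density peak clustering algorithm that determines cluster centers automatically is proposed. First, normalization is applied so that the two variables in the decision function are evenly distributed. Then, to determine the cluster centers, a positive-sequence iterative selection strategy is proposed: the elbow point is searched for according to the variation trend of the number of cluster core points, and the points before the elbow point are taken as the cluster centers to complete the clustering. Finally, the performance of the algorithm is verified on UCI datasets. Without increasing the time complexity, the algorithm can cluster datasets of arbitrary distribution shapes and shows good adaptability and clustering quality.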


Keywords: cluster center, decision function, positive-sequence iteration, density peak clustering, data mining

Abstract: The decision function of the density peak clustering algorithm cannot determine cluster centers automatically and effectively. Therefore, a density peak clustering algorithm, automatically clustering by fast search and find of density peaks (AUTO-CFSFDP), is proposed. Firstly, normalization is applied so that the unevenly distributed variables in the decision function become uniformly distributed. Secondly, a selection strategy based on positive-sequence iteration is presented for determining the cluster centers: the elbow point is searched for according to the variation trend of the number of cluster core points, and the set of points before the elbow point is used as the cluster centers to complete clustering. Finally, the performance of AUTO-CFSFDP is evaluated on UCI datasets. AUTO-CFSFDP can cluster datasets of arbitrary distribution without increasing the time complexity, and both adaptability and clustering results are improved.
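The pipeline described in the abstract (normalize the two decision-function variables, rank points by the decision value, cut at an elbow point) can be illustrated with a minimal sketch. Several parts are assumptions that the abstract does not specify: the Gaussian-kernel density with cutoff `d_c`, the min-max normalization, and above all the elbow rule, which here is simply the largest drop between consecutive sorted decision values. That rule is a stand-in and not the paper's positive-sequence iterative strategy based on the number of cluster core points. The function names are hypothetical.

```python
import math


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def density_peak_scores(points, d_c):
    """Return normalized gamma = rho * delta for every point.

    rho: local density (assumed Gaussian kernel of width d_c).
    delta: distance to the nearest point of higher density.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor gets its largest distance.
        delta.append(min(higher) if higher else max(dist[i]))
    # Normalize both variables so neither dominates the product.
    rho_n, delta_n = min_max(rho), min_max(delta)
    return [r * d for r, d in zip(rho_n, delta_n)]


def select_centers(gamma):
    """Return indices of points ranked before the elbow of sorted gamma.

    Assumption: the elbow is the largest drop between consecutive
    values in descending order (a stand-in for the paper's rule).
    """
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    if len(order) < 2:
        return order
    drops = [gamma[order[k]] - gamma[order[k + 1]] for k in range(len(order) - 1)]
    elbow = max(range(len(drops)), key=lambda k: drops[k])
    return order[: elbow + 1]


if __name__ == "__main__":
    # Two small, well-separated groups; indices 0 and 5 are their middles.
    pts = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
           (5.0, 5.0), (5.1, 5.0), (4.9, 5.0), (5.0, 5.1), (5.0, 4.9)]
    print(sorted(select_centers(density_peak_scores(pts, d_c=0.2))))
```

The largest-drop rule is fragile when cluster centers have very different decision values, which is one reason a more careful elbow search, such as the one the abstract describes, is needed in practice.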

Key words: Cluster Center, Decision Function, Positive Sequence Iterative, Density Peak Clustering, Data Mining

Received: 2018-08-13

CLC number: TP 391

Funding: Supported by the National Natural Science Foundation of China (No. 61572438, 61702456, 61873240)

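About the authors: WANG Wanliang (corresponding author), Ph.D., professor. Research interests: deep learning, artificial intelligence, big data. E-mail: wwl@zjut.edu.cn. LÜ Chuang, master's student. Research interests: big data, data mining. E-mail: lvchuang29@163.com. ZHAO Yanwei, Ph.D., professor. Research interests: intelligent design, intelligent control. E-mail: zyw@zjut.edu.cn. GAO Nan, Ph.D., lecturer. Research interests: data mining, optimization analysis, bioinformatics. E-mail: gaonan@zjut.edu.cn. YANG Xiaohan, master's student. Research interests: big data, deep learning. E-mail: 58482769@qq.com. ZHANG Zhaojuan, Ph.D. candidate. Research interests: big data analysis, data-driven optimization, deep learning. E-mail: zjzhang@zjut.edu.cn.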

Cite this article:

WANG Wanliang, LÜ Chuang, ZHAO Yanwei, GAO Nan, YANG Xiaohan, ZHANG Zhaojuan. Automatic Selection Method of Cluster Center Based on Positive Sequence Iterative Selection Strategy[J]. Pattern Recognition and Artificial Intelligence, 2019, 32(2): 151-160.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201902007 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2019/V32/I2/151
