Rough Fuzzy K-means Clustering Algorithm Based on Mixed Metrics and Cluster Adaptive Adjustment

doi:10.16451/j.cnki.issn1003-6059.201912010

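Abstract: Rough K-means clustering and its derivative algorithms require the number of clusters to be specified manually in advance. They also select the initial cluster centers at random, which lowers the partition accuracy for data in regions where clusters overlap. To address these problems, a rough fuzzy K-means clustering algorithm based on mixed metrics and cluster adaptive adjustment is proposed. When computing the degree to which a data object in the boundary region belongs to each cluster, the algorithm uses a mixed metric that combines local density and distance. It also adopts a strategy of adaptively adjusting the number of clusters to obtain the optimal cluster number. The midpoint of the two closest samples in a dense region of the data is taken as an initial cluster center. Nearby objects whose local density is higher than the average density are assigned to that cluster, and only then are the remaining initial cluster centers selected, which makes the choice of initial centers more reasonable. Experiments on synthetic datasets and UCI benchmark datasets show that, on spherical datasets with severely overlapping clusters, the proposed algorithm is adaptive and achieves better clustering accuracy.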
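The initial-center selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: local density is assumed to be a cutoff count (the number of other points closer than a hypothetical cutoff `dc`), "nearby" is assumed to mean "within a hypothetical `radius` of the new center", and the adaptive adjustment of the cluster number is omitted. The function names `local_density` and `init_centers` are illustrative.

```python
import numpy as np


def local_density(X, dc):
    """Cutoff-kernel density: count of other points closer than dc (assumed definition)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    rho = (D < dc).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    return rho, D


def init_centers(X, k, dc, radius):
    """Pick up to k initial centers from dense regions (hypothetical sketch).

    dc and radius are illustrative parameters, not values from the paper.
    """
    rho, D = local_density(X, dc)
    available = rho > rho.mean()  # only points denser than average are candidates
    centers = []
    for _ in range(k):
        idx = np.flatnonzero(available)
        if len(idx) < 2:
            break  # not enough dense points left to form a pair
        sub = D[np.ix_(idx, idx)].copy()
        np.fill_diagonal(sub, np.inf)  # ignore zero self-distances
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        center = (X[idx[a]] + X[idx[b]]) / 2.0  # midpoint of the closest dense pair
        centers.append(center)
        # Assign dense points near the new center to its cluster so they are
        # not reused when the remaining centers are selected.
        near = np.linalg.norm(X - center, axis=1) <= radius
        available &= ~near
        available[idx[a]] = available[idx[b]] = False
    return np.array(centers)
```

Removing the absorbed dense points from the candidate pool is what stops the next closest pair from being drawn from the same dense region. If `radius` is small relative to the spread of a cluster, several centers may still land in one cluster, so in this sketch its value matters as much as `dc`.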

Authors: 张鑫涛 (ZHANG Xintao), 马福民 (MA Fumin), 曹杰 (CAO Jie), 张腾飞 (ZHANG Tengfei)

Keywords: rough fuzzy clustering, rough K-means, mixed metrics, cluster adaptive adjustment, local density

Abstract: Rough K-means clustering and its derivative algorithms require the number of clusters to be given in advance, and randomly selected initial cluster centers lead to low partition accuracy for data in the regions where clusters overlap. To solve these problems, a rough fuzzy K-means clustering algorithm based on mixed metrics and cluster adaptive adjustment is proposed. When calculating the membership degrees of data objects in the overlapping boundary region to different clusters, the proposed algorithm uses a mixed metric of local density and distance. The optimal number of clusters is obtained by adjusting the number of clusters adaptively. The midpoint of the two closest samples in a dense area of the data is selected as an initial cluster center, nearby objects whose local density is higher than the average density are assigned to this cluster, and then the remaining initial cluster centers are selected. This makes the selection of the initial cluster centers more reasonable. Experiments on synthetic datasets and UCI datasets demonstrate the advantages of the proposed algorithm in adaptability and clustering accuracy on spherical clusters with blurred boundaries.
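The treatment of boundary objects can be sketched in the style of rough K-means. An object whose distance to another center is within a threshold `eps` of its nearest-center distance goes to the boundary (the upper approximations of several clusters) rather than to a single lower approximation, and boundary objects then receive fuzzy C-means-style memberships. The abstract does not give the mixed-metric formula, so the density-weighted distance below is a hypothetical stand-in: `lam`, the cluster density (taken here as the mean local density of the points nearest to each center), `w_lower`, and the function names are all assumptions made for illustration.

```python
import numpy as np


def rough_fuzzy_assign(X, centers, rho, eps=1.0, lam=0.5, m=2.0):
    """Split objects into lower approximations and a fuzzy boundary (hypothetical sketch).

    Returns (lower, U): lower[i] is a cluster index, or -1 for a boundary object;
    U[i, j] is the membership of object i in cluster j (one-hot for lower objects).
    """
    n, k = len(X), len(centers)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)  # n x k distances
    nearest = d.argmin(axis=1)
    lower = nearest.copy()
    U = np.zeros((n, k))
    # Assumed cluster density: mean local density of the points nearest to each center.
    cluster_rho = np.array([rho[nearest == j].mean() if np.any(nearest == j) else 0.0
                            for j in range(k)])
    scale = max(float(rho.max()), 1e-12)
    for i in range(n):
        cand = np.flatnonzero(d[i] - d[i, nearest[i]] <= eps)  # always contains nearest
        if len(cand) == 1:
            U[i, nearest[i]] = 1.0  # unambiguous: lower approximation of one cluster
            continue
        lower[i] = -1  # ambiguous: boundary object in several upper approximations
        # Hypothetical mixed metric: distance inflated by the density mismatch.
        mix = d[i, cand] * (1.0 + lam * np.abs(rho[i] - cluster_rho[cand]) / scale)
        inv = np.maximum(mix, 1e-12) ** (-2.0 / (m - 1.0))
        U[i, cand] = inv / inv.sum()  # fuzzy C-means-style memberships over candidates
    return lower, U


def update_centers(X, lower, U, centers, w_lower=0.7, m=2.0):
    """Blend lower-approximation means with membership-weighted boundary means."""
    new = np.array(centers, dtype=float)
    boundary = lower == -1
    for j in range(len(centers)):
        L = X[lower == j]
        W = U[boundary, j] ** m
        if len(L) and W.sum() > 0:
            new[j] = w_lower * L.mean(axis=0) + (1 - w_lower) * (W @ X[boundary]) / W.sum()
        elif len(L):
            new[j] = L.mean(axis=0)
        elif W.sum() > 0:
            new[j] = (W @ X[boundary]) / W.sum()
    return new
```

Setting `lam=0` reduces the metric to plain Euclidean distance, so the boundary memberships revert to standard fuzzy C-means weights over the candidate clusters. The illustrative default `w_lower=0.7` controls how strongly boundary objects pull each center.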

Key words: Rough Fuzzy Clustering, Rough K-means, Mixed Metrics, Cluster Adaptive Adjustment, Local Density

Received: 2019-06-10

CLC number: TP 18

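Funding: Supported by the National Key Research and Development Program of China (No.2017YFD0401001), the National Natural Science Foundation of China (No.61973151, 61833011), the Natural Science Foundation of Jiangsu Province (No.BK20191376, BK20191406), the Major Project of Natural Science Research of Jiangsu Higher Education Institutions (No.17KJA120001), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No.KYCX18_1388).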

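Corresponding author: MA Fumin, Ph.D., professor. Main research interests include intelligent information processing and intelligent production systems. E-mail: fmmatj@126.com.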

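About the authors: ZHANG Xintao, master's student. Main research interests include information processing and data mining. E-mail: 18305153916@163.com. CAO Jie, Ph.D., professor. Main research interests include business intelligence and data mining. E-mail: caojie6909@163.com. ZHANG Tengfei, Ph.D., professor. Main research interests include intelligent information processing and big data analysis. E-mail: tfzhang@126.com.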

Cite this article:

张鑫涛, 马福民, 曹杰, 张腾飞. 基于混合度量与类簇自适应调整的粗糙模糊K-means聚类算法[J]. 模式识别与人工智能, 2019, 32(12): 1141-1150.
(ZHANG Xintao, MA Fumin, CAO Jie, ZHANG Tengfei. Rough Fuzzy K-means Clustering Algorithm Based on Mixed Metrics and Cluster Adaptive Adjustment. Pattern Recognition and Artificial Intelligence, 2019, 32(12): 1141-1150.)

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201912010 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2019/V32/I12/1141
