一种优化初始中心点的K-means算法<sup>*</sup>

摘要
图/表
参考文献
相关文章 (15)

全文: PDF (369 KB) HTML (1 KB)
输出: BibTeX | EndNote (RIS)

摘要针对K-means算法所存在的问题，提出一种优化初始中心点的算法.采用密度敏感的相似性度量来计算对象的密度，启发式地生成样本初始中心.然后设计一种评价函数——均衡化函数，并以均衡化函数为准则自动生成聚类数目.与传统算法相比，本文算法可得到较高质量的初始中心和较稳定的聚类结果.实验结果表明该算法的有效性和可行性.

	服务

	把本文推荐给朋友
	加入我的书架
	加入引用管理器
	E-mail Alert
	RSS
	作者相关文章
	汪中
	刘贵全
	陈恩红

关键词 ： K-means算法, 密度, 初始中心点, 均衡化函数

Abstract：Aiming at the problems of K-means algorithm, a method is proposed to optimize the initial center points through computing the density of objects. Thus, the initial center of the samples can be built in a heuristic way. Then, a new evaluation function is proposed, namely equalization function, and consequently the cluster number is generated automatically. Compared with the traditional algorithms, the proposed algorithm can get initial centers with higher quality and steadier cluster results. Experimental results show the effectiveness and feasibility of the proposed algorithm.

Key words： K-means Algorithm Density Initial Center Point Equalization Function

收稿日期: 2007-09-30

ZTFLH:

TP311

基金资助:国家自然科学基金资助项目(No. 60775037)

作者简介: 汪中，男，1984年生，硕士研究生，主要研究方向为数据挖掘、机器学习.E-mail: wzspb@mail.ustc.edu.cn.刘贵全，男，1970年生，博士，副教授，主要研究方向为数据挖掘、人工智能、网络安全等.陈恩红，男，1968年生，教授，博士生导师，主要研究方向为数据挖掘、机器学习、网络信息处理等.

引用本文:

汪中，刘贵全，陈恩红. 一种优化初始中心点的K-means算法^*[J]. 模式识别与人工智能, 2009, 22(2): 299-304. WANG Zhong, LIU Gui-Quan, CHEN En-Hong. A K-means Algorithm Based on Optimized Initial Center Points. , 2009, 22(2): 299-304.

链接本文:

http://manu46.magtech.com.cn/Jweb_prai/CN/ 或 http://manu46.magtech.com.cn/Jweb_prai/CN/Y2009/V22/I2/299

[1] Mao Guojun, Duan Lijuan, Wang Shi, et al. Principle and Algorithm of Data Mining. Beijing, China: Tsinghua University Press, 2005 (in Chinese)
(毛国君,段立娟,王实,等.数据挖掘原理与算法.北京:清华大学出版社, 2005)
[2] Han J, Kamber M. Data Mining Concepts and Techniques. Orlando, USA: Morgan Kaufmann Publishers, 2001
[3] Shi Zhongzhi. Knowledge Discovery. Beijing, China: Tsinghua University Press, 2004 (in Chinese)
(史忠植.知识发现.北京:清华大学出版社, 2004)
[4] Huang J Z, Ng M K, Rong Hongqiang, et al. Automated Variable Weighting in K-means Type Clustering. IEEE Trans on Pattern Analysis and Machine Intelligence, 2005, 27(5): 657-668
[5] Dhillon I S, Guan Yuqiang, Kogan J. Refining Clusters in High Dimensional Text Data // Proc of the 2nd SIAM Workshop on Clustering High Dimensional Data. Arlington, USA, 2002: 59-66
[6] Zhang B. Generalized K-Harmonic Means: Dynamic Weighting of Data in Unsupervised Learning // Proc of the 1st SIAM International Conference on Data Mining. Chicago, USA, 2001: 1-13
[7] Yang Fengzhao, Zhu Yangyong. An Efficient Method for Similarity Search on Quantitative Transaction Data.Journal of Computer Research and Development, 2004, 41(2): 361-368 (in Chinese)
(杨风召,朱扬勇.一种有效的量化交易数据相似性搜索方法.计算机研究与发展, 2004, 41(2): 361-368)
[8] Sarafis I, Zalzala A M S, Trinder P W. A Genetic Rule-Based Data Clustering Toolkit // Proc of the Congress on Evolutionary Computation. Honolulu, USA, 2002: 1238-1243
[9] Ma J, Perkins S. Time-Series Novelty Detection Using One-Class Support Vector Machines // Proc of the International Joint Conference on Neural Networks. Portland, USA, 2003, Ⅲ: 1741-1745
[10] Kaufman L,Rousseeuw P J. Finding Groups in Data: An Introduction to Cluster Analysis. New York, USA: John Wiley & Sons, 1990
[11] Qian Xian, Huang Xuanjing, Wu Lide. A Spectral Method of K-means Initialization. Acta Automatica Sinica, 2007, 33(4): 342-346 (in Chinese)
(钱线,黄萱菁,吴立德.初始化K-means的谱方法.自动化学报, 2007, 33(4): 342-346)
[12] Wang Ling, Bo Liefeng, Jiao Licheng. Density-Sensitive Spectral Clustering. Acta Electronica Sinica, 2007, 35(8): 1577-1581 (in Chinese)
(王玲,薄列峰,焦李成.密度敏感的谱聚类.电子学报, 2007, 35(8): 1577-1581)
[13] Rui Xu, Wunsch D I I. Survey of Clustering Algorithms. IEEE Trans on Neural Networks, 2005, 16(3): 645-678
[14] Li Yongsen, Yang Shanlin, Ma Xijun, et al. Optimization Study on K-Value of Spatial Clustering.Journal of System Simulation, 2006, 18(3): 573-576 (in Chinese)
(李永森,杨善林,马溪骏,等.空间聚类算法中的K值优化问题研究.系统仿真学报, 2006, 18(3): 573-576)