A K-means Algorithm Based on Optimized Initial Center Points
WANG Zhong, LIU Gui-Quan, CHEN En-Hong
1.Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027 2.Key Laboratory of Software in Computing and Communications of Anhui Province, Hefei 230027
Abstract:Aiming at the problems of K-means algorithm, a method is proposed to optimize the initial center points through computing the density of objects. Thus, the initial center of the samples can be built in a heuristic way. Then, a new evaluation function is proposed, namely equalization function, and consequently the cluster number is generated automatically. Compared with the traditional algorithms, the proposed algorithm can get initial centers with higher quality and steadier cluster results. Experimental results show the effectiveness and feasibility of the proposed algorithm.
汪中,刘贵全,陈恩红. 一种优化初始中心点的K-means算法*[J]. 模式识别与人工智能, 2009, 22(2): 299-304.
WANG Zhong, LIU Gui-Quan, CHEN En-Hong. A K-means Algorithm Based on Optimized Initial Center Points. , 2009, 22(2): 299-304.
[1] Mao Guojun, Duan Lijuan, Wang Shi, et al. Principle and Algorithm of Data Mining. Beijing, China: Tsinghua University Press, 2005 (in Chinese) (毛国君,段立娟,王 实,等.数据挖掘原理与算法.北京:清华大学出版社, 2005) [2] Han J, Kamber M. Data Mining Concepts and Techniques. Orlando, USA: Morgan Kaufmann Publishers, 2001 [3] Shi Zhongzhi. Knowledge Discovery. Beijing, China: Tsinghua University Press, 2004 (in Chinese) (史忠植.知识发现.北京:清华大学出版社, 2004) [4] Huang J Z, Ng M K, Rong Hongqiang, et al. Automated Variable Weighting in K-means Type Clustering. IEEE Trans on Pattern Analysis and Machine Intelligence, 2005, 27(5): 657-668 [5] Dhillon I S, Guan Yuqiang, Kogan J. Refining Clusters in High Dimensional Text Data // Proc of the 2nd SIAM Workshop on Clustering High Dimensional Data. Arlington, USA, 2002: 59-66 [6] Zhang B. Generalized K-Harmonic Means: Dynamic Weighting of Data in Unsupervised Learning // Proc of the 1st SIAM International Conference on Data Mining. Chicago, USA, 2001: 1-13 [7] Yang Fengzhao, Zhu Yangyong. An Efficient Method for Similarity Search on Quantitative Transaction Data.Journal of Computer Research and Development, 2004, 41(2): 361-368 (in Chinese) (杨风召,朱扬勇.一种有效的量化交易数据相似性搜索方法.计算机研究与发展, 2004, 41(2): 361-368) [8] Sarafis I, Zalzala A M S, Trinder P W. A Genetic Rule-Based Data Clustering Toolkit // Proc of the Congress on Evolutionary Computation. Honolulu, USA, 2002: 1238-1243 [9] Ma J, Perkins S. Time-Series Novelty Detection Using One-Class Support Vector Machines // Proc of the International Joint Conference on Neural Networks. Portland, USA, 2003, Ⅲ: 1741-1745 [10] Kaufman L,Rousseeuw P J. Finding Groups in Data: An Introduction to Cluster Analysis. New York, USA: John Wiley & Sons, 1990 [11] Qian Xian, Huang Xuanjing, Wu Lide. A Spectral Method of K-means Initialization. Acta Automatica Sinica, 2007, 33(4): 342-346 (in Chinese) (钱 线,黄萱菁,吴立德.初始化K-means的谱方法.自动化学报, 2007, 33(4): 342-346) [12] Wang Ling, Bo Liefeng, Jiao Licheng. Density-Sensitive Spectral Clustering. Acta Electronica Sinica, 2007, 35(8): 1577-1581 (in Chinese) (王 玲,薄列峰,焦李成.密度敏感的谱聚类.电子学报, 2007, 35(8): 1577-1581) [13] Rui Xu, Wunsch D I I. Survey of Clustering Algorithms. IEEE Trans on Neural Networks, 2005, 16(3): 645-678 [14] Li Yongsen, Yang Shanlin, Ma Xijun, et al. Optimization Study on K-Value of Spatial Clustering.Journal of System Simulation, 2006, 18(3): 573-576 (in Chinese) (李永森,杨善林,马溪骏,等.空间聚类算法中的K值优化问题研究.系统仿真学报, 2006, 18(3): 573-576)