|
|
An Efficient Method for K-Means Clustering |
HUANG Zhen-Hua1, XIANG Yang1, ZHANG Bo1, WANG Dong1, LIU Xiao-Ling2 |
1.Department of Computer Science and Technology,Tongji University,Shanghai 200092 2.Department of Computer and Information Technology,Fudan University,Shanghai 200433 |
|
|
Abstract The existing K-Means clustering methods directly act on multidimensional datasets. Hence, these methods are extremely inefficient as the cardinality of input data and the number of clustering attributes increase. Motivated by the above fact, in this paper, an efficient approach for K-Means clustering based on the structure of regular grid, called KMCRG (K-Means Clustering based on Regular Grid), is proposed. This method effectively implements K-Means clustering by taking cell as handling object. Especially, this method uses the tactics of grid weighted iteration to effectively gain the final K classes. The experiment results show that the algorithm can quickly gain the clustering results without losing clustering precision.
|
Received: 28 July 2008
|
|
|
|
|
[1] MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations // Proc of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA, 1967: 281-297 [2] Tou J. Pattern Recognition Principles. Reading, USA: Addison-Wesley, 1974 [3] Linde Y, Buzo A, Gary R. An Algorithm for Vector Quantizer Design. IEEE Trans on Communication, 1980, 28(1): 84-95 [4] Chomicki J, Godfrey P, Gryz J, et al. Skyline with Presorting: Theory and Optimization // Proc of the International Conference on Intelligent Information Systems. Wroclaw, Poland, 2005: 216-225 [5] Birgin E G, Martinez J M, Ronconi D P. Minimization Subproblems and Heuristics for an Applied Clustering Problem. European Journal of Operational Research, 2003, 146(1): 19-34 [6] Kanungo T, Mount D M, Netanyaha N S, et al. An Efficient K-Means Clustering Algorithm: Analysis and Implementation. IEEE Trans on Pattern Analysis and Machine Intelligence, 2002, 24(7): 881-892 [7] Ester M, Kriegel H, Sander J, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise // Proc of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland, USA, 1996: 226-231 [8] Corral A, Almendros J M. A Performance Comparison of Distance-Based Query Algorithms Using R-Trees in Spatial Databases. Information Sciences: An International Journal, 2007, 177(11): 2207-2237 [9] Pei Jian, Jin Wen, Ester M, et al. Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces // Proc of the 31st International Conference on Very Large Data Bases. Trondheim, Norway, 2005: 253-264 [10] Xiong Xiaopeng, Mokbel M F, Aref W G. SEA-CNN: Scalable Processing of Continuous k-Nearest Neighbor Queries in Spatio and Temporal Databases // Proc of the 21st International Conference on Data Engineering. Tokyo, Japan, 2005: 643-654 |
|
|
|