A Shrinking-Clustering Method for High Dimensional Data Using Flexible Size Grid
ZHANG JianYe¹, PAN Quan¹, LIANG JianHai²
1. School of Automation, Northwestern Polytechnical University, Xi’an 710072
2. Institute of Engineering, Air Force Engineering University, Xi’an 710038
Abstract: A shrinking-clustering method using a flexible size grid is proposed to solve the problem of clustering high dimensional data in data mining. The data bins are arranged according to their density span, and the data points are moved along the direction of the density gradient, which produces condensed and widely separated clusters. The connected components of dense cells are then detected using a sequence of grids of flexible size. The best clustering result is obtained when the cluster borderline no longer changes. Simulation results show that the method detects clusters effectively and efficiently in both low and high dimensional data.
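The steps named in the abstract (binning, shrinking, detection of connected dense cells over a sequence of grid sizes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the density-gradient move is approximated here by pulling each point halfway toward the centroid of its own dense cell and the adjacent dense cells, and the stopping rule is read as "the clusters are identical for two consecutive grid sizes". The names `min_pts`, `steps`, and `sizes` are hypothetical parameters introduced for illustration.

```python
from collections import defaultdict, deque

def bin_points(points, size):
    """Group point indices by the integer grid cell that contains them."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(int(x // size) for x in p)].append(i)
    return cells

def adjacent(a, b):
    """Two distinct cells are adjacent if every index differs by at most 1."""
    return a != b and all(abs(x - y) <= 1 for x, y in zip(a, b))

def shrink_once(points, size, min_pts):
    """Assumed shrinking step: pull the points of each dense cell halfway
    toward the centroid of that cell and its adjacent dense cells."""
    cells = bin_points(points, size)
    dense = [c for c, idx in cells.items() if len(idx) >= min_pts]
    moved = list(points)
    for c in dense:
        pool = [i for n in dense if n == c or adjacent(c, n) for i in cells[n]]
        centroid = [sum(points[i][d] for i in pool) / len(pool)
                    for d in range(len(c))]
        for i in cells[c]:
            moved[i] = tuple((x + m) / 2 for x, m in zip(points[i], centroid))
    return moved

def dense_components(points, size, min_pts):
    """Breadth-first search over dense cells; each connected component
    becomes one cluster, returned as a frozenset of point indices."""
    cells = bin_points(points, size)
    dense = [c for c, idx in cells.items() if len(idx) >= min_pts]
    seen, clusters = set(), set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            c = queue.popleft()
            members.extend(cells[c])
            for n in dense:
                if n not in seen and adjacent(c, n):
                    seen.add(n)
                    queue.append(n)
        clusters.add(frozenset(members))
    return clusters

def shrinking_clustering(points, sizes, min_pts=3, steps=5):
    """Try each grid size in turn; stop once the clusters stop changing
    (an assumed reading of the abstract's stopping rule)."""
    previous = None
    for size in sizes:
        pts = [tuple(p) for p in points]
        for _ in range(steps):
            pts = shrink_once(pts, size, min_pts)
        clusters = dense_components(pts, size, min_pts)
        if clusters == previous:
            break
        previous = clusters
    return previous

# Two well-separated 5x5 blobs of points in two dimensions.
blob_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
blob_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(5) for j in range(5)]
clusters = shrinking_clustering(blob_a + blob_b, sizes=[0.5, 1.0, 2.0])
```

Adjacency is tested pairwise between dense cells instead of enumerating all 3^d - 1 neighbours of a cell, so the cost grows with the number of occupied dense cells and not exponentially with the dimension. This choice belongs to the sketch and is not taken from the paper.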
ZHANG JianYe, PAN Quan, LIANG JianHai. A Shrinking-Clustering Method for High Dimensional Data Using Flexible Size Grid [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2007, 20(5): 716-721.