Border-Processing Technique in Grid-Based Clustering
QIU BaoZhi1,2, SHEN JunYi1
1. School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049
2. School of Information and Engineering, Zhengzhou University, Zhengzhou 450052
Abstract: To improve the accuracy of grid-based clustering, a border-processing technique is proposed that uses restricted k nearest neighbors and the concept of relative density. The technique accurately separates a cluster’s border points from outliers and noise. Based on it, a grid-based clustering algorithm with border processing (GBCB) is developed. Experimental results show that border points are recognized with high accuracy. Because it scans the data only once, GBCB is efficient: its running time is linear in the size of the input data set. It can also discover clusters of arbitrary shape and scales well.
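The abstract does not define "restricted k nearest neighbors" or "relative density", so the following Python sketch is only one possible reading of the idea, not the GBCB algorithm itself. It assumes that the neighbor search for a point is restricted to the point's own grid cell and the cells touching it, and that relative density is the ratio of the mean k-nearest-neighbor distance of a point's dense-cell neighbors to the point's own mean distance to them. The names `grid_cluster_with_borders`, `min_pts`, `k`, and `tau`, and the default values, are hypothetical. The sketch makes no attempt to reproduce the linear running time claimed above.

```python
from collections import defaultdict, deque
from itertools import product
from math import dist, floor

NOISE = -1  # label for outliers and noise


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(floor(x / cell_size) for x in point)


def neighbor_cells(cell):
    """Return the cell itself plus every cell that touches it."""
    return [tuple(c + o for c, o in zip(cell, offs))
            for offs in product((-1, 0, 1), repeat=len(cell))]


def mean_knn_distance(p, candidates, k):
    """Mean distance from p to its k nearest candidate points."""
    d = sorted(dist(p, q) for q in candidates)[:k]
    return sum(d) / len(d) if d else float("inf")


def grid_cluster_with_borders(points, cell_size, min_pts, k=3, tau=0.5):
    # One pass over the data: bin every point into its grid cell.
    binned = defaultdict(list)
    for i, p in enumerate(points):
        binned[cell_of(p, cell_size)].append(i)
    grid = dict(binned)

    # A cell is dense when it holds at least min_pts points.
    dense = {c for c, idx in grid.items() if len(idx) >= min_pts}

    # Adjacent dense cells form one cluster (breadth-first search).
    cell_label, next_label = {}, 0
    for start in dense:
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            for n in neighbor_cells(queue.popleft()):
                if n in dense and n not in cell_label:
                    cell_label[n] = next_label
                    queue.append(n)
        next_label += 1

    labels = [NOISE] * len(points)
    for c, lab in cell_label.items():
        for i in grid[c]:
            labels[i] = lab

    # Border processing (assumed form): a point in a sparse cell is a
    # border point if its relative density is at least tau.
    for c, idx in grid.items():
        if c in dense:
            continue
        core = [j for n in neighbor_cells(c) if n in dense for j in grid[n]]
        if not core:
            continue  # no dense cell nearby: stays noise
        for i in idx:
            p = points[i]
            knn = sorted(core, key=lambda j: dist(p, points[j]))[:k]
            d_p = sum(dist(p, points[j]) for j in knn) / len(knn)
            d_q = []
            for j in knn:
                q_cell = cell_of(points[j], cell_size)
                local = [points[m] for n in neighbor_cells(q_cell)
                         for m in grid.get(n, []) if m != j]
                d_q.append(mean_knn_distance(points[j], local, k))
            rel = (sum(d_q) / len(d_q)) / d_p if d_p > 0 else float("inf")
            if rel >= tau:
                labels[i] = labels[knn[0]]  # join the nearest cluster
    return labels
```

Under these assumptions, a point lying just outside a dense cell has a relative density close to 1 and is attached to the neighboring cluster, while an isolated point in the same sparse cell has a relative density near 0 and remains labeled as noise.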