Abstract:In order to detect boundary points of clusters effectively, an algorithm is proposed, namely boundary points detecting algorithm based on coefficient of variation(BAND). BAND computes the average distance between one object and its k-distance neighbors. The density of each object is obtained by the reciprocal of average distance. Then the boundary points are found by using the coefficient of variation to portray the distribution of data objects. The experimental results show BAND effectively detects boundary points on noisy datasets with clusters of arbitrary shapes, sizes and different densities.
[1] Han Jiawei, Kamber M. Data Mining: Concepts and Techniques. Orlando, USA: Morgan Kaufmann Publishers, 2001 [2] Xia Chenyi, Hsu W, Lee M L, et al. BORDER: Efficient Computation of Boundary Points. IEEE Trans on Knowledge and Data Engineering, 2006, 18(3): 289-303 [3] Hsu C M, Chen M S. Subspace Clustering of High Dimensional Spatial Data with Noises // Proc of the Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Sydney, Australia, 2004: 31-40 [4] Qiu Baozhi, Shen Junyi. Grid-Based and Extend-Based Clustering Algorithm for Multi-Density. Control and Decision, 2006, 21(9):1011-1014 (in Chinese) (邱保志,沈钧毅.基于扩展和网格的多密度聚类算法.控制与决策, 2006, 21(9): 1011-1014) [5] Qiu Baozhi, Shen Junyi. A Border-Processing Technique in Grid-Based Clustering. Pattern Recognition and Artificial Intelligence, 2006, 19(2): 277-280 (in Chinese) (邱保志,沈钧毅.网格聚类中的边界处理技术.模式识别与人工智能, 2006, 19(2): 277-280) [6] Breunig M M, Kriegel H P, Ng R T, et al. LOF: Identifying Density-Based Local Outliers // Proc of the ACM SIGMOD International Conference on Management of Data. Dalles, USA, 2000: 93-104 [7] Karypis G, Ham E H, Kumar V. Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer, 1999, 32(8): 68-75