A Divisive Hierarchical Clustering Algorithm Based on Soft Hyperspheric Partition
XIE Zhen-Ping, WANG Shi-Tong, WANG Xiao-Ming
School of Information Technology, Jiangnan University, Wuxi 214122 State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing 210016
Abstract:Hierarchical clustering is a classical data clustering method, but with two disadvantages—computational complexity and sensitivity to noises and outliers. To avoid these problems, a new divisive hierarchical clustering method is presented, called soft hyperspheric partition based divisive hierarchical clustering (SHPDHC). A new partitioning strategy, soft hyperspheric partition (SHP), is introduced. This strategy is derived from the possibilistic clustering method. SHPDHC has low computational complexity and has the ability of weakening the influence of outliers existing in the dataset, meanwhile, SHPDHC can easily produce the natural number of clusters. The theoretical analysis and experimental results on artificial datasets and real images demonstrate the effectiveness of the proposed method.
[1] Xu Rui, Wunsch D II. Survey of Clustering Algorithms. IEEE Trans on Neural Networks, 2005, 16(3): 645-678 [2] Zhang T, Ramakrishnan R, Livny M. BIRCH: An Efficient Data Clustering Method for Very Large Databases // Proc of the ACM SIGMOD International Conference on Management of Data. Montreal, Canada, 1996: 103-114 [3] Guha S, Rastogi R, Shim K. CURE: An Efficient Clustering Algorithm for Large Databases // Proc of the ACM SIGMOD International Conference on Management of Data. New York, USA, 1998: 73-84 [4] Guha S, Rastogi R, Shim K. ROCK: A Robust Clustering Algorithm for Categorical Attributes // Proc of the IEEE Conference on Data Engineering. Sydney, Australia, 1999: 512-521 [5] Karypis G, Han E H, Kumar V. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer, 1999, 32(8): 68-75 [6] Frigui H, Krishnapuram R. Clustering by Competitive Agglomeration. Pattern Recognition, 1997, 30(7): 1109-1119 [7] Yang M S, Wu K L. A Similarity-Based Robust Clustering Method. IEEE Trans on Pattern Analysis and Machine Intelligence, 2004, 26(4): 434-448 [8] Clausi D A. K-Means Iterative Fisher (KIF) Unsupervised Clustering Algorithm Applied to Image Texture Segmentation. Pattern Recognition, 2002, 35(9): 1959-1972 [9] Dave R N, Krishnapuram R. Robust Clustering Methods: A Unified View. IEEE Trans on Fuzzy Systems, 1997, 5(2): 270-293 [10] Krishnapuram R, Keller J M. A Possibilistic Approach to Clustering. IEEE Trans on Fuzzy Systems, 1993, 1(2): 98-110 [11] Zhang J S, Yeung Y W. Improved Possibilistic C-Means Clustering Algorithms. IEEE Trans on Fuzzy Systems, 2004, 12(2): 209-217 [12] Pal N R, Pal K, Keller J M, et al. A Possibilistic Fuzzy C-Means Clustering Algorithm. IEEE Trans on Fuzzy Systems, 2005, 13(4): 517-530 [13] Yang M S, Wu K L. Unsupervised Possibilistic Clustering. Pattern Recognition, 2006, 39(1): 5-21 [14] Zhang Yajun, Liu Zhiqiang. Self-Splitting Competitive Learning: A New On-line Clustering Paradigm. IEEE Trans on Neural Networks, 2002, 13(2): 369-380 [15] Wu S, Liew A W C, Yan H, et al. Cluster Analysis of Gene Expression Data Based on Self-Splitting and Merging Competitive Learning. IEEE Trans on Information Technology in Biomedicine, 2004, 8(1): 5-15 [16] Pal N R, Bezdek J C. On Cluster Validity for the Fuzzy C-Means Model. IEEE Trans on Fuzzy Systems, 1995, 3(3): 370-379 [17] Bezdek J C, Pal N R. Some New Indexes of Cluster Validity. IEEE Trans on Systems, Man and Cybernetics, 1998, 28(3): 301-315 [18] Zahid N, Limouri M, Essaid A. A New Cluster-Validity for Fuzzy Clustering. Patter Recognition, 1999, 32(7): 1089-1097 [19] Wu K L, Yang M S. A Cluster Validity Index for Fuzzy Clustering. Pattern Recognition Letters, 2005, 26(9): 1275-1291