Abstract:The current clustering methods are difficult to handle the complicated problems in which shapes and densities are changing along with the data. To overcome the shortcomings of existing clustering methods, based on discrete topological manifold created in the data space, the structural similarity of samples and the class structure are described by accessibility after defining two new relativity metrics: the neighborhood density similarity and the smoothness of neighborhood density changes. The class structure relationship is proved to an equivalence relation. Then, a clustering algorithm is designed based on compressive transformation by treating the structural similarity defined on samples as the attractiveness. The algorithm is designed to handle data with any shapes and any density, maintaining good interpretability and many other advantages. Experimental result on the artificial data sets and standard data sets shows that the method is superior to the state-of-the-art methods.
[1] Richard O, Duda P E, Hart D G S. Pattern Classification. 2nd Edition. New York, USA: John Wiley Sons, 2001 [2] Theodoridis S, Koutroumbas K. Pattern Recognition. 2nd Edition. Amsterdam, Netherlands: Elsevier, 2003 [3] Zhang Tian, Ramakrishnan R, Livny M.BIRCH: An Efficient Data Clustering Method for Very Large Databases // Proc of the ACM SIGMOD International Conference on Management of Data. Montreal, Canada, 1996: 103-114 [4] Ester M, Kriegel H P, Sander J, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise // Proc of the ACM SIGKDD International Conference on Management of Data. Montreal, Canada, 1996: 226-231 [5] Wang Wei, Yang Jiong, Muntz R. STING: A Statistical Information Grid Approach to Spatial Data Mining // Proc of the 23rd International Conference on Very Large Databases. Athens, Greece, 1997: 186-196 [6] Xu Linli, Neufeld J, Larson B, et al. Maximum Margin Clustering // Saul L K, Weiss Y, Bottou L, eds. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2005, XVII: 1537-1544 [7] Chan P M, Schlag M D F, Zien J Y. Spectral k-Way Ratio-Cut Partitioning and Clustering // Proc of the 30th International Design Automation Conference. Dallas, USA, 1993: 749-754 [8] Frey B J, Dueck D. Clustering by Passing Messages between Data Points. Science, 2007, 315(5814): 972-976 [9] Shuai Dianxun, Dong Yumin, Shuai Qing. A New Data Clustering Approach: Generalized Cellular Automata. Information Systems, 2007, 32(7): 968-977 [10] Zhang Chaolin, Zhang Xuegong, Zhang M Q, et al. Neighbor Number, Valley Seeking and Clustering. Pattern Recognition Letters, 2007, 28(2): 173-180 [11] Dong Yihong, Cao Shaoka, Chen Ken, et al. PFHC: A Clustering Algorithm Based on Data Partitioning for Unevenly Distributed Datasets. Fuzzy Sets and Systems, 2009, 160(13): 1886-1901 [12] Wang Xi, Yang Chunyu, Zhou Jie. Clustering Aggregation by Probability Accumulation. Pattern Recognition, 2009, 42(5): 668-675