Abstract: To obtain better clustering results, a clustering algorithm suited to the cluster structure of the given dataset must be chosen. A method for selecting clustering algorithms based on Grid-MST is proposed to choose a suitable clustering algorithm for a dataset automatically. The proposed method constructs the Grid-MST from the dataset and discovers the potential cluster structures from the number of trees. A clustering algorithm suited to the discovered cluster structure is then selected. Experimental results on artificial and real datasets show that the proposed method is efficient.
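The abstract does not spell out how the Grid-MST is built, so the following is only a minimal sketch of the general idea under stated assumptions: points are binned into a uniform grid, occupied cells become nodes, and Kruskal's algorithm is run while skipping edges longer than a threshold, which leaves a forest whose trees can be counted. The names `grid_cells`, `count_trees`, `cell_size`, and `max_edge` are hypothetical and do not come from the paper, and the rule that maps the tree count to a clustering algorithm is not reproduced here.

```python
import math
from itertools import combinations


def grid_cells(points, cell_size):
    """Return the sorted set of integer grid cells that contain at least one point."""
    return sorted({tuple(math.floor(x / cell_size) for x in p) for p in points})


def count_trees(points, cell_size, max_edge):
    """Count the trees of a spanning forest over the occupied grid cells.

    Assumption (not from the paper): edges are Euclidean distances between
    cell indices, and edges longer than max_edge (in cell units) are skipped,
    so Kruskal's algorithm yields a forest instead of a single tree.
    """
    cells = grid_cells(points, cell_size)
    parent = list(range(len(cells)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges between occupied cells, shortest first.
    edges = sorted(
        (math.dist(cells[i], cells[j]), i, j)
        for i, j in combinations(range(len(cells)), 2)
    )

    trees = len(cells)  # every cell starts as its own tree
    for length, i, j in edges:
        if length > max_edge:
            break  # all remaining edges are too long to join trees
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            trees -= 1
    return trees


if __name__ == "__main__":
    # Two well-separated groups of 2-D points.
    pts = [(0.1, 0.1), (0.9, 0.4), (1.2, 0.8),
           (10.1, 10.2), (10.8, 10.9), (11.3, 10.4)]
    print(count_trees(pts, cell_size=1.0, max_edge=1.5))
```

In this sketch the tree count depends strongly on `cell_size` and `max_edge`; how the paper sets the corresponding parameters cannot be determined from the abstract alone.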