Research on Selective Clustering Ensemble Algorithm Based on Normalized Mutual Information and Fractal Dimension
WU Xiao-Xuan, NI Zhi-Wei, NI Li-Ping, ZHANG Chen
1. School of Management, Hefei University of Technology, Hefei 230009
2. Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei 230009
Abstract: Traditional clustering ensemble algorithms cannot eliminate the influence of low-quality clustering members, and their clustering accuracy is low. To solve these problems, a selective clustering ensemble algorithm based on fractal dimension is proposed. First, the algorithm performs incremental clustering and can find clusters of arbitrary shape. Then, following a weight-based selection strategy built on normalized mutual information, it selects high-quality clustering members and integrates them through a weighted co-association matrix to obtain the final clustering result. Experimental results show that, compared with traditional clustering ensemble algorithms, the proposed algorithm improves clustering quality and has good extensibility.
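The selection and integration steps described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The clustering members are taken as given label vectors (the fractal-dimension-based incremental clustering that would produce them is not shown). NMI uses the normalization I(X;Y) / sqrt(H(X) H(Y)). Each member's weight is assumed to be its average NMI with the other members. The "keep members whose weight is at least the mean" rule and the final step (connected components of the co-association graph at C > 0.5) are hypothetical choices. All function names are illustrative.

```python
import numpy as np


def nmi(a, b):
    """Normalized mutual information I(a; b) / sqrt(H(a) * H(b)) of two labelings."""
    _, a_idx = np.unique(np.asarray(a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b), return_inverse=True)
    # Contingency table of joint label counts, turned into joint probabilities.
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)
    p = table / a_idx.size
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz]))
    ha, hb = -np.sum(pa * np.log(pa)), -np.sum(pb * np.log(pb))
    if ha == 0 or hb == 0:  # a single-cluster labeling carries no information
        return 1.0 if ha == hb else 0.0
    return float(mi / np.sqrt(ha * hb))


def member_weights(members):
    """Assumed scheme: weight = average NMI with every other member, normalized to sum to 1."""
    m = len(members)
    scores = np.array([
        np.mean([nmi(members[i], members[j]) for j in range(m) if j != i])
        for i in range(m)
    ])
    return scores / scores.sum()


def weighted_coassociation(members, weights):
    """C[i, j] = sum_m w_m * [label_m(i) == label_m(j)] / sum_m w_m."""
    n = len(members[0])
    c = np.zeros((n, n))
    for labels, w in zip(members, weights):
        labels = np.asarray(labels)
        c += w * (labels[:, None] == labels[None, :])
    return c / np.sum(weights)


def consensus_labels(c, threshold=0.5):
    """Connected components of the graph linking i and j when C[i, j] > threshold,
    i.e. single linkage on the distance 1 - C, merging only links below 1 - threshold."""
    n = c.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strong co-association links
            i = stack.pop()
            for j in np.flatnonzero(c[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def selective_ensemble(members, threshold=0.5):
    """Keep members with weight >= mean weight (hypothetical rule), then integrate them."""
    weights = member_weights(members)
    keep = weights >= weights.mean()
    chosen = [m for m, k in zip(members, keep) if k]
    c = weighted_coassociation(chosen, weights[keep])
    return consensus_labels(c, threshold), keep


members = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, different label names
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],  # low-quality member that disagrees with the rest
]
labels, keep = selective_ensemble(members)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
print(keep.tolist())    # [True, True, True, False]
```

Because NMI ignores label names, the second member counts as identical to the first; the noisy fourth member gets a low weight and is dropped before the co-association matrix is built. Cutting a hierarchical clustering of 1 - C is a common alternative for the last step.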
|
Received: 11 July 2013
|
|
|
|
|
|
|
|