Clustering Ensemble with High Diversity Based on Adding Artificial Data
LUO Hui-Lan1, KONG Fan-Sheng2, LI Yi-Xiao2 |
1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000
2. Institute of Artificial Intelligence, Zhejiang University, Hangzhou 310027
Abstract Ensemble diversity is regarded as a key factor in ensemble learning. Although many methods exist for constructing clustering ensembles, few of them focus on producing high ensemble diversity. Two methods are proposed for generating clustering ensembles with high diversity: constructing clustering ensembles by adding noise (CEAN) and an improved version of CEAN (ICEAN). Both methods obtain highly diverse clustering ensembles by adding artificial data to the original data set. Compared with other commonly used methods for generating clustering ensembles, CEAN and ICEAN increase ensemble diversity and therefore achieve better consensus clustering results at the same average accuracy of the ensemble members.
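The exact CEAN and ICEAN procedures are not spelled out in this abstract, so the following is only a minimal sketch of the underlying idea, assuming a K-means base clusterer: each ensemble member is trained on the original data augmented with randomly generated artificial points, so different members see slightly different data and produce more diverse partitions, and the member labelings are then combined with a standard consensus step (here, evidence accumulation over a co-association matrix). The names noisy_ensemble and coassociation_consensus and parameters such as noise_ratio are illustrative assumptions, not the paper's notation.

import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def noisy_ensemble(X, n_clusters=3, n_members=10, noise_ratio=0.2, seed=0):
    # Cluster X n_members times, each time on a copy of X augmented with
    # uniformly sampled artificial points; return label vectors for the
    # original points only. (Illustrative sketch, not the paper's exact CEAN.)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_noise = int(noise_ratio * n)
    lo, hi = X.min(axis=0), X.max(axis=0)
    labelings = []
    for m in range(n_members):
        # Different artificial points in every run push the base clusterings apart.
        noise = rng.uniform(lo, hi, size=(n_noise, d))
        X_aug = np.vstack([X, noise])
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=m).fit_predict(X_aug)
        labelings.append(labels[:n])  # discard the labels of the artificial points
    return labelings

def coassociation_consensus(labelings, n_clusters=3):
    # Combine member labelings by evidence accumulation: build a co-association
    # matrix and cut an average-linkage dendrogram into n_clusters groups.
    L = np.asarray(labelings)                           # (n_members, n_points)
    co = (L[:, :, None] == L[:, None, :]).mean(axis=0)  # co-association matrix
    dist = squareform(1.0 - co, checks=False)           # condensed distances
    return fcluster(linkage(dist, method="average"), t=n_clusters, criterion="maxclust")

For example, coassociation_consensus(noisy_ensemble(X)) yields a single consensus partition of X. ICEAN is presented in the paper as an improvement of CEAN; its specific refinements are not reflected in this sketch.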
Received: 10 July 2007
|
|
|
|