Clustering Ensemble with High Diversity Based on Adding Artificial Data*


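Abstract

Ensemble diversity is regarded as a key factor in ensemble learning. Research on clustering ensembles has produced many methods for generating clustering ensembles, but few of them are specifically devoted to generating ensembles with high diversity. This paper therefore proposes two such methods, CEAN and ICEAN, which introduce artificial data into the generation process to increase the diversity of the clustering ensemble. Experiments compare CEAN and ICEAN with clustering-ensemble generation methods commonly used in the literature. The results show that CEAN and ICEAN do increase the diversity of the generated ensembles and, as a result, achieve better clustering ensemble results at a similar average accuracy of the ensemble members.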


Keywords: clustering ensemble, ensemble diversity, artificial data
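The abstract does not spell out how CEAN and ICEAN create or use the artificial data, so the Python sketch below only illustrates the general idea under explicit assumptions. Each ensemble member runs plain k-means on the original points plus a fresh batch of artificial points, and keeps only the labels of the original points. The artificial points are assumed to come from independent per-feature Gaussians fitted to the data, and diversity is assumed to be the average pairwise disagreement (1 minus the Rand index) between members. The generator, the `ratio` parameter, the choice of k-means and the diversity measure are all hypothetical choices for illustration, not the authors' algorithm.

```python
import math
import random
from itertools import combinations


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def artificial_points(points, n, rng):
    """HYPOTHETICAL generator: n points drawn from independent per-feature
    Gaussians whose mean and std are estimated from the real data."""
    stats = []
    for col in zip(*points):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / len(col)) or 1.0
        stats.append((mu, sd))
    return [tuple(rng.gauss(mu, sd) for mu, sd in stats) for _ in range(n)]


def ensemble_with_artificial_data(points, k, members, ratio, seed=0):
    """Build `members` partitions of `points`. Each member clusters the real
    points plus ratio * len(points) fresh artificial points, then keeps only
    the labels of the real points (the artificial ones are discarded)."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(members):
        extra = artificial_points(points, int(ratio * len(points)), rng)
        labels = kmeans(points + extra, k, rng)
        ensemble.append(labels[: len(points)])
    return ensemble


def rand_index(a, b):
    """Fraction of point pairs on which partitions a and b agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def ensemble_diversity(ensemble):
    """Assumed diversity measure: mean of (1 - Rand index) over member pairs."""
    pairs = list(combinations(ensemble, 2))
    return sum(1 - rand_index(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: three Gaussian blobs of 20 points each (illustration only).
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (4, 4), (0, 4)] for _ in range(20)]
    plain = ensemble_with_artificial_data(data, 3, members=5, ratio=0.0)
    noisy = ensemble_with_artificial_data(data, 3, members=5, ratio=0.5)
    print("diversity without artificial data:", round(ensemble_diversity(plain), 3))
    print("diversity with artificial data:   ", round(ensemble_diversity(noisy), 3))
```

Comparing the two printed values gives a rough feel for how much the artificial points perturb the member partitions; whether diversity actually rises depends on the data, the amount of artificial data and the random seed.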

Abstract:

Ensemble diversity is considered a key factor in ensemble learning. Many methods exist for constructing clustering ensembles, but few of them focus specifically on producing ensembles with high diversity. Two methods are proposed for generating clustering ensembles with high diversity: Constructing Clustering Ensemble by Adding Noise (CEAN) and Improved CEAN (ICEAN). Both introduce artificial data into the generation process to obtain clustering ensembles with high diversity. Experiments comparing CEAN and ICEAN with other commonly used methods for generating clustering ensembles show that both increase ensemble diversity and therefore achieve better consensus clustering results at a comparable average accuracy of the ensemble members.

Key words: Clustering Ensemble, Ensemble Diversity, Artificial Data
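The comparison in the abstract rests on two quantities: the average accuracy of the ensemble members and the accuracy of the combined (consensus) clustering. A minimal Python sketch of both, assuming accuracy is computed from the best one-to-one matching between cluster ids and class labels, and assuming a simple co-association consensus in which two points are linked when they share a cluster in more than half of the members and the connected components form the final partition. The function names, the 0.5 threshold and the toy labels are hypothetical illustrations; they are not the paper's consensus function or its data.

```python
from itertools import permutations


def clustering_accuracy(labels, truth):
    """Accuracy under the best one-to-one mapping of cluster ids to class ids.
    Brute force over permutations is fine for a handful of clusters; the
    Hungarian method finds the same optimal mapping efficiently."""
    clusters = sorted(set(labels))
    classes = sorted(set(truth))
    # Pad with None so surplus clusters can stay unmapped (always counted wrong).
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[lab] == t for lab, t in zip(labels, truth)))
    return best / len(truth)


def co_association(ensemble):
    """Entry (i, j): fraction of members that put points i and j in one cluster."""
    n, m = len(ensemble[0]), len(ensemble)
    return [[sum(member[i] == member[j] for member in ensemble) / m for j in range(n)]
            for i in range(n)]


def consensus(ensemble, threshold=0.5):
    """Assumed consensus function: link i and j when their co-association
    exceeds `threshold`, and return connected components as the final labels."""
    co = co_association(ensemble)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Toy labels invented for illustration; not results from the paper.
    truth = [0, 0, 0, 1, 1, 1]
    ensemble = [[0, 0, 0, 1, 1, 1],
                [1, 1, 0, 0, 0, 0],
                [2, 2, 2, 0, 0, 1]]
    avg = sum(clustering_accuracy(m, truth) for m in ensemble) / len(ensemble)
    print("average member accuracy:", round(avg, 3))  # → 0.889
    print("consensus accuracy:", clustering_accuracy(consensus(ensemble), truth))  # → 1.0
```

In this toy case the consensus recovers the true partition although two members are imperfect, which is the kind of gap between member accuracy and consensus accuracy that the abstract's comparison measures. For larger numbers of clusters, `scipy.optimize.linear_sum_assignment` solves the same matching problem in polynomial time instead of enumerating permutations.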

Received: 2007-07-10

CLC number: TP181

Funding: Science and Technology Project of the Education Department of Jiangxi Province (No. 教技字[2007]208号, GJJ08285)

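About the authors: LUO Hui-Lan, female, born in 1974, Ph.D.; her main research interests are data mining and pattern recognition. E-mail: luohuilan@sina.com. KONG Fan-Sheng, male, born in 1946, professor and doctoral supervisor; his main research interests are knowledge base systems and data mining. LI Yi-Xiao, male, born in 1982, Ph.D. candidate; his main research interests are data mining and pattern recognition.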

Cite this article:

LUO Hui-Lan, KONG Fan-Sheng, LI Yi-Xiao. Clustering Ensemble with High Diversity Based on Adding Artificial Data. Pattern Recognition and Artificial Intelligence, 2008, 21(5): 682-688 (in Chinese)
(罗会兰, 孔繁胜, 李一啸. 基于添加人工数据的高差异性聚类集体生成方法[J]. 模式识别与人工智能, 2008, 21(5): 682-688)

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2008/V21/I5/682
