Clustering Ensembles Based Classification Method for Imbalanced Data Sets
CHEN Si, GUO Gong-De, CHEN Li-Fei
School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007
Key Laboratory of Network Security and Cryptography, Fujian Normal University, Fuzhou 350007
Abstract: Recently, the classification of imbalanced data sets has become a research hotspot in data mining and machine learning. A class of novel classification methods for imbalanced data sets based on clustering ensembles is proposed, which aims to provide a better training platform for classifiers by introducing a clustering consistency index to identify minority-class examples near cluster boundaries and majority-class examples near cluster centers. An improved synthetic minority over-sampling technique (SMOTE) and a modified random under-sampling method are then applied, respectively, to rebalance the data sets. The classification performance of eight methods is compared on several public data sets. Experimental results show that the proposed methods perform better on both the minority and majority classes, and that they are effective and feasible for handling imbalanced data sets.
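To make the described pipeline concrete, the sketch below gives one possible reading of the approach in Python; it is an illustration only, not the authors' implementation. The consistency index is assumed here to be the average agreement between each example and its nearest neighbours across an ensemble of k-means partitions, the over-sampling step is plain SMOTE-style interpolation restricted to low-consistency (boundary) minority examples, and the under-sampling step assumes that high-consistency (cluster-centre) majority examples are removed first. All function names, thresholds, and balancing targets are hypothetical.

```python
# Illustrative sketch only (not the paper's implementation). Assumes binary
# labels and a simple consistency index: the average agreement between each
# example and its nearest neighbours over an ensemble of k-means partitions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def consistency_index(X, n_partitions=10, k_range=(2, 6), n_neighbors=5, seed=0):
    """Fraction of ensemble partitions in which an example shares a cluster
    with its nearest neighbours; low values suggest a cluster boundary."""
    rng = np.random.RandomState(seed)
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)                      # idx[:, 0] is the point itself
    agree = np.zeros(X.shape[0])
    for _ in range(n_partitions):
        k = rng.randint(k_range[0], k_range[1] + 1)
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=rng.randint(1 << 30)).fit_predict(X)
        agree += (labels[idx[:, 1:]] == labels[:, None]).mean(axis=1)
    return agree / n_partitions


def smote_like(X_seed, n_new, n_neighbors=5, seed=0):
    """SMOTE-style interpolation between seed examples and their neighbours."""
    rng = np.random.RandomState(seed)
    nn = NearestNeighbors(n_neighbors=min(n_neighbors + 1, len(X_seed))).fit(X_seed)
    _, idx = nn.kneighbors(X_seed)
    base = rng.randint(0, len(X_seed), n_new)
    mate = idx[base][np.arange(n_new), rng.randint(1, idx.shape[1], n_new)]
    lam = rng.rand(n_new, 1)
    return X_seed[base] + lam * (X_seed[mate] - X_seed[base])


def rebalance(X, y, minority_label, boundary_quantile=0.3, seed=0):
    """Hypothetical rebalancing: oversample boundary minority examples,
    then drop the most 'central' majority examples until classes match."""
    ci = consistency_index(X, seed=seed)
    min_mask = (y == minority_label)
    X_min, X_maj = X[min_mask], X[~min_mask]
    ci_min, ci_maj = ci[min_mask], ci[~min_mask]
    maj_label = y[~min_mask][0]

    # Minority examples with low consistency lie near cluster boundaries.
    boundary = X_min[ci_min <= np.quantile(ci_min, boundary_quantile)]
    n_new = max(len(X_maj) // 2 - len(X_min), 0)   # meet roughly halfway
    X_syn = (smote_like(boundary, n_new, seed=seed)
             if n_new > 0 and len(boundary) > 1 else np.empty((0, X.shape[1])))

    # Majority examples with high consistency lie near cluster centres and
    # are removed first here (an assumed reading of the under-sampling step).
    keep = np.argsort(ci_maj)[: len(X_min) + len(X_syn)]
    X_bal = np.vstack([X_min, X_syn, X_maj[keep]])
    y_bal = np.concatenate([np.full(len(X_min) + len(X_syn), minority_label),
                            np.full(len(keep), maj_label)])
    return X_bal, y_bal
```

The rebalanced training set returned by rebalance can then be passed to any standard classifier; the choice of balancing target and quantile threshold above is arbitrary and would need tuning in practice.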