A Partition Algorithm for Huge Data Sets Based on Rough Set<sup>*</sup>


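Abstract: Processing massive data has long been an important problem in data mining. Many parallel and serial algorithms exist for processing massive data, but they generally fail to resolve the conflict between speed and accuracy. Because distributed computing has clear advantages for data processing, this paper partitions an original massive data set into many independent small data sets for distributed processing. Based on the characteristics of Rough Set, the paper first defines the best partition and then proposes a massive-data partition algorithm to find it. Experiments show that a distributed processing scheme combined with the proposed partition algorithm can process massive data quickly, and that its accuracy is relatively high compared with algorithms that process the whole data set.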


Keywords: rough set, data partition, distributed processing

Abstract: Processing huge data sets is an important topic in data mining. Although many serial and parallel algorithms have been developed to deal with huge data sets, most of them do not resolve the conflict between speed and accuracy well. In this paper, the whole data set is partitioned into many small subsets to take advantage of distributed computing. First, a definition of the best partition is proposed. Then, a rough-set-based partition algorithm is developed to find the best partition. Experimental results show that the distributed information processing method based on the rough-set-based partition algorithm is effective in dealing with huge data sets. It is faster than the original rough-set-based algorithms, and its performance is as good as that of algorithms that process the original data set as a whole.

Key words: Rough Set, Data Partition, Distributed Information Processing

Received: 2004-06-30

Chinese Library Classification (ZTFLH): TP391

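Funding: Supported by the National Natural Science Foundation of China (No. 60373111), a Key Science and Technology Research Project of the Ministry of Education, the Chongqing Applied Basic Research Fund, and a Science and Technology Research Project of the Chongqing Municipal Education Commission.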

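About the authors: QIN ZhengRen (覃政仁), male, born in 1980, master's student; main research interests include Rough Set theory. E-mail: finalgas@sohu.com. WU Yu (吴渝), female, born in 1970, professor, Ph.D.; main research interests include intelligent information systems, automatic control theory and applications, and wavelet analysis. WANG GuoYin (王国胤), male, born in 1970, professor and doctoral supervisor; main research interests include Rough Set, intelligent information systems, and neural networks.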

Cite this article:

QIN ZhengRen, WU Yu, WANG GuoYin. A Partition Algorithm for Huge Data Sets Based on Rough Set[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2006, 19(2): 249-256.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2006/V19/I2/249