Abstract:The existing rough set based attribute reduction algorithms are mainly designed for the problem of the underlying data residing in the main memory. Therefore, the limitation of their application to attribute reduction computation of huge data results in a relatively poor scalability. Inspired by supervised learning in quest (SLIQ) algorithm, a specific data pre-processing strategy is introduced and a fast attribute reduction algorithm is proposed with time complexity O(|U||C|). The experimental results show that the proposed algorithm is of good scalability.
[1] Pawlak Z. Rough Set. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356 [2] Wang Guoyin. Rough Set Theory and Knowledge Acquisition. Xi'an, China: Xi'an Jiaotong University Press, 2001 (in Chinese) (王国胤.Rough集理论与知识获取.西安:西安交通大学出版社, 2001) [3] Hu Xiaohua, Cercone N. Learning in Relational Database: A Rough Set Approach. International Journal of Computational Intelligence, 1995, 11(2): 323-338 [4] Ye Dongyi. An Improvement to Jelonek's Attribute Reduction Algorithm. Acta Electronica Sinica, 2000, 28(12): 81-82 (in Chinese) (叶东毅.Jelonek属性约简算法的一个改进.电子学报, 2000, 28(12): 81-82) [5] Liu Shaohui, Sheng Qiujian, Shi Zhongzhi. A New Method for Fast Computing Positive Region. Journal of Computer Research and Development, 2003, 40(5): 637-642 (in Chinese) (刘少辉,盛秋戬,史忠植.一种新的快速计算正区域的方法.计算机研究与发展, 2003, 40(5): 637-642) [6] Liu Shaohui, Sheng Qiujian, Wu Bin, et al. Research on Efficient Algorithms for Rough Set Methods. Chinese Journal of Computers, 2003, 26(5): 524-529 (in Chinese) (刘少辉,盛秋戬,吴 斌,等.Rough集高效算法的研究.计算机学报, 2003, 26(5): 524-529) [7] Hu Feng, Wang Guoyin. Analysis of the Complexity of Quick Sort for Two Dimension Table. Chinese Journal of Computers, 2007, 30(6): 963-968 (in Chinese) (胡 峰,王国胤.二维表快速排序的复杂度分析.计算机学报, 2007, 30(6): 963-968) [8] Xu Zhangyan, Liu Zuopeng, Yang Binru, et al. A Quick Attribute Reduction Algorithm with Complexity of max(O(|C||U|),O(|C|2|U/C|)). Chinese Journal of Computers, 2006, 29(3): 391-399 (in Chinese) (徐章艳,刘作鹏,杨炳儒,等.一个复杂度为公式的快速属性约简算法. 计算机学报, 2006, 29(3): 391-399) [9] Wang Guoyin, Yu Hong, Yang Dachun. Decision Table Reduction Based on Conditional Information Entropy. Chinese Journal of Computers, 2002, 25(7): 759-766 (in Chinese) (王国胤,于 洪,杨大春.基于条件信息熵的决策表约简.计算机学报, 2002, 25(7): 759-766) [10] Mehta M, Agrawal R, Rissanen J. SLIQ: A Fast Scalable Classifier for Data Mining // Proc of the 5th International Conference on Extending Database Technology. Avignon, France, 1996: 18-32 [11] Shafer J C. Agrawal R, Mehta M. SPRINT: A Scalable Parallel Classifier for Data Mining // Proc of the 22nd International Conference on Very Large Databases. San Mateo, USA, 1996: 544-555 [12] Gehrke J, Ramakrishnan R, Ganti V. Rainforest: A Framework for Fast Decision Tree Construction of Large Datasets. Data Mining and Knowledge Discovery, 2000, 4(2/3): 127-162 [13] Ye Dongyi, Chen Zhaojiong. A New Discernibility Matrix and the Computation of a Core. Acta Electronica Sinica, 2002, 30(7): 1086-1088 (in Chinese) (叶东毅,陈昭炯.一个新的差别矩阵及其求核方法.电子学报, 2002, 30(7): 1086-1088) [14] Xu Zhangyan, Yang Bingru, Song Wei. Quick Computing Core Algorithm Based on Discernibility Matrix. Computer Engineering and Applications, 2006, 42(6): 4-6 (in Chinese) (徐章艳,杨炳儒,宋 威.一个基于差别矩阵的快速求核算法.计算机工程与应用, 2006, 42(6): 4-6)