Abstract: Rough set theory reveals dependencies in data and supports data reduction, and it has attracted attention from a growing number of fields. Discretization of continuous attributes plays an important role in rough set theory and in other inductive learning systems. Viewing discretization as a process of information generalization (or abstraction) and data reduction, a global discretization algorithm based on rough set theory is proposed. The algorithm modifies the criterion for selecting the best cut points and introduces inconsistency checking to preserve the fidelity of the original data, which turns the MDLP method into a global one. The cut points are then reduced to yield a smaller learning model while keeping the consistency level. The proposed algorithm is tested on several data sets with ID3 and ROSETTA. Experimental results show that the method performs better than MDLP and is also superior to processing continuous data directly without discretization.
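The inconsistency checking mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the measure, so the code below assumes a common one: after discretization, objects with identical interval patterns are grouped, and every object outside the majority decision class of its group counts as inconsistent. The names `discretize` and `inconsistency_rate` and the toy data are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def discretize(value, cuts):
    """Return the index of the interval that `value` falls into,
    given a sorted list of cut points."""
    return bisect_right(cuts, value)


def inconsistency_rate(rows, labels, cuts_per_attr):
    """Fraction of objects that conflict with the majority decision
    of their discretized pattern (assumed measure, see lead-in)."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        # Replace each continuous value by its interval index.
        key = tuple(discretize(v, cuts) for v, cuts in zip(row, cuts_per_attr))
        groups[key][label] += 1
    # Within each pattern, everything outside the majority class is inconsistent.
    conflicts = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return conflicts / len(rows)


# Toy decision table: two continuous condition attributes, one decision.
rows = [(1.0, 5.0), (1.2, 5.1), (3.0, 5.0), (3.1, 9.0)]
labels = ["a", "b", "a", "b"]

coarse = inconsistency_rate(rows, labels, [[2.0], [7.0]])
fine = inconsistency_rate(rows, labels, [[1.1, 2.0], [7.0]])
print(coarse, fine)
```

In this toy table the coarser cut set merges the first two objects, which carry different decisions, into one pattern, while the extra cut point at 1.1 separates them. A cut-point reduction step of the kind the abstract describes would drop a cut only if a check like this stays at the required consistency level.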