1. School of Computer Science and Technology, Yantai University, Yantai 264005
2. School of Computer Engineering and Technology, Shanghai University, Shanghai 200072
Abstract: Several measures for evaluating discretization schemes of continuous decision tables are discussed, including the number of cut points, conditional entropy, granular entropy, class-attribute mutual information and interdependence redundancy. For a consistent decision table, conditional entropy and class-attribute mutual information are both constant, so they cannot distinguish between discretization schemes. The relationship between granular entropy and interdependence redundancy is analyzed, and it is proved that granular entropy increases when new cut points are added to a discretization scheme. A hybrid discretization algorithm is proposed to generate discretization schemes for testing. Simulation results show that the correlation coefficient between the number of cut points and classification accuracy is approximately equal to that between granular entropy and classification accuracy, and that both coefficients depend on the dataset.
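The claim about consistent decision tables can be illustrated with a minimal sketch. It assumes a single continuous attribute, a toy four-object table invented purely for illustration, and hypothetical helper names (`discretize`, `conditional_entropy`, `mutual_information`) that do not come from the paper. Two schemes with different numbers of cut points both keep the table consistent, so both give the same conditional entropy (zero) and the same class-attribute mutual information (the entropy of the decision attribute).

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def discretize(value, cuts):
    """Return the index of the interval that `value` falls into (cuts must be sorted)."""
    return bisect_right(cuts, value)


def conditional_entropy(rows, labels):
    """H(D | C) in bits; rows are tuples of discretized condition values."""
    n = len(rows)
    joint = Counter(zip(rows, labels))
    cond = Counter(rows)
    h = 0.0
    for (row, _label), count in joint.items():
        h -= (count / n) * log2(count / cond[row])
    return h


def mutual_information(rows, labels):
    """I(C; D) = H(D) - H(D | C) in bits."""
    n = len(labels)
    h_d = -sum((c / n) * log2(c / n) for c in Counter(labels).values())
    return h_d - conditional_entropy(rows, labels)


# Toy data (illustrative only): one continuous attribute and a binary decision.
values = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]

# Two hypothetical schemes; both leave the discretized table consistent.
coarse_cuts = [2.5]
fine_cuts = [1.5, 2.5, 3.5]

coarse_rows = [(discretize(v, coarse_cuts),) for v in values]
fine_rows = [(discretize(v, fine_cuts),) for v in values]

h_coarse = conditional_entropy(coarse_rows, labels)
h_fine = conditional_entropy(fine_rows, labels)
mi_coarse = mutual_information(coarse_rows, labels)
mi_fine = mutual_information(fine_rows, labels)
```

In this sketch the two schemes differ only in the number of cut points (1 versus 3), which is why a measure that does vary with the cuts is needed to compare them.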