Abstract: Classification algorithms are widely studied in machine learning, and a large number of different types have been proposed. Selecting appropriate algorithms for a given dataset from so many candidates has therefore become a crucial problem. Recently, a method for characterizing datasets was proposed in reference [8], and it achieves better results in algorithm recommendation. In this paper, two methods based on the theory of interaction information are presented to characterize datasets. Experiments with 12 different types of classification algorithms on 98 UCI datasets show that both the two-variable and the three-variable interaction information methods improve the precision and the hit rate of the recommended algorithms. Furthermore, the three-variable method performs even better on datasets with poor adaptability.
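As a rough illustration of the quantities the abstract refers to, the sketch below computes two-variable interaction information (mutual information) and three-variable interaction information for discrete attributes. It is not the paper's implementation: the function names are hypothetical, and the sign convention (positive values indicate synergy) is assumed to follow Jakulin and Bratko, cited in references [10]-[12].

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy in bits of the joint distribution of the given columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """Two-variable interaction information: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def interaction_information(x, y, z):
    """Three-variable interaction information (assumed sign convention):
    I(X;Y;Z) = H(XY) + H(XZ) + H(YZ) - H(X) - H(Y) - H(Z) - H(XYZ).
    """
    return (entropy(x, y) + entropy(x, z) + entropy(y, z)
            - entropy(x) - entropy(y) - entropy(z)
            - entropy(x, y, z))


# Toy data (not from the paper): the class label is the XOR of two attributes.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
label = [p ^ q for p, q in zip(a, b)]

pairwise = mutual_information(a, label)           # attribute alone vs. label
three_way = interaction_information(a, b, label)  # both attributes vs. label
```

On this toy XOR data each attribute alone shares no information with the label, while the three-variable term is positive, which is the kind of structure a pairwise measure cannot see.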
刘娟,朱翔鸥,刘文斌. 基于交互信息的数据集特征结构研究[J]. 模式识别与人工智能, 2014, 27(1): 82-88.
LIU Juan, ZHU Xiang-Ou, LIU Wen-Bin. Research on Dataset Feature Structure Based on Interaction Information. Pattern Recognition and Artificial Intelligence, 2014, 27(1): 82-88.
[1] Weiss M S, Kapouleas L. An Empirical Comparison of Pattern Recognition, Neural Nets, and Machine Learning Classification Methods // Proc of the 11th International Joint Conference on Artificial Intelligence. Detroit, USA, 1989: 781-787
[2] Shavlik J W, Mooney R J, Towell G G. Symbolic and Neural Learning Algorithms: An Experimental Comparison. Machine Learning, 1991, 6(2): 111-143
[3] Duin R P W. A Note on Comparing Classifiers. Pattern Recognition Letters, 1996, 17(5): 529-536
[4] Brazdil P, Gama J, Henery B. Characterizing the Applicability of Classification Algorithms Using Meta-Level Learning // Proc of the European Conference on Machine Learning. Catania, Italy, 1994: 83-102
[5] Gama J, Brazdil P. Characterization of Classification Algorithms // Proc of the 7th Portuguese Conference on Artificial Intelligence. Funchal, Portugal, 1995: 189-200
[6] Ali S, Smith K A. On Learning Algorithm Selection for Classification. Applied Soft Computing, 2006, 6(2): 119-138
[7] Kalousis A, Gama J, Hilario M. On Data and Algorithms: Understanding Inductive Performance. Machine Learning, 2004, 54(3): 275-312
[8] Song Qinbao, Wang Guangtao, Wang Chao. Automatic Recommendation of Classification Algorithms Based on Data Set Characteristics. Pattern Recognition, 2012, 45(7): 2672-2689
[9] Chanda P, Cho Y R, Zhang Aidong, et al. Mining of Attribute Interactions Using Information Theoretic Metrics // Proc of the IEEE International Conference on Data Mining. Miami, USA, 2009: 350-355
[10] Jakulin A, Bratko I. Testing the Significance of Attribute Interactions // Proc of the 21st International Conference on Machine Learning. Banff, Canada, 2004: 409-416
[11] Jakulin A. Machine Learning Based on Attribute Interactions. Master Dissertation. Ljubljana, The Republic of Slovenia: University of Ljubljana, 2005
[12] Jakulin A, Bratko I. Analyzing Attribute Dependencies // Proc of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases. Dubrovnik, Croatia, 2003: 229-240
[13] Jakulin A, Bratko I, Smrke D, et al. Attribute Interactions in Medical Data Analysis // Proc of the 9th Conference on Artificial Intelligence in Medicine in Europe. Protaras, Cyprus, 2003: 229-238
[14] Xie Jingbo, Wang Xizhao. An Extended Heuristic Algorithm to ID3 Based on the Mutual Information between Attributes. Computer Engineering and Applications, 2004, 40(3): 93-94 (in Chinese) (谢竞博,王熙照.基于属性间交互信息的ID3算法.计算机工程与应用, 2004, 40(30): 93-94)