Abstract: A feature selection algorithm based on association rules is presented, and the impact of the support and confidence thresholds on the presented method is studied. The experimental results show that the presented method produces smaller feature subsets and higher classification accuracy than other methods. Furthermore, the results indicate that high support and confidence levels do not guarantee high classification accuracy or a small feature subset, and that a sufficient number of rules is a precondition for efficient feature selection based on association rules.
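To make the idea behind the abstract concrete, the sketch below shows one common way such a method can work: mine association rules of the form {feature=value} => {class} subject to minimum support and confidence, then keep every feature that participates in at least one strong rule. This is a minimal illustration under assumed conventions (discretised features, single-item antecedents); the function names, data layout, and thresholds are hypothetical and not the authors' exact algorithm.

```python
from collections import Counter

def select_features(rows, labels, min_support=0.1, min_confidence=0.7):
    """Keep features that appear in at least one strong
    {feature=value} => {class} rule (illustrative sketch).

    rows: list of dicts mapping feature name -> discrete value.
    labels: class labels aligned with rows.
    min_support / min_confidence: the usual rule-mining thresholds.
    """
    n = len(rows)
    item_count = Counter()  # occurrences of (feature, value)
    pair_count = Counter()  # occurrences of (feature, value, class)
    for row, y in zip(rows, labels):
        for f, v in row.items():
            item_count[(f, v)] += 1
            pair_count[(f, v, y)] += 1

    selected = set()
    for (f, v, y), c in pair_count.items():
        support = c / n                       # fraction of rows matching the rule
        confidence = c / item_count[(f, v)]   # rule reliability given the antecedent
        if support >= min_support and confidence >= min_confidence:
            selected.add(f)  # feature takes part in a strong rule
    return selected

# Usage on a tiny hypothetical discretised dataset.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
]
labels = ["play", "play", "play", "stay"]
print(select_features(rows, labels, min_support=0.25, min_confidence=0.9))
```

Raising min_support and min_confidence prunes rules, which is why, as the abstract notes, overly high thresholds can leave too few rules to select features effectively.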