Feature Selection Algorithm for Incomplete Data Based on Information Entropy
CHEN Sheng-Bing1, WANG Xiao-Feng1,2
1. Key Laboratory of Network and Intelligent Information Processing, Department of Computer Science and Technology, Hefei University, Hefei 230601
2. Intelligent Computing Laboratory, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031
Abstract: Based on an analysis of existing definitions of incomplete information entropy, the concept of incomplete information entropy based on similarity relations (SIIE) is proposed, and some properties of SIIE are discussed. A feature selection algorithm for incomplete data is then presented. In this algorithm, the SIIE of incomplete data is calculated directly and is taken as the criterion for feature selection. The sequential forward floating search method is then employed to address the problem of correlation among features. Experiments on the UCI database are carried out, and the results indicate the accuracy and efficiency of the proposed algorithm.
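As an illustration of the search strategy named in the abstract, the following is a minimal sketch of sequential forward floating search over a generic subset criterion. It is not the authors' implementation: the abstract does not give the SIIE formula, so the criterion is passed in as a function, and the names `sffs`, `toy_score` and `WEIGHTS` are hypothetical. The sketch assumes a larger score is better; a criterion that should be minimized would need to be negated first.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Sequence, Tuple

Subset = FrozenSet[Hashable]


def sffs(features: Sequence[Hashable],
         score: Callable[[Subset], float],
         k: int) -> Subset:
    """Sequential forward floating search for a subset of size k.

    `score` is a stand-in for the selection criterion (assumption:
    larger is better). It is NOT the SIIE measure from the paper.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")

    selected: Subset = frozenset()
    # Best (score, subset) seen so far for each subset size.
    best: Dict[int, Tuple[float, Subset]] = {}

    while len(selected) < k:
        # Forward step: add the feature that gives the best enlarged subset.
        remaining = [f for f in features if f not in selected]
        selected = max((selected | {f} for f in remaining), key=score)
        current = score(selected)
        size = len(selected)
        if size not in best or current > best[size][0]:
            best[size] = (current, selected)

        # Floating step: drop features while doing so strictly improves
        # on the best subset already recorded for the smaller size.
        while len(selected) > 2:
            members = [f for f in features if f in selected]
            reduced = max((selected - {f} for f in members), key=score)
            reduced_score = score(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, reduced)
            else:
                break

    return best[k][1]


# Hypothetical toy criterion: per-feature weights with a redundancy
# penalty when the correlated pair "a" and "b" are selected together.
WEIGHTS = {"a": 3.0, "b": 2.9, "c": 1.0, "d": 0.5}


def toy_score(subset: Subset) -> float:
    total = sum(WEIGHTS[f] for f in subset)
    if {"a", "b"} <= subset:
        total -= 2.5
    return total


if __name__ == "__main__":
    print(sorted(sffs(list(WEIGHTS), toy_score, 2)))
```

In the toy criterion, "b" has the second-highest individual weight but is penalized when combined with "a", so the search prefers "c" as the second feature. This is the kind of correlation effect that a subset-level criterion is meant to capture and that ranking features one at a time would miss.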
陈圣兵,王晓峰. 基于信息熵的不完备数据特征选择算法*[J]. 模式识别与人工智能, 2014, 27(12): 1131-1137.
CHEN Sheng-Bing, WANG Xiao-Feng. Feature Selection Algorithm for Incomplete Data Based on Information Entropy. Pattern Recognition and Artificial Intelligence, 2014, 27(12): 1131-1137.