Abstract: Ensemble methods are a simple and effective way to classify incomplete data. However, in current ensemble classification algorithms for incomplete data, the weight of each sub-classifier is determined mainly by the size and dimension of the corresponding sub-dataset. Because different missing attributes contribute differently to classification, this paper introduces information entropy to measure these differences and proposes a novel algorithm for incomplete data, the Entropy Ensemble Classification Algorithm (EECA). An ensemble classifier with BP neural networks as base classifiers is applied to data from the UCI repository. The experimental results show that EECA, which determines sub-classifier weights by information entropy, performs better than the algorithm that uses simple weights.
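The abstract does not give EECA's exact weighting formula. The sketch below only illustrates the general idea of entropy-based sub-classifier weighting and rests on assumptions. It assumes discrete attributes, treats `None` as a missing value, and uses a hypothetical rule: each sub-classifier's weight is the summed Shannon entropy of the attributes its sub-dataset keeps, normalised to sum to 1. The helper names `attribute_entropy`, `entropy_weights` and `weighted_vote` are illustrative. Training the BP-network base classifiers is out of scope here.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (bits) of a discrete attribute; None marks a missing value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    n = len(observed)
    probs = [c / n for c in Counter(observed).values()]
    return sum(-p * math.log2(p) for p in probs)


def entropy_weights(columns, subsets):
    """Hypothetical rule (not necessarily EECA's): weight each sub-classifier by the
    summed entropy of the attributes its sub-dataset uses, normalised to sum to 1."""
    h = {name: attribute_entropy(vals) for name, vals in columns.items()}
    raw = [sum(h[a] for a in attrs) for attrs in subsets]
    total = sum(raw)
    if total == 0:
        # Fall back to equal weights when no attribute carries information.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def weighted_vote(predictions, weights):
    """Combine the sub-classifiers' predicted labels by weighted majority vote."""
    scores = Counter()
    for label, w in zip(predictions, weights):
        scores[label] += w
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: attribute "b" has a missing value and is constant where observed.
    columns = {"a": [0, 1, 0, 1], "b": [1, 1, 1, None], "c": [0, 1, 2, 3]}
    # Each sub-classifier is trained on a different attribute subset.
    subsets = [["a", "b"], ["b", "c"]]
    weights = entropy_weights(columns, subsets)
    print(weights)
    print(weighted_vote(["yes", "no"], weights))  # prints "no"
```

Under this assumed rule, the sub-classifier whose attributes carry more information (here `["b", "c"]`, with 2 bits against 1 bit) receives the larger weight. A size- and dimension-based scheme would ignore that difference.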