Abstract: Theoretical and experimental results indicate that, among ensemble classifiers with the same training error, the one with the better margin distribution on the training examples has better generalization performance. Motivated by this, the concept of example margins is introduced into ensemble pruning and used to guide the design of pruning methods. Based on the margins, a new metric called the margin-based metric (MBM) is designed to evaluate the importance of a classifier with respect to an ensemble and an example set, and a greedy ensemble pruning method, MBM-based ensemble selection, is then proposed to reduce the ensemble size and improve its accuracy. Experimental results on 30 UCI datasets show that the ensembles selected by the proposed method outperform those produced by other state-of-the-art greedy ensemble pruning methods.
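The abstract does not give the exact form of MBM, so the following is only a minimal sketch of margin-based greedy forward selection under stated assumptions: the voting margin of an example is taken as the fraction of votes for its true class minus the largest fraction for any other class, and the selection objective is the mean margin of the growing sub-ensemble, a plausible stand-in for the paper's MBM rather than its actual definition. All identifiers (example_margins, greedy_margin_pruning) are hypothetical.

    import numpy as np

    def example_margins(vote_counts, y, size):
        """Voting margin of each example: (votes for the true class minus
        the largest vote count among the other classes) / ensemble size."""
        n = len(y)
        true_votes = vote_counts[np.arange(n), y]
        rivals = vote_counts.copy()
        rivals[np.arange(n), y] = -1          # exclude the true class
        return (true_votes - rivals.max(axis=1)) / size

    def greedy_margin_pruning(preds, y, n_classes, target_size):
        """Forward greedy selection: at each step add the classifier that
        maximizes the mean voting margin of the growing sub-ensemble.
        preds has shape (T, n): predictions of T base classifiers on n examples."""
        T, n = preds.shape
        target_size = min(target_size, T)     # cannot select more than T members
        selected, counts = [], np.zeros((n, n_classes))
        while len(selected) < target_size:
            best, best_score = None, -np.inf
            for t in range(T):
                if t in selected:
                    continue
                trial = counts.copy()
                trial[np.arange(n), preds[t]] += 1
                score = example_margins(trial, y, len(selected) + 1).mean()
                if score > best_score:
                    best, best_score = t, score
            counts[np.arange(n), preds[best]] += 1
            selected.append(best)
        return selected

    # Toy usage: 25 noisy base classifiers on a 3-class problem, prune to 7.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 3, 200)
    preds = np.where(rng.random((25, 200)) < 0.7, y, rng.integers(0, 3, (25, 200)))
    print(greedy_margin_pruning(preds, y, n_classes=3, target_size=7))

One pass of this kind of forward selection evaluates O(T^2) candidate sub-ensembles, each at O(n) cost; greedy ordering methods accept this overhead in exchange for a smaller and often more accurate ensemble.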