Abstract: Feature selection is an active research topic in information science, particularly in pattern recognition. This paper classifies feature selection algorithms from several points of view, introduces the main branches of feature selection and their state of development, and discusses some difficulties in theoretical analysis and application. From a practical standpoint, feature selection using support vector machines is considered a promising research direction in machine learning.