Abstract: The feature relevance measures employed by existing feature selection methods can effectively evaluate the relevance between two features, but they ignore the influence that the remaining features exert on that relevance. Taking feature interaction into account as a whole, the sparse representation coefficient is proposed as a feature relevance measure. Unlike existing relevance measures, it reveals the relevance between a feature and the target under the influence of the other features, and thus reflects feature interaction. To verify the effectiveness of the sparse representation coefficient as a relevance measure, the classification performance of the feature subsets selected by ReliefF and by feature selection methods using the sparse representation coefficient, symmetrical uncertainty, and the Pearson correlation coefficient as relevance measures is compared. Experimental results show that the features selected by the proposed method achieve higher and more stable classification performance.
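As a rough illustration of the idea rather than the paper's exact procedure, the sketch below assumes the sparse representation coefficients are obtained from an L1-regularized least-squares fit of the target on all features, so that each coefficient reflects a feature's relevance to the target given all other features; the data set, the regularization strength alpha, and the subset size k are placeholders, and scikit-learn's Lasso stands in for whatever sparse solver the paper actually uses.

```python
# Minimal sketch: rank features by sparse representation coefficients.
# Assumption: coefficients come from L1-regularized least squares (scikit-learn Lasso);
# the paper's solver, parameters and evaluation protocol may differ.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a gene-expression-style data set (many features, few samples).
X, y = make_classification(n_samples=100, n_features=500, n_informative=10,
                           random_state=0)
X = StandardScaler().fit_transform(X)

# Solve min_w ||y - Xw||_2^2 + alpha * ||w||_1. Because all features enter the fit
# jointly, each coefficient measures relevance to the target in the presence of the
# other features, which is the interaction effect the abstract refers to.
lasso = Lasso(alpha=0.05, max_iter=10000).fit(X, y)
relevance = np.abs(lasso.coef_)

# Keep the k features with the largest coefficient magnitudes as the selected subset.
k = 20
selected = np.argsort(relevance)[::-1][:k]
print("Selected feature indices:", selected)
```

A classifier trained on the selected columns of X can then be compared against subsets chosen by ReliefF, symmetrical uncertainty, or the Pearson correlation coefficient, which is the comparison the abstract describes.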