1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
Abstract: Feature selection for functional data aims to select, from a large amount of functional information, features that are weakly correlated with one another yet strongly representative. It simplifies computation and improves generalization ability. Traditional feature selection methods applied directly to functional data are neither effective nor efficient. A fast feature selection (FFS) method oriented to functional data, integrating principal component analysis (PCA) and the minimum convex hull, is proposed in this paper. FFS obtains a stable feature subset quickly. Since correlation is embedded in the features, the result of FFS can serve as the initial feature subset for other, iterative approaches; feature selection is thus performed twice. For the second stage, conditional mutual information (CMI), a popular feature selection method for functional data, is adopted. Experimental results on UCR datasets demonstrate the effectiveness of FFS, and contrast experiments yield a selection strategy for different demands on time cost and classification accuracy.
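The combination of PCA and a minimum convex hull described above can be illustrated with a minimal sketch. This is not the authors' FFS algorithm; it is one plausible reading under assumed details: each feature (e.g. each sampling point of the functional data) is treated as a point described by its values over the samples, PCA projects these feature points into a low-dimensional space, and the features lying on the convex hull of that projection are kept as the initial subset. The function name `ffs_sketch` and the two-component projection are illustrative choices, not from the paper.

```python
import numpy as np
from scipy.spatial import ConvexHull


def ffs_sketch(X, n_components=2):
    """Hypothetical sketch of a PCA + convex-hull feature pre-selection.

    X : array of shape (n_samples, n_features), functional observations.
    Returns sorted indices of the features on the convex hull of the
    PCA projection -- an extreme, "representative" subset of features.
    """
    # Treat each feature (column of X) as a point in sample space.
    F = X.T.astype(float)                  # rows = feature points
    F -= F.mean(axis=0)                    # center before PCA
    # PCA via SVD: rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(F, full_matrices=False)
    Z = F @ Vt[:n_components].T            # low-dim feature coordinates
    # Features on the hull boundary are the most "extreme" ones;
    # they serve as a cheap, stable initial feature subset.
    hull = ConvexHull(Z)
    return sorted(hull.vertices.tolist())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 30))          # 50 curves, 30 sampling points
    subset = ffs_sketch(X)
    print(subset)                          # indices of hull features
```

The hull vertices are a small fraction of the original features and are cheap to compute (Quickhull), which matches the abstract's claim that FFS produces a stable subset quickly; a second, iterative selector such as CMI can then refine this subset.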