Feature Selection Algorithm Based on Joint Spectral Clustering and Neighborhood Mutual Information
HU Minjie, ZHENG Liping, TANG Li, LIN Yaojin
School of Computer Science, Minnan Normal University, Zhangzhou 363000
Abstract: To exploit potential correlations among features in the feature space, spectral clustering is used to discover groups of correlated features, and neighborhood mutual information is used to obtain the maximally relevant feature subset. On this basis, a feature selection algorithm combining spectral clustering and neighborhood mutual information is proposed. First, neighborhood mutual information is applied to remove features uncorrelated with the class label. Spectral clustering is then used to group the remaining features, so that features within a group are strongly correlated and features in different groups differ strongly. Next, the feature subset strongly associated with the class label is selected from each feature group. Finally, the selected subsets are merged to form the final feature set. Extensive experiments are conducted with two different classifiers, and the results show that the proposed algorithm effectively improves classification performance while using fewer features.
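
The three steps described in the abstract (relevance filtering, feature grouping, per-group selection) can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the neighborhood mutual information follows the definition of Hu et al. (2011) with a Chebyshev neighborhood of radius `delta`; the affinity between two features is taken to be their pairwise neighborhood mutual information; the grouping uses a normalized spectral embedding with a small hand-written k-means; and only the single most relevant feature is kept per group. The function names, `delta`, `threshold`, and `k` are all hypothetical choices made for the example.

```python
import numpy as np

def neighborhood_mask(col, delta):
    """n x n boolean matrix; entry (i, j) is True when |col[i] - col[j]| <= delta."""
    col = np.asarray(col, dtype=float)
    return np.abs(col[:, None] - col[None, :]) <= delta

def nmi(mask_a, mask_b):
    """Neighborhood mutual information computed from two neighborhood matrices.
    The joint neighborhood is taken as the intersection of the two."""
    n = mask_a.shape[0]
    size_a = mask_a.sum(axis=1)
    size_b = mask_b.sum(axis=1)
    size_ab = (mask_a & mask_b).sum(axis=1)  # >= 1: every sample neighbors itself
    return float(-np.mean(np.log(size_a * size_b / (n * size_ab))))

def kmeans(points, k, iters=50):
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def select_features(X, y, delta=0.1, threshold=0.1, k=2):
    """Assumed pipeline: filter by relevance, group by spectral clustering,
    keep the most relevant feature of each group. Assumes k <= kept features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    label_mask = y[:, None] == y[None, :]  # discrete labels: same-class neighbors
    masks = [neighborhood_mask(X[:, j], delta) for j in range(X.shape[1])]
    relevance = np.array([nmi(m, label_mask) for m in masks])

    # Step 1: drop features whose relevance to the label is below the threshold.
    kept = [j for j in range(X.shape[1]) if relevance[j] > threshold]

    # Step 2: spectral clustering of the kept features (assumed NMI affinity).
    m = len(kept)
    W = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            W[a, b] = W[b, a] = max(nmi(masks[kept[a]], masks[kept[b]]), 0.0)
    deg = np.maximum(W.sum(axis=1), 1e-12)
    laplacian = np.eye(m) - W / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    emb = vecs[:, :k]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    groups = kmeans(emb, k)

    # Step 3: keep the feature most relevant to the label in each group.
    selected = []
    for g in range(k):
        members = [kept[i] for i in range(m) if groups[i] == g]
        if members:
            selected.append(max(members, key=lambda j: relevance[j]))
    return sorted(selected)

# Toy data: columns 0/1 are near-duplicates, columns 2/3 are near-duplicates,
# column 4 is constant and carries no information about the label.
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = [0.02, 0.00, 0.01, 0.03, 0.98, 1.00, 0.99, 0.97]
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
f3 = [0.01, 0.03, 0.97, 1.00, 0.02, 0.00, 0.99, 0.98]
f4 = [0.5] * 8
X = np.column_stack([f0, f1, f2, f3, f4])
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])  # label is determined by f0 and f2 together

selected = select_features(X, y)
print(selected)
```

On this toy data the sketch should discard the constant column and keep one column from each near-duplicate pair. Keeping a single feature per group is a simplification; the abstract speaks of selecting a feature subset from each group, which would correspond to keeping several top-ranked members instead.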
Received: 10 May 2017
Fund: Supported by the National Natural Science Foundation of China (No. 61303131) and the S&T Program of the Department of Education of Fujian Province (No. JAT170347, JAT1703501)
About the authors: HU Minjie (corresponding author), born in 1979, master, lecturer. Her research interests include feature selection. ZHENG Liping, born in 1977, master, lecturer. Her research interests include data mining. TANG Li, born in 1993, master student. Her research interests include data mining. LIN Yaojin, born in 1980, Ph.D., associate professor. His research interests include data mining and granular computing.