Abstract: Existing online streaming feature selection algorithms usually select a single globally optimal feature subset and assume that it suits every region of the sample space. However, each region of the sample space is characterized accurately by its own distinct feature subset, and these subsets are likely to differ in both composition and size. Therefore, a local online streaming feature selection algorithm based on the max-decision boundary is proposed. Local feature selection is introduced and, making full use of local information, feature evaluation criteria based on the max-decision boundary are designed to separate samples of the same class from samples of different classes as far as possible. Three strategies, maximizing the average decision boundary, maximizing the decision boundary, and minimizing redundancy, are employed to select appropriate features. After the optimal feature subset is selected for each local region, a class similarity measurement method is applied. Experimental results and statistical hypothesis tests on fourteen datasets demonstrate the effectiveness and stability of the proposed algorithm.
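The idea of a local, margin-driven streaming selection can be illustrated with a minimal sketch. It is not the criterion defined in the paper: it assumes a Relief-style margin (distance to the nearest sample of another class minus distance to the nearest sample of the same class) as a stand-in for the decision boundary, and it omits the redundancy and average-boundary strategies. The names `local_margin` and `stream_select`, and the toy data, are hypothetical.

```python
import math

def local_margin(X, y, i, features):
    """Assumed stand-in for a local decision boundary around sample i:
    nearest other-class distance minus nearest same-class distance,
    measured only on the given feature subset."""
    def dist(a, b):
        return math.sqrt(sum((X[a][f] - X[b][f]) ** 2 for f in features))
    hits = [dist(i, j) for j in range(len(X)) if j != i and y[j] == y[i]]
    misses = [dist(i, j) for j in range(len(X)) if y[j] != y[i]]
    return min(misses) - min(hits)

def stream_select(X, y, i, n_features):
    """Features arrive one at a time; keep a feature only if it enlarges
    the margin around sample i (hypothetical acceptance rule)."""
    selected = []
    best = 0.0  # margin of the empty subset: all distances are zero
    for f in range(n_features):
        m = local_margin(X, y, i, selected + [f])
        if m > best:
            selected.append(f)
            best = m
    return selected, best

# Toy data (made up): feature 0 separates the classes,
# features 1 and 2 do not help around sample 0.
X = [[0.0, 5.0, 0.1],
     [0.2, 1.0, 0.3],
     [1.0, 4.9, 0.2],
     [1.2, 1.1, 0.0]]
y = [0, 0, 1, 1]

subset, margin = stream_select(X, y, 0, 3)
print(subset, round(margin, 3))
```

Running the selection for a different sample index yields the subset for that sample's neighborhood, which is the sense in which the selection is local rather than global.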
SUN Shiming, DENG Ansheng. Local Online Streaming Feature Selection Based on Max-Decision Boundary[J]. Pattern Recognition and Artificial Intelligence, 2021, 34(12): 1131-1142.