Feature Selection Algorithm Based on Label Correlation
LÜ Yuejiao1, LI Deyu1,2
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
Abstract: In multi-label classification, each sample can be associated with several labels at once, and some of these labels are correlated with one another. Exploiting these label correlations can improve classification performance. To this end, frequent itemsets are used to mine the correlations between labels, and an improved multi-label feature selection algorithm is proposed that builds on the multi-label attribute reduction algorithm based on neighborhood rough sets. The samples are then clustered into groups according to feature similarity, and attribute reduction and classification are performed using the label correlations within each local group of samples. Experiments on five multi-label datasets verify the effectiveness of the proposed algorithm.
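As a rough illustration of the first step only (mining label correlations with frequent itemsets), the following minimal sketch runs an Apriori-style level-wise search over the label sets of the training samples. It is not the authors' implementation: the function name `frequent_label_sets`, the `min_support` threshold, and the toy labels are assumptions made for this example, and the abstract does not state which frequent-itemset miner or support threshold the paper uses.

```python
from itertools import combinations


def frequent_label_sets(label_rows, min_support):
    """Return {frozenset(labels): support} for label sets with support >= min_support.

    label_rows: list of sets, each holding the labels attached to one sample.
    min_support: minimum fraction of samples that must contain the label set.
    (Both names are hypothetical and chosen for this sketch.)
    """
    n = len(label_rows)
    # Level 1: the candidate itemsets are the individual labels.
    candidates = {frozenset([lab]) for row in label_rows for lab in row}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates that occur in a large enough fraction of samples.
        level = {}
        for cand in candidates:
            support = sum(1 for row in label_rows if cand <= row) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-sets into (k+1)-set candidates.
        k += 1
        candidates = {
            a | b for a, b in combinations(list(level), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Toy label assignments for four samples (invented for illustration).
rows = [
    {"beach", "sea"},
    {"beach", "sea", "sunset"},
    {"mountain"},
    {"sea", "sunset"},
]
result = frequent_label_sets(rows, min_support=0.5)
for labels, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(labels), support)
```

Label sets of size two or more that survive the threshold, such as {beach, sea} in the toy data, are the co-occurring labels that a correlation-aware feature selection step could then treat jointly rather than independently.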