Label Noise Filtering via Perception of Nearest Neighbors
JIANG Gaoxia1, FAN Ruixuan1, WANG Wenjian1,2
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
Abstract: Label noise filtering algorithms based on k-nearest neighbors are sensitive to the neighbor parameter k. To address this problem, a label noise filtering algorithm based on perception of nearest neighbors (PNN) is proposed to handle intra-class label noise in binary classification datasets effectively. In PNN, positive and negative samples are treated separately, so the label noise detection problem in classification is transformed into two outlier detection problems on single-class data. First, a personalized neighbor parameter is determined automatically for each sample by a neighbor perception strategy, avoiding sensitivity to a global neighbor parameter. Second, all samples are divided into core and non-core samples according to a noise factor, and the non-core samples are taken as label noise candidates. Finally, label noise is identified and filtered by combining the label information of the nearest neighbors of the candidate samples. Experiments indicate that the proposed algorithm performs well in both noise filtering and classification prediction.
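The pipeline described in the abstract can be illustrated with a short sketch. The exact neighbor perception strategy and noise factor used by PNN are defined in the body of the paper; the distance-jump rule for choosing a personalized k, the relative-distance noise factor, and the thresholds (max_k, jump_ratio, noise_factor_threshold) below are illustrative assumptions, not the authors' formulation.

# Illustrative sketch of a PNN-style label noise filter (assumptions as noted above).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def perceived_k(dist_row, max_k, jump_ratio=1.5):
    # Stand-in "neighbor perception": stop at the first large jump in the
    # sorted neighbor distances; dist_row[0] is the self-distance.
    d = dist_row[1:max_k + 1]
    for k in range(1, len(d)):
        if d[k - 1] > 0 and d[k] > jump_ratio * d[k - 1]:
            return k
    return max(len(d), 1)

def pnn_filter(X, y, max_k=10, noise_factor_threshold=1.5):
    # Returns a boolean mask that is True for samples flagged as label noise.
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    flagged = np.zeros(len(y), dtype=bool)
    nn_all = NearestNeighbors(n_neighbors=min(max_k + 1, len(y))).fit(X)

    for c in np.unique(y):                      # treat each class separately
        idx = np.flatnonzero(y == c)
        if len(idx) < 3:
            continue
        k_cap = min(max_k + 1, len(idx))
        dist, nbrs = NearestNeighbors(n_neighbors=k_cap).fit(X[idx]).kneighbors(X[idx])

        ks = np.array([perceived_k(dist[i], k_cap - 1) for i in range(len(idx))])
        mean_d = np.array([dist[i, 1:ks[i] + 1].mean() for i in range(len(idx))])

        for i in range(len(idx)):
            # Stand-in noise factor: own mean distance to the perceived neighbors
            # relative to that of those neighbors (large value -> non-core sample).
            nbr_mean = mean_d[nbrs[i, 1:ks[i] + 1]].mean()
            noise_factor = mean_d[i] / (nbr_mean + 1e-12)
            if noise_factor > noise_factor_threshold:
                # Candidate: confirm by the labels of its nearest neighbors
                # in the whole (two-class) data set.
                _, all_nbrs = nn_all.kneighbors(X[idx[i]][np.newaxis, :])
                nbr_labels = y[all_nbrs[0, 1:]]
                flagged[idx[i]] = (nbr_labels != c).mean() > 0.5
    return flagged

In use, the returned mask would simply be inverted to keep the retained samples, e.g. X[~mask], y[~mask], before training a classifier on the filtered data.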