Balance Method for Imbalanced Support Vector Machines
LIU WanLi (1,2), LIU SanYang (1), XUE ZhenXia (1,3)
1. Department of Applied Mathematics, Xidian University, Xi'an 710071
2. Department of Mathematics, Luoyang Normal College, Luoyang 471022
3. Department of Mathematics, Henan Science and Technology University, Luoyang 471003
Abstract: A method is proposed for adjusting the separating hyperplane of a support vector machine trained on imbalanced binary-classification data. First, the original samples are trained with a standard support vector machine to obtain the normal vector of the separating hyperplane. Second, the high-dimensional data are projected onto this normal vector, yielding one-dimensional data. Third, the ratio of the penalty factors of the two classes is determined from the standard deviations of the projected data and the sample sizes of the two classes. Finally, a second training run with these penalty factors produces a new separating hyperplane. Experimental results show that the method is effective: the error rates of the two classes become balanced and, in general, also decrease.
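The projection and penalty-ratio steps can be illustrated with a minimal sketch. The abstract does not state the formula that combines the standard deviations and the sample sizes, so the rule used here (a class is weighted more heavily when it has fewer samples and a larger projected spread) is only an assumed placeholder, and the function names `project` and `penalty_ratio` are hypothetical. The normal vector is taken as given, standing in for the result of the first training.

```python
from statistics import pstdev


def project(points, normal):
    """Project each point onto the unit normal vector (one scalar per point)."""
    norm = sum(c * c for c in normal) ** 0.5
    return [sum(p_i * w_i for p_i, w_i in zip(p, normal)) / norm for p in points]


def penalty_ratio(points, labels, normal):
    """Return an illustrative ratio C_pos / C_neg of the two penalty factors."""
    projected = project(points, normal)
    pos = [z for z, y in zip(projected, labels) if y == 1]
    neg = [z for z, y in zip(projected, labels) if y == -1]
    sigma_pos, sigma_neg = pstdev(pos), pstdev(neg)
    # ASSUMPTION: this combination rule is a placeholder, not the paper's formula.
    return (len(neg) * sigma_pos) / (len(pos) * sigma_neg)


# Toy data: 2 positive (minority) and 4 negative (majority) samples.
points = [(2.0, 0.0), (4.0, 1.0),
          (-1.0, 0.0), (-2.0, 1.0), (-3.0, 2.0), (-4.0, 0.0)]
labels = [1, 1, -1, -1, -1, -1]
normal = (1.0, 0.0)  # assumed output of the first (standard SVM) training

ratio = penalty_ratio(points, labels, normal)
print(f"C_pos / C_neg = {ratio:.4f}")
```

For the second training, such a ratio could be supplied to an SVM implementation that accepts per-class penalties, for example through the `class_weight` parameter of scikit-learn's `SVC`.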