Abstract: In imbalanced data classification, the minority class is usually the class of interest, yet it is harder to recognize than the majority class. Current popular classification algorithms have two main drawbacks: they require the importance of instances to be set explicitly, and they support recognition of the minority class only indirectly. To address these drawbacks, an instance-importance-based learning algorithm is proposed, namely the instance importance based support vector machine (IISVM). IISVM consists of three phases. The first two phases use a one-class SVM and a binary SVM, respectively, to divide the training instances into three groups: most important, important, and unimportant. In the last phase, the most important instances are used to train the initial classifier, and explicit stopping criteria are then adopted to control the recognition of the minority class directly. Experimental results show that IISVM outperforms other standard and advanced solutions.
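The grouping performed in the first two phases can be illustrated with a minimal sketch. The abstract does not state the rule used to assign instances to groups, so everything below is an assumption made for illustration: the function name `group_by_importance`, the two thresholds, and the rule itself (instances rejected by the one-class SVM are treated as unimportant, instances lying inside the binary SVM margin as most important, and the rest as important). The inputs are assumed to be precomputed decision scores from the two SVMs.

```python
from typing import Dict, List, Sequence


def group_by_importance(
    one_class_scores: Sequence[float],
    binary_margins: Sequence[float],
    outlier_threshold: float = 0.0,  # hypothetical cut-off on the one-class score
    margin_width: float = 1.0,       # hypothetical half-width of the binary SVM margin
) -> Dict[str, List[int]]:
    """Partition instance indices into three importance groups.

    Assumed rule (not taken from the paper):
      - one-class score below `outlier_threshold`   -> unimportant
      - otherwise, |binary margin| <= `margin_width` -> most important
      - otherwise                                    -> important
    """
    if len(one_class_scores) != len(binary_margins):
        raise ValueError("score sequences must have the same length")

    groups: Dict[str, List[int]] = {
        "most_important": [],
        "important": [],
        "unimportant": [],
    }
    for i, (score, margin) in enumerate(zip(one_class_scores, binary_margins)):
        if score < outlier_threshold:
            # Phase 1 (assumed): the one-class SVM rejects this instance.
            groups["unimportant"].append(i)
        elif abs(margin) <= margin_width:
            # Phase 2 (assumed): the instance lies close to the decision boundary.
            groups["most_important"].append(i)
        else:
            groups["important"].append(i)
    return groups


# Example with made-up scores for four training instances.
example_groups = group_by_importance(
    one_class_scores=[0.5, -0.2, 0.3, 0.1],
    binary_margins=[0.4, 0.1, 2.0, -0.9],
)
```

Under these assumptions, the `most_important` indices would seed the initial classifier of the last phase; how the remaining instances are used and which stopping criteria apply are not specified in the abstract and are left out of the sketch.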