Rare Category Detection Algorithm Based on Cluster Separability
YAN Xuan-Hui, GUO Gong-De
School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007; Key Laboratory of Network Security and Cryptology of Fujian Province, Fujian Normal University, Fuzhou 350007
Abstract: Rare category mining is an important and widely applied research field in data mining. To address the shortcomings of traditional rare category recognition methods, a rare category detection algorithm based on cluster separability (RDACS) is proposed, which combines density difference with an inter-cluster separability criterion. Rare categories are detected in an active-learning scenario, and the similarity of feature weights is used to measure the separability between a rare category cluster and its surrounding samples. Experimental results on UCI public datasets and KDD99 datasets show that, compared with existing similar algorithms, RDACS requires fewer queries, which significantly improves efficiency and reduces human error. RDACS is complementary to existing rare category recognition methods.