Abstract: Existing methods for bookmark spam detection perform worse when little user profile information is available. To address this problem, an ensemble SVM approach that integrates confidence is proposed for detecting bookmark spam. First, the bootstrap technique is used to repeatedly sample the training data, producing a training subset for each individual SVM. Then, a sigmoid function is used to transform the standard output of each SVM into a posterior probability, which serves as the confidence of the predicted category. Finally, a confidence-based method is proposed to aggregate the outputs of the individual SVMs, and it performs better than a voting strategy. Experimental results show that the proposed approach outperforms existing methods when little user profile information is available.
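The three steps named in the abstract (bootstrap sampling, sigmoid mapping to a posterior probability, and confidence-based aggregation) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the aggregation rule, so averaging the posteriors is assumed here, and the sigmoid parameters `a` and `b`, the function names, and the example decision values are all hypothetical. In practice `a` and `b` would be fitted on held-out data rather than fixed.

```python
import math
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bootstrap replicate)."""
    return [data[rng.randrange(len(data))] for _ in data]


def sigmoid_confidence(decision_value, a=-1.0, b=0.0):
    """Map a raw SVM decision value f to P(spam | f) = 1 / (1 + exp(a*f + b)).

    a and b are placeholder values (assumption); they are normally
    estimated from validation data.
    """
    z = a * decision_value + b
    z = max(min(z, 700.0), -700.0)  # keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(z))


def majority_vote(decision_values):
    """Baseline: each SVM casts one vote based on the sign of its output."""
    votes = sum(1 if f > 0 else -1 for f in decision_values)
    return 1 if votes > 0 else -1


def confidence_aggregate(decision_values, a=-1.0, b=0.0):
    """Assumed rule: average the posteriors and label spam (+1) if mean > 0.5."""
    probs = [sigmoid_confidence(f, a, b) for f in decision_values]
    mean_prob = sum(probs) / len(probs)
    return (1 if mean_prob > 0.5 else -1), mean_prob


if __name__ == "__main__":
    rng = random.Random(0)
    training_ids = list(range(10))  # stand-in for training samples
    print(bootstrap_sample(training_ids, rng))

    # Hypothetical outputs of three individual SVMs for one user:
    # one confident "spam" and two weak "not spam" decisions.
    outputs = [2.5, -0.1, -0.2]
    print(majority_vote(outputs))         # voting ignores how confident each SVM is
    print(confidence_aggregate(outputs))  # the confident SVM dominates the mean
```

In this example the two weak negative votes outnumber the single strong positive one, so voting labels the user as non-spam, while the averaged confidence labels the user as spam. This is one way a confidence-based rule can differ from a voting strategy.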