Optimized Cutting Plane Method for Linear SVM via Inexact Step-Length Search
CHU De-Jun1, TAO An2, GAO Qian-Kun3, JIANG Ji-Yuan3, TAO Qing1
1 The 11th Department, Army Officer Academy of PLA, Hefei 230031
2 Training Department, Army Officer Academy of PLA, Hefei 230031
3 Administrant Brigade of Postgraduate, Army Officer Academy of PLA, Hefei 230031
Abstract: The cutting plane method efficiently solves the primal problem of linear support vector machines by incrementally adding cutting planes, and it can be further accelerated by exact line search. In this paper, an optimized cutting plane method with inexact line search is presented. It locates the interval containing the optimal step size in fewer iterations, and an acceptable step size is then obtained from the closed-form solution of a two-point quadratic interpolation. Theoretical analysis shows that the proposed method attains the same optimal convergence bound as the exact line search method while running faster at lower cost. Experiments on large-scale datasets demonstrate that the proposed method outperforms the exact line search method, achieving a speedup of more than 50% in some cases.
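The step-size rule described in the abstract can be illustrated with a standard safeguarded two-point quadratic interpolation line search. The sketch below is a minimal, generic Python illustration of that technique, not the paper's exact procedure: the names (`inexact_step_length`, `phi`, `dphi0`), the Armijo-style acceptance test, and the safeguarding constants are illustrative assumptions.

```python
import numpy as np

def inexact_step_length(phi, dphi0, alpha0=1.0, c1=1e-4, max_iter=20):
    """Generic inexact line search via two-point quadratic interpolation.

    phi    : callable, phi(alpha) = objective restricted to the search direction
    dphi0  : directional derivative phi'(0), assumed negative (descent direction)
    alpha0 : initial trial step size
    c1     : sufficient-decrease constant for the Armijo-style acceptance test
    """
    phi0 = phi(0.0)
    alpha = alpha0
    for _ in range(max_iter):
        phi_a = phi(alpha)
        # Accept the trial step if it yields sufficient decrease.
        if phi_a <= phi0 + c1 * alpha * dphi0:
            return alpha
        # Closed-form minimizer of the quadratic interpolant built from
        # phi(0), phi'(0) and phi(alpha):
        #   alpha_new = -phi'(0) * alpha^2 / (2 * (phi(alpha) - phi(0) - phi'(0) * alpha))
        denom = 2.0 * (phi_a - phi0 - dphi0 * alpha)
        alpha_new = -dphi0 * alpha ** 2 / denom
        # Safeguard: keep the new trial step inside a shrinking interval.
        alpha = float(np.clip(alpha_new, 0.1 * alpha, 0.9 * alpha))
    return alpha

# Usage on a simple one-dimensional quadratic, phi(a) = (a - 0.3)^2, phi'(0) = -0.6.
if __name__ == "__main__":
    step = inexact_step_length(lambda a: (a - 0.3) ** 2, dphi0=-0.6)
    print("accepted step size:", step)
```

The safeguarding clip is the standard device that keeps each interpolated step strictly inside the current bracketing interval, so only a few function evaluations are needed before an acceptable step is found.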