Abstract: Support vector machines can be posed as quadratic programming problems in various ways. Using Lagrangian duality, an unconstrained differentiable convex program is proposed as the dual of this quadratic program; it is a simple reformulation of the standard quadratic program of a linear support vector machine. The resulting problem minimizes a differentiable convex piecewise quadratic function in the input space rather than in the feature space. By exploiting the structure of this piecewise quadratic function and combining it with a fast line search, a Conjugate Gradient Support Vector Machine (CGSVM) is proposed to solve the unconstrained problem quickly. Once the kernel matrix has been factorized by Cholesky or incomplete Cholesky factorization, nonlinear classification problems with kernel functions can also be solved by CGSVM with little increase in algorithmic complexity. CGSVM can solve linear classification problems with millions of points, and nonlinear classification problems with three thousand points or more, on an ordinary PC. Extensive numerical experiments and a complexity analysis demonstrate that the proposed algorithms are highly efficient compared with similar algorithms such as ASVM and LSVM.
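To make the approach concrete, the following is a minimal sketch (in Python/NumPy) of a nonlinear conjugate gradient solver of the kind the abstract describes, assuming a squared-hinge (hence differentiable and piecewise quadratic) formulation of the linear SVM with the bias term omitted; the objective, the name cg_svm, and the parameter C are illustrative assumptions rather than the paper's exact formulation. The key property is that along any search direction the objective restricts to a convex piecewise quadratic function of the step length, so an exact line search amounts to a few Newton steps on its derivative.

```python
import numpy as np

def cg_svm(A, y, C=1.0, max_iter=200, tol=1e-6):
    """Sketch: minimize the piecewise quadratic SVM objective
        f(w) = 0.5*||w||^2 + 0.5*C*||max(0, 1 - y*(A @ w))||^2
    by Polak-Ribiere conjugate gradients with an exact line search."""
    w = np.zeros(A.shape[1])

    def grad(w):
        r = np.maximum(0.0, 1.0 - y * (A @ w))  # hinge residuals (piecewise linear)
        return w - C * (A.T @ (y * r))

    g = grad(w)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Exact line search: phi(t) = f(w + t*d) is convex piecewise
        # quadratic, so Newton steps on phi'(t) converge in a few iterations.
        Ad = y * (A @ d)
        t = 0.0
        for _ in range(30):
            r = 1.0 - y * (A @ (w + t * d))
            act = r > 0                          # margins currently violated
            phi_p = (w + t * d) @ d - C * np.sum(r[act] * Ad[act])
            phi_pp = d @ d + C * np.sum(Ad[act] ** 2)
            step = phi_p / phi_pp
            t -= step
            if abs(step) < 1e-12:
                break
        w_new = w + t * d
        g_new = grad(w_new)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ restart rule
        w, g, d = w_new, g_new, -g_new + beta * d
    return w

# Toy usage: two Gaussian blobs, labels in {-1, +1}.
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
w = cg_svm(A, y, C=10.0)
print("training accuracy:", np.mean(np.sign(A @ w) == y))
```

For the nonlinear case, factorizing the kernel matrix as K ≈ GGᵀ by Cholesky or incomplete Cholesky factorization, as the abstract indicates, would let the same solver run with G in place of the data matrix A, which is consistent with the claimed small increase in algorithmic complexity.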
References:
[1] Vapnik V N. The Nature of Statistical Learning Theory. Berlin, Germany: Springer-Verlag, 1995
[2] Vapnik V N. An Overview of Statistical Learning Theory. IEEE Trans on Neural Networks, 1999, 10(5): 988-999
[3] Mangasarian O L. Generalized Support Vector Machines. In: Smola A, Bartlett P, Schölkopf B, Schuurmans D, eds. Advances in Large Margin Classifiers. Cambridge, USA: MIT Press, 2000, 135-146
[4] Lee Y J, Mangasarian O L. SSVM: A Smooth Support Vector Machine. Computational Optimization and Applications, 2001, 20(1): 5-22
[5] Joachims T. Making Large-Scale SVM Learning Practical. In: Schölkopf B, et al, eds. Advances in Kernel Methods: Support Vector Learning. Cambridge, USA: MIT Press, 1999, 169-184
[6] Mangasarian O L, Musicant D R. Active Set Support Vector Machine Classification. In: Leen T K, Dietterich T G, Tresp V, eds. Neural Information Processing Systems. Cambridge, USA: MIT Press, 2001, 577-583
[7] Mangasarian O L, Musicant D R. Lagrangian Support Vector Machines. Journal of Machine Learning Research, 2001, 1(3): 161-177
[8] Lin C J, Saigal R. An Incomplete Cholesky Factorization for Dense Matrices. BIT, 2000, 40(3): 536-558
[9] Zhang Y. Solving Large-Scale Linear Programs by Interior-Point Methods under the MATLAB Environment. Optimization Methods and Software, 1998, 10(1): 1-31
[10] Joachims T. SVMlight. 1998. http://svmlight.joachims.org/
[11] Platt J. Fast Training of Support Vector Machines Using Sequential Minimal Optimization. In: Schölkopf B, et al, eds. Advances in Kernel Methods: Support Vector Learning. Cambridge, USA: MIT Press, 1999, 185-208
[12] Mangasarian O L, Musicant D R. Successive Overrelaxation for Support Vector Machines. IEEE Trans on Neural Networks, 1999, 10(5): 1032-1037
[13] Zhou S S, Rong X F, Zhou L H. A Maximum Entropy Method for Training the Support Vector Machines. Signal Processing, 2003, 19(6): 595-599 (in Chinese)
[14] Zhou S S, Zhou L H. Lower Dimension Newton Algorithm for Training the Support Vector Machines. Systems Engineering and Electronics, 2004, 26(9): 1315-1318 (in Chinese)
[15] Mangasarian O L. Nonlinear Programming. New York, USA: McGraw-Hill, 1969
[16] Bazaraa M S, Sherali H D, Shetty C M. Nonlinear Programming: Theory and Algorithms. New York, USA: John Wiley and Sons, 1993
[17] Golub G H, van Loan C F. Matrix Computations. 3rd Edition. Baltimore, USA: The Johns Hopkins University Press, 1996
[18] Lee Y J, Mangasarian O L. RSVM: Reduced Support Vector Machines. In: Proc of the 1st SIAM International Conference on Data Mining. Chicago, USA, 2001. ftp://ftp.cs.wisc.edu/pub/dmi/techreports/00-07.ps
[19] Yuan Y X, Sun W Y. Optimization Theory and Methods. Beijing, China: Science Press, 2003 (in Chinese)
[20] ILOG CPLEX 6.5 Reference Manual. 1999. http://www.rpi.edu/dept/math/math-programming/cplex66/sun4x_56/doc/refman/onlinedoc/
[21] Murphy P M, Aha D W. UCI Repository of Machine Learning Databases. 1992. http://www.ics.uci.edu/~mlearn/MLRepository.html
[22] Musicant D R, Mangasarian O L. LSVM: Lagrangian Support Vector Machine. 2000. http://www.cs.wisc.edu/dmi/svm/
[23] Musicant D R. NDC: Normally Distributed Clustered Datasets. 1998. http://www.cs.wisc.edu/~musicant/data/ndc
[24] Ho T K, Kleinberg E M. Checkerboard Dataset. 1996. http://www.cs.wisc.edu/math-prog/mpml.html