Abstract: Variance reduction strategies in smooth stochastic methods can effectively improve the convergence of an algorithm. A hybrid regularized mirror descent algorithm with reduced variance and weighted average (α-HRMDVR-W) is proposed for solving the non-smooth strongly convex “L1 + L2 + Hinge” optimization problem by combining variance reduction with a weighted average of the iterates. Variance reduction is applied at every step of the iterative process, and the weighted average of the iterates is used as the output. It is proved that α-HRMDVR-W achieves the optimal convergence rate and that this rate does not depend on the number of samples. Unlike existing variance reduction methods, α-HRMDVR-W uses only a small portion of the samples, rather than the full sample set, to correct the gradient at each iteration. Experimental results show that α-HRMDVR-W reduces the variance and decreases CPU time.
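To make the ingredients named in the abstract concrete (a mini-batch variance-reduced gradient for the hinge loss, a composite step handling the L1 and L2 regularizers, and a weighted average of the iterates as output), the following Python sketch combines them in an SVRG-style loop. This is a minimal illustration under stated assumptions, not the authors' exact update rules: the step-size schedule, the batch sizes, and the helper `hinge_subgrad` are illustrative choices introduced here.

```python
import numpy as np

def alpha_hrmdvr_w_sketch(X, y, lam1=1e-4, lam2=1e-2,
                          epochs=10, inner=100, batch=10, anchor_batch=200,
                          seed=0):
    """Illustrative sketch (not the paper's exact update rules) for
    F(w) = (1/n) * sum_i max(0, 1 - y_i <x_i, w>)
           + lam1 * ||w||_1 + (lam2 / 2) * ||w||_2^2,
    using a mini-batch anchor gradient for variance reduction and a
    weighted average of the iterates as the output."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg, total_weight = np.zeros(d), 0.0

    def hinge_subgrad(v, idx):
        # Subgradient of the averaged hinge loss over the rows in idx.
        margins = y[idx] * (X[idx] @ v)
        active = (margins < 1.0).astype(float)
        return -(active[:, None] * y[idx][:, None] * X[idx]).sum(axis=0) / len(idx)

    k = 0
    for _ in range(epochs):
        # Anchor gradient estimated from a small batch rather than all n samples.
        anchor = w.copy()
        a_idx = rng.choice(n, size=min(anchor_batch, n), replace=False)
        g_anchor = hinge_subgrad(anchor, a_idx)

        for _ in range(inner):
            k += 1
            eta = 1.0 / (lam2 * k)          # step size suited to strong convexity
            idx = rng.choice(n, size=batch, replace=False)
            # Variance-reduced gradient estimate of the hinge part.
            g = hinge_subgrad(w, idx) - hinge_subgrad(anchor, idx) + g_anchor
            # Gradient step on hinge + L2, then soft-thresholding for the L1 term.
            w = w - eta * (g + lam2 * w)
            w = np.sign(w) * np.maximum(np.abs(w) - eta * lam1, 0.0)
            # Weighted average: later iterates receive larger weights.
            w_avg += k * w
            total_weight += k

    return w_avg / total_weight
```

Usage would be `w_hat = alpha_hrmdvr_w_sketch(X_train, y_train)` with labels in {-1, +1}. Weighting later iterates more heavily is one standard way to obtain an O(1/T) rate for strongly convex problems, which is the behavior the abstract attributes to the weighted-average output.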