Optimal Individual Convergence Rate of Heavy-Ball-Based Momentum Methods Based on Adaptive Step-Size Strategy
HUANG Jianzhi¹, LONG Sheng¹, TAO Qing¹
1. Department of Information Engineering, Artillery Air Defense Academy of Chinese People's Liberation Army, Hefei 230031
Abstract  AMSGrad employs two optimization techniques, adaptive step sizes and momentum. However, compared with purely adaptive step-size algorithms, its convergence analysis carries an extra logarithmic factor. To address this issue, non-smooth convex problems are studied in this paper. It is proved that, by selecting the time-varying step-size and momentum parameters appropriately, Heavy-ball-based momentum methods with adaptive step sizes attain the optimal individual convergence rate. This indicates that such methods enjoy the advantages of both adaptation and acceleration. The hinge loss problem under an L1-norm constraint is solved to verify the correctness of the theoretical analysis.
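To make the analyzed method concrete, the following is a minimal NumPy sketch of a projected Heavy-ball subgradient iteration with AdaGrad-style per-coordinate adaptive step sizes, applied to the hinge loss under an L1-norm constraint. It is an illustrative sketch only: the scaling alpha / sqrt(v_t), the momentum schedule beta_t = t/(t+2), and the helper functions l1_ball_projection and hinge_subgradient are assumptions for exposition and may differ from the exact time-varying parameters proved optimal in the paper.

```python
# Sketch: projected Heavy-ball subgradient method with AdaGrad-style
# per-coordinate step sizes for hinge loss under an L1-norm constraint.
# Parameter schedules and helper names are illustrative assumptions.
import numpy as np

def l1_ball_projection(w, radius=1.0):
    """Euclidean projection of w onto the L1 ball {x : ||x||_1 <= radius}."""
    if np.abs(w).sum() <= radius:
        return w
    u = np.sort(np.abs(w))[::-1]                     # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(w) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def hinge_subgradient(w, X, y):
    """Subgradient of (1/n) * sum_i max(0, 1 - y_i <w, x_i>)."""
    margins = 1.0 - y * (X @ w)
    active = margins > 0.0
    return -(X[active].T @ y[active]) / X.shape[0]

def heavy_ball_adaptive(X, y, T=1000, alpha=0.1, radius=1.0, eps=1e-8):
    n, d = X.shape
    w_prev = w = np.zeros(d)
    v = np.zeros(d)                                  # accumulated squared subgradients
    for t in range(1, T + 1):
        g = hinge_subgradient(w, X, y)
        v += g * g                                   # AdaGrad-style accumulation
        step = alpha / (np.sqrt(v) + eps)            # per-coordinate adaptive step size
        beta = t / (t + 2.0)                         # time-varying momentum (assumed schedule)
        w_next = w - step * g + beta * (w - w_prev)  # Heavy-ball update
        w_next = l1_ball_projection(w_next, radius)  # enforce the L1-norm constraint
        w_prev, w = w, w_next
    return w                                         # last (individual) iterate
```

Returning the final iterate, rather than an average of the iterates, matches the individual-convergence setting studied in the paper; averaging schemes would typically serve only as a baseline for comparison.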
Received: 10 October 2020
Corresponding Author:
TAO Qing, Ph.D., professor. His research interests include pattern recognition, machine learning and applied mathematics.
About authors:
HUANG Jianzhi, master student. His research interests include convex optimization algorithms and their applications in machine learning.
LONG Sheng, master student. His research interests include convex optimization algorithms and their applications in machine learning.