Abstract: The stochastic gradient descent method may converge to a local optimum. To address this problem, a stochastic gradient descent method with fractional-order momentum is proposed for convolutional neural networks to improve their recognition accuracy and convergence rate. The parameter update rule is improved by combining the traditional momentum-based stochastic gradient descent method with the fractional-order difference method. The influence of the fractional order on the training of network parameters is discussed, and an order adjustment method is presented. The validity of the proposed training method is verified and analyzed on the MNIST and CIFAR-10 datasets. Experimental results show that the proposed method improves the recognition accuracy and convergence rate of convolutional neural networks.
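The abstract does not give the update formulas, so the following is only a minimal sketch of one plausible reading of "momentum SGD combined with a fractional-order difference": the velocity accumulates a truncated Grünwald-Letnikov fractional difference over recent gradients rather than the latest gradient alone. All names, the truncation depth `K`, and the hyperparameter values are illustrative assumptions, not the paper's method.

```python
import numpy as np

def gl_coefficients(alpha, K):
    """Truncated Grunwald-Letnikov coefficients c_k = (-1)^k * C(alpha, k),
    via the standard recurrence c_k = c_{k-1} * (1 - (alpha + 1) / k)."""
    c = np.empty(K + 1)
    c[0] = 1.0
    for k in range(1, K + 1):
        c[k] = c[k - 1] * (1.0 - (alpha + 1.0) / k)
    return c

def fractional_momentum_sgd(w, grad_fn, lr=0.1, mu=0.9, alpha=0.9, K=5, steps=100):
    """Hypothetical fractional-order momentum SGD sketch (not the paper's
    exact rule): the momentum buffer is driven by a GL fractional
    difference of the last K+1 gradients instead of the newest one."""
    c = gl_coefficients(alpha, K)
    history = []                      # most recent gradient first
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        history.insert(0, g)
        history = history[:K + 1]
        # fractional-order combination of the stored gradients
        frac_grad = sum(ck * gk for ck, gk in zip(c, history))
        v = mu * v + lr * frac_grad
        w = w - v
    return w
```

With `alpha = 1` the coefficients reduce to `(1, -1, 0, ...)`, i.e. a first-order difference of the gradient sequence, so the order parameter interpolates between plain momentum-like behavior and difference-based updates; an order adjustment scheme, as the abstract mentions, would tune `alpha` during training.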