Low-Bit Quantization of Neural Networks Based on Exponential Moving Average Knowledge Distillation
LÜ Junhuan1,2, XU Ke1,2, WANG Dong1,2
1. Institute of Information Science, Beijing Jiaotong University, Beijing 100044; 2. Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Beijing 100044
Abstract: The memory footprint and computational cost of deep neural networks currently restrict their widespread deployment, and network quantization is an effective compression method. In low-bit quantization, however, classification accuracy degrades as the number of quantization bits decreases. To address this problem, a low-bit quantization method for neural networks based on knowledge distillation is proposed. First, a small number of images are used for adaptive initialization to train the quantization step sizes of activations and weights, which speeds up the convergence of the quantized network. Then, exponential moving average knowledge distillation is introduced to normalize the distillation loss and the task loss and to guide the training of the quantized network. Experiments on the ImageNet and CIFAR-10 datasets show that the performance of the proposed method is close to or better than that of the full-precision network.
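To make the second step concrete, the following is a minimal PyTorch-style sketch of how the exponential moving average normalization of the distillation loss and the task loss described in the abstract might be implemented. The class and function names, the EMA decay value, the temperature, and the use of a KL-divergence distillation term are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F


class EMALossNormalizer:
    """Keeps exponential moving averages of the task loss and the distillation
    loss and uses them to put both terms on a comparable scale before summing
    (the decay value 0.99 is an assumption)."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.ema_task = None  # running average of the task (cross-entropy) loss
        self.ema_kd = None    # running average of the distillation loss

    def _update(self, ema, value):
        v = value.detach()
        return v if ema is None else self.decay * ema + (1 - self.decay) * v

    def combine(self, task_loss, kd_loss):
        self.ema_task = self._update(self.ema_task, task_loss)
        self.ema_kd = self._update(self.ema_kd, kd_loss)
        # Normalize each loss by its running magnitude so that neither term
        # dominates the gradient, then sum them for back-propagation.
        return task_loss / (self.ema_task + 1e-8) + kd_loss / (self.ema_kd + 1e-8)


def distillation_step(student, teacher, images, labels, normalizer, temperature=4.0):
    """One training step of the quantized (student) network guided by the
    full-precision (teacher) network."""
    with torch.no_grad():
        t_logits = teacher(images)            # teacher is frozen
    s_logits = student(images)
    task_loss = F.cross_entropy(s_logits, labels)
    kd_loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature * temperature
    return normalizer.combine(task_loss, kd_loss)
```

In this sketch the normalized sum replaces a fixed weighting between the two losses, so the balance between task supervision and teacher guidance adapts automatically as training progresses.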