Abstract: Because it is difficult to identify redundant units in deep neural networks with objective criteria, pruned networks often suffer a sharp drop in performance. To address this issue, an unstructured pruning method based on network architecture search (UPNAS) is proposed. First, a mask learning module is defined in the search space to remove redundant weight parameters. Then, layer-wise relevance propagation is introduced: during backward propagation, a layer-wise relevance score is assigned to each network weight to measure its contribution to the network output and to guide the update of the binary mask parameters. Finally, the network weights, architecture parameters and layer-wise relevance scores are updated in a unified manner. Experiments on the CIFAR-10 and ImageNet classification datasets show that UPNAS maintains the generalization ability of the network under high pruning rates and meets the requirements of model deployment.
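The abstract describes a mask learning module whose binary masks are guided by per-weight layer-wise relevance scores. The sketch below shows one plausible way to realize such a module in PyTorch; the class name MaskedConv2d, the magnitude-based score initialization, the top-k thresholding and the straight-through estimator are illustrative assumptions, not the exact UPNAS formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedConv2d(nn.Module):
    """Convolution gated by a learnable binary weight mask.

    Minimal sketch of the mask-learning idea in the abstract; the score
    initialization, thresholding rule and relevance weighting are
    assumptions made for illustration only.
    """

    def __init__(self, in_ch, out_ch, kernel_size, prune_rate=0.9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        # Real-valued mask scores from which the binary mask is derived
        # (initialized from weight magnitude; an assumption, not the paper's rule).
        self.mask_scores = nn.Parameter(self.conv.weight.detach().abs().clone())
        # Per-weight layer-wise relevance scores; assumed to be refreshed by an
        # external LRP backward pass during training.
        self.register_buffer("relevance", torch.ones_like(self.conv.weight))
        self.prune_rate = prune_rate

    def binary_mask(self):
        # Keep the top-(1 - prune_rate) fraction of weights ranked by the
        # relevance-weighted mask score.
        scores = self.mask_scores * self.relevance
        k = max(1, int(scores.numel() * (1.0 - self.prune_rate)))
        threshold = torch.topk(scores.flatten(), k).values[-1]
        hard = (scores >= threshold).float()
        # Straight-through estimator: binary mask in the forward pass, while
        # gradients still flow to the real-valued mask scores.
        return hard + self.mask_scores - self.mask_scores.detach()

    def forward(self, x):
        return F.conv2d(x, self.conv.weight * self.binary_mask(),
                        padding=self.conv.padding)
```

Under the unified update sketched in the abstract, the weights and mask scores of such a module would be optimized jointly with the architecture parameters of the search space, while the relevance buffer is periodically refreshed from an LRP pass; the exact update schedule is not specified here.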