[1] WIGHTMAN R, TOUVRON H, JÉGOU H. ResNet Strikes Back: An Improved Training Procedure in Timm[C/OL]. [2024-02-16]. https://arxiv.org/pdf/2110.00476.
[2] DING X H, ZHANG X Y, MA N N, et al. RepVGG: Making VGG-Style ConvNets Great Again // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2021: 13728-13737.
[3] HE K M, ZHANG X Y, REN S Q, et al. Deep Residual Learning for Image Recognition // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2016: 770-778.
[4] SRIVASTAVA S, SHARMA G. OmniVec: Learning Robust Representations with Cross Modal Sharing // Proc of the IEEE/CVF Winter Conference on Applications of Computer Vision. Washington, USA: IEEE, 2024: 1225-1237.
[5] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional Networks for Biomedical Image Segmentation // Proc of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2015: 234-241.
[6] YE P, LI B P, CHEN T, et al. Efficient Joint-Dimensional Search with Solution Space Regularization for Real-Time Semantic Segmentation. International Journal of Computer Vision, 2022, 130(11): 2674-2694.
[7] WANG W H, DAI J F, CHEN Z, et al. InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 14408-14419.
[8] ZHAO H S, SHI J P, QI X J, et al. Pyramid Scene Parsing Network // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2017: 6230-6239.
[9] GIRSHICK R. Fast R-CNN // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2015: 1440-1448.
[10] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[11] REDMON J, DIVVALA S, GIRSHICK R, et al. You Only Look Once: Unified, Real-Time Object Detection // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2016: 779-788.
[12] WANG C C, HE W, NIE Y, et al. Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism[C/OL]. [2024-02-16]. https://arxiv.org/pdf/2309.11331.
[13] HE K M, ZHANG X Y, REN S Q, et al. Identity Mappings in Deep Residual Networks // Proc of the 14th European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 630-645.
[14] BALDUZZI D, FREAN M, LEARY L, et al. The Shattered Gradients Problem: If ResNets Are the Answer, Then What Is the Question? // Proc of the 34th International Conference on Machine Learning. San Diego, USA: JMLR, 2017: 342-350.
[15] VEIT A, WILBER M, BELONGIE S. Residual Networks Behave Like Ensembles of Relatively Shallow Networks // Proc of the 30th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2016: 550-558.
[16] SUN T F, DING S F, GUO L L. Low-Degree Term First in ResNet, Its Variants and the Whole Neural Network Family. Neural Networks, 2022, 148: 155-165.
[17] CHANG S N, WANG P C, LUO H, et al. Revisiting Vision Transformer from the View of Path Ensemble // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2023: 19832-19842.
[18] VASWANI A, SHAZEER N, PARMAR N, et al. Attention Is All You Need // Proc of the 31st International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2017: 6000-6010.
[19] HAN D C, PAN X R, HAN Y Z, et al. Flatten Transformer: Vision Transformer Using Focused Linear Attention // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2023: 5938-5948.
[20] LI F, ZHANG H, XU H Z, et al. Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 3041-3050.
[21] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[22] YANG T, ZHU S J, CHEN C. GradAug: A New Regularization Method for Deep Neural Networks // Proc of the 34th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2020: 14207-14218.
[23] HUANG G, SUN Y, LIU Z, et al. Deep Networks with Stochastic Depth // Proc of the 14th European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 646-661.
[24] PENG Y, TANG S J, LI B P, et al. Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing // Proc of the 36th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2022: 3596-3608.
[25] TANG S J, YE P, LI B P, et al. Boosting Residual Networks with Group Knowledge. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(6): 5162-5170.
[26] CHO J H, HARIHARAN B. On the Efficacy of Knowledge Distillation // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2019: 4793-4801.
[27] MIRZADEH S I, FARAJTABAR M, LI N, et al. Improved Knowledge Distillation via Teacher Assistant. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(4): 5191-5198.
[28] LI X C, FAN W S, SONG S M, et al. Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again // Proc of the 36th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2022: 3830-3842.
[29] LAN X, ZHU X T, GONG S G. Knowledge Distillation by On-the-Fly Native Ensemble // Proc of the 32nd International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2018: 7528-7538.
[30] WU G L, GONG S G. Peer Collaborative Learning for Online Knowledge Distillation. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(12): 10302-10310.
[31] CHEN D F, MEI J P, WANG C, et al. Online Knowledge Distillation with Diverse Peers. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(4): 3430-3437.
[32] YANG C G, AN Z L, ZHOU H L, et al. Online Knowledge Distillation via Mutual Contrastive Learning for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 10212-10227.
[33] ZHANG H Y, CISSE M, DAUPHIN Y N, et al. Mixup: Beyond Empirical Risk Minimization[C/OL]. [2024-02-16]. https://arxiv.org/pdf/1710.09412.
[34] YUN S D, HAN D, CHUN S, et al. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2019: 6022-6031.
[35] WALAWALKAR D, SHEN Z Q, LIU Z C, et al. Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington, USA: IEEE, 2020. DOI: 10.1109/ICASSP40776.2020.9053994.
[36] KIM G, HAN D K, KO H. SpecMix: A Mixed Sample Data Augmentation Method for Training with Time-Frequency Domain Features[C/OL]. [2024-02-16]. https://arxiv.org/abs/2108.03020.
[37] HE K M, FAN H Q, WU Y X, et al. Momentum Contrast for Unsupervised Visual Representation Learning // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2020: 9726-9735.
[38] ARNSTRÖM D, BEMPORAD A, AXEHILL D. A Dual Active-Set Solver for Embedded Quadratic Programming Using Recursive LDLT Updates. IEEE Transactions on Automatic Control, 2022, 67(8): 4362-4369.
[39] DOMAHIDI A, CHU E, BOYD S. ECOS: An SOCP Solver for Embedded Systems // Proc of the European Control Conference. Washington, USA: IEEE, 2013: 2071-2076.
[40] PANDALA A G, DING Y R, PARK H W. qpSWIFT: A Real-Time Sparse Quadratic Program Solver for Robotic Applications. IEEE Robotics and Automation Letters, 2019, 4(4): 3355-3362.
[41] AMOS B, KOLTER J Z. OptNet: Differentiable Optimization as a Layer in Neural Networks // Proc of the 34th International Conference on Machine Learning. San Diego, USA: JMLR, 2017: 136-145.
[42] KRIZHEVSKY A, HINTON G. Learning Multiple Layers of Features from Tiny Images[C/OL]. [2024-02-16]. http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.
[43] DENG J, DONG W, SOCHER R, et al. ImageNet: A Large-Scale Hierarchical Image Database // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2009: 248-255.
[44] ZAGORUYKO S, KOMODAKIS N. Wide Residual Networks[C/OL]. [2024-02-16]. https://bmva-archive.org.uk/bmvc/2016/papers/paper087/paper087.pdf.
[45] HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3 // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2019: 1314-1324.
[46] YUN S, PARK J, LEE K, et al. Regularizing Class-Wise Predictions via Self-Knowledge Distillation // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2020: 13873-13882.
[47] KIM K, JI B, YOON D, et al. Self-Knowledge Distillation with Progressive Refinement of Targets // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 6547-6556.
[48] DENG X, ZHANG Z F. Learning with Retrospection. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(8): 7201-7209.
[49] SHEN Y Q, XU L W, YANG Y Z, et al. Self-Distillation from the Last Mini-Batch for Consistency Regularization // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 11933-11942.
[50] ZHOU B L, KHOSLA A, LAPEDRIZA A, et al. Learning Deep Features for Discriminative Localization // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2016: 2921-2929.
[51] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2017: 618-626.