Abstract: To address the high model complexity and excessive parameter counts of existing image super-resolution (SR) reconstruction methods, a lightweight image SR reconstruction method based on a multi-scale spatial adaptive attention network (MSAAN) is proposed. First, a global feature modulation module (GFM) is designed to learn global texture features, and a lightweight multi-scale feature aggregation module (MFA) is introduced to adaptively aggregate high-frequency spatial features from local to global scales. Second, GFM and MFA are integrated into a multi-scale spatial adaptive attention module (MSAA). Finally, a feature interactive gated feed-forward module (FIGFF) is incorporated to enhance local feature extraction while reducing channel redundancy. Extensive experiments demonstrate that MSAAN captures more comprehensive and refined features, significantly improving reconstruction quality while maintaining a lightweight structure.