Window Anchored Offset Constrained Dynamic Snake Convolutional Network for Aerial Small Target Detection
ZHANG Rongguo1, QIN Zhen1, HU Jing1, WANG Lifang1, LIU Xiaojun2
1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024; 2. School of Mechanical Engineering, Hefei University of Technology, Hefei 230009
Abstract: To extract key, effective information from the limited features of small targets and to improve their localization and detection accuracy, a window anchored offset constrained dynamic snake convolutional network for aerial small target detection is proposed. Firstly, an offset constrained dynamic snake convolution is constructed. By dynamically offsetting in different directions, the constrained snake convolution kernel adaptively focuses on feature regions of different sizes and shapes, so that feature extraction concentrates on tiny local structures and small target features are captured more easily. Secondly, a two-stage multi-scale feature fusion method performs feature alignment, fusion, and injection on feature maps from different layers. This enhances the fusion of low-level detail information with high-level semantic information and strengthens the transmission of information about targets of different sizes, improving the method's ability to detect small targets. In addition, a window anchored bounding box regression loss function is designed. It performs bounding box regression based on an auxiliary bounding box and the minimum point distance, yielding more accurate regression results and enhancing the model's small target localization capability. Finally, comparative experiments on three aerial photography datasets show that the proposed method improves small target detection performance to varying degrees.
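The abstract gives no formula for the regression loss, so the following is only a minimal sketch of how an auxiliary bounding box and a minimum point distance term could be combined. It is not the authors' window anchored loss. The function names (`iou`, `scale_box`, `aux_mpd_loss`), the `ratio` parameter, and the simple additive combination are all assumptions made for illustration: the IoU is computed on center-preserving scaled copies of both boxes (the auxiliary boxes), and the squared distances between matching top-left and bottom-right corners, normalized by the squared image diagonal, are added as penalties.

```python
def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Auxiliary box: same center as `box`, width and height scaled by `ratio`."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * ratio / 2.0
    half_h = (box[3] - box[1]) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def aux_mpd_loss(pred, target, img_w, img_h, ratio=0.75):
    """Hypothetical loss: 1 - IoU of auxiliary boxes + normalized corner distances.

    `ratio` (auxiliary box scale) and the additive form are illustrative
    assumptions, not values or a formula taken from the paper.
    """
    inner_iou = iou(scale_box(pred, ratio), scale_box(target, ratio))
    norm = float(img_w ** 2 + img_h ** 2)  # squared image diagonal
    # Squared distance between the two top-left corners.
    d1 = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    # Squared distance between the two bottom-right corners.
    d2 = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    return 1.0 - inner_iou + d1 / norm + d2 / norm


if __name__ == "__main__":
    target = (10.0, 10.0, 20.0, 20.0)
    print(aux_mpd_loss((10.0, 10.0, 20.0, 20.0), target, 100, 100))  # perfect match
    print(aux_mpd_loss((12.0, 11.0, 22.0, 21.0), target, 100, 100))  # small shift
```

In this sketch the corner-distance terms still provide a gradient signal when the auxiliary boxes do not overlap at all, which is one plausible reason to pair the two ideas for small targets, where a shift of a few pixels can already drive the IoU to zero.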