Abstract: To address the challenges of large field-of-view variation and complex spatiotemporal information in object detection for unmanned aerial vehicle (UAV) aerial images, a small object detection model for aerial photography based on low-dimensional image feature fusion is proposed, grounded in the YOLOv5 (You Only Look Once version 5) architecture. Coordinate attention is introduced to improve the inverted residual blocks of MobileNetV3, enriching the spatial information of feature maps while reducing the number of model parameters. The YOLOv5 feature pyramid network is modified to fuse feature maps from shallow layers, strengthening the model's representation of low-dimensional effective image information and thereby improving small object detection accuracy. To reduce interference from the complex backgrounds of aerial images, a parameter-free average attention module is introduced to attend jointly to spatial and channel attention, and VariFocal Loss is adopted to lower the weight of negative samples during training. Experiments on the VisDrone dataset demonstrate the effectiveness of the proposed model: detection accuracy is improved while model complexity is significantly reduced.
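The abstract's final modification, adopting VariFocal Loss to down-weight negative samples during training, can be sketched as follows. This is a minimal illustration of the published VariFocalNet formulation, not the authors' implementation; the function name and the defaults `alpha=0.75` and `gamma=2.0` are assumptions taken from the original VariFocalNet work.

```python
import math

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Sketch of VariFocal Loss (Zhang et al., VariFocalNet) for one prediction.

    p : predicted classification score in (0, 1).
    q : target score -- the IoU between the predicted and ground-truth
        boxes for a positive sample, 0 for a negative sample.

    Positives keep full binary cross-entropy weight, scaled by the target
    quality q; negatives are down-weighted by alpha * p**gamma, which is
    how the loss reduces the influence of the many easy negatives that
    dominate cluttered aerial scenes.
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
    if q > 0:  # positive sample: asymmetric, quality-weighted BCE
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # negative sample: focal down-weighting by the predicted score
    return -alpha * p ** gamma * math.log(1.0 - p)

# An easy negative (p = 0.1) contributes far less than it would
# under plain binary cross-entropy:
assert varifocal_loss(0.1, 0.0) < -math.log(1.0 - 0.1)
```

The asymmetry between the two branches is the key design choice: unlike Focal Loss, only negatives are focally suppressed, so the scarce positive samples in small-object aerial imagery retain their full gradient contribution.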
CAI Fenghuang, ZHANG Jiaxiang, HUANG Jie. Model for Small Object Detection in Aerial Photography Based on Low Dimensional Image Feature Fusion. Pattern Recognition and Artificial Intelligence, 2024, 37(2): 162-171.