UAV Target Detection Method Based on Multimodal Image Feature Fusion
XUE Wenhui1, CHEN Zhongcheng1, CHEN Jun2, WANG Yong1
1. School of Mechanical Engineering and Electronic Information, China University of Geosciences, Wuhan 430074; 2. School of Automation, China University of Geosciences, Wuhan 430074
Abstract Multimodal images exhibit significant complementarity at the perception level: infrared images provide stable target responses under low-light conditions and against complex backgrounds, while visible images offer rich texture and detail. Fusing the two modalities effectively enhances the robustness and accuracy of unmanned aerial vehicle (UAV) object detection in complex environments. Therefore, a UAV target detection method based on multimodal image feature fusion (MIFF-UAVDet) is proposed. YOLOv7-tiny is employed as the backbone, and dual branches are constructed for the infrared and visible modalities, so that features are extracted separately from each modality and provide complementary representations for subsequent fusion. Furthermore, a lightweight multi-scale spatial attention fusion module is introduced to guide adaptive spatial-level cross-modal fusion and strengthen feature representation by integrating a channel compression module, multi-scale depthwise separable convolutions, and a multi-scale spatial attention mechanism. Meanwhile, owing to the scale compression and shape distortion of targets under UAV aerial perspectives, the aspect-ratio penalty term in the complete intersection over union (CIoU) loss tends to become ineffective during regression, reducing localization accuracy. To address this issue, an improved height-width constrained loss function based on CIoU (HWCIoU) is proposed. Experimental results show that MIFF-UAVDet outperforms state-of-the-art methods in detection accuracy, localization precision, and inference speed, and exhibits stronger robustness in scenarios with complex backgrounds, varying illumination, and large variations in target scale.
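The fusion module is described only at a block level above, so the sketch below is an illustrative assumption rather than the paper's exact design: a minimal PyTorch rendering of how a channel compression layer, parallel multi-scale depthwise separable convolutions, and a CBAM-style spatial attention map could be combined to weight the infrared and visible feature maps per pixel. The class name, kernel sizes, and the softmax-normalized two-channel weight map are all hypothetical choices.

```python
# Hypothetical sketch of a lightweight multi-scale spatial attention
# fusion block; module names, kernel sizes, and the two-channel weight
# map are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class MultiScaleSpatialFusion(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Channel compression: squeeze the concatenated IR+RGB features
        # back to single-modality width with a 1x1 convolution.
        self.compress = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Multi-scale depthwise separable convolutions: one depthwise
        # branch per kernel size, each followed by a pointwise 1x1 conv.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
                nn.Conv2d(channels, channels, kernel_size=1),
            )
            for k in kernel_sizes
        )
        # Spatial attention: map channel-pooled statistics to a
        # 2-channel weight map (one spatial weight per modality).
        self.attn = nn.Conv2d(2, 2, kernel_size=7, padding=3)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        x = self.compress(torch.cat([ir, vis], dim=1))
        # Sum the branches to aggregate context at several receptive fields.
        x = sum(branch(x) for branch in self.branches)
        # Channel-wise average and max pooling, as in CBAM-style
        # spatial attention.
        stats = torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True)[0]], dim=1
        )
        # Per-pixel modality weights, normalized to compete across modalities.
        w = torch.softmax(self.attn(stats), dim=1)
        # Adaptive spatial-level cross-modal fusion.
        return w[:, 0:1] * ir + w[:, 1:2] * vis
```

Under this reading, a fused pyramid level would be produced as, e.g., fused = MultiScaleSpatialFusion(256)(ir_feat, vis_feat), with one such block per scale of the dual-branch backbone.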
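To make the loss argument concrete: in CIoU the aspect-ratio penalty v depends only on the ratio w/h, so it vanishes whenever the predicted and ground-truth aspect ratios coincide, even if the boxes differ greatly in size, which is exactly the failure mode the abstract attributes to scale-compressed aerial targets. The standard CIoU terms are reproduced below; the final expression is not the paper's HWCIoU formula (which is not given here) but one natural height-width constrained replacement in the spirit of EIoU, shown as an assumption.

```latex
% Standard CIoU loss (Zheng et al., 2020): \rho is the center distance,
% c the diagonal of the smallest enclosing box.
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU}
  + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} + \alpha v,
\qquad
v = \frac{4}{\pi^2}\Bigl(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\Bigr)^{2},
\quad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}.

% Illustrative height-width constrained variant (assumption, EIoU-style):
% width and height errors are penalized separately, so the penalty stays
% active even when the aspect ratios agree. Here c_w and c_h are the
% width and height of the smallest enclosing box.
\mathcal{L}_{\mathrm{HW}} = 1 - \mathrm{IoU}
  + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2}
  + \frac{(w - w^{gt})^2}{c_w^{2}}
  + \frac{(h - h^{gt})^2}{c_h^{2}}.
```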
Received: 25 November 2025
Fund: National Natural Science Foundation of China (No. 62073304)
Corresponding Author:
WANG Yong, Ph.D., professor. His research interests include internet of things/wireless sensor networks, deep learning, brain-computer interface, and embedded systems.
About the authors: XUE Wenhui, Master student. His research interests include computer vision, image processing, and object detection. CHEN Zhongcheng, Master student. His research interests include deep learning, embedded systems, and the internet of things. CHEN Jun, Ph.D., professor. Her research interests include artificial intelligence, pattern recognition, and computer vision technologies.