To address the dense distribution of small targets and severe occlusion in UAV images, as well as the high computational complexity of existing general-purpose detection models, this paper proposes a lightweight heterogeneous collaborative dual-path bridged network (LHCB-Net) for small target detection in UAV images. Building on the YOLOv9 framework, LHCB-Net decouples the functionally homogeneous stacking of RepNCSPELAN4 and addresses its limited receptive field: the backbone is constructed by combining the reparameterized convolution block RepVCBlock with the global attention large selective kernel network (Galsk). This combination integrates attention along three dimensions (channel, spatial, and global contextual dependencies), which enhances the perception of small targets. Galsk combines global modeling with a feedforward enhancement mechanism to improve feature extraction under complex backgrounds and occlusion, and to compensate for the receptive field reduction caused by the lightweight design. Moreover, secondary backbone features are connected directly to the detection head through cross-layer bridging, which achieves multi-scale feature fusion and improves localization accuracy. Additionally, a scale-adaptive IoU loss function is introduced to dynamically adjust the regression weights for targets of different scales. Experimental results on VisDrone2019, UAVDT, and a self-built dataset demonstrate that LHCB-Net significantly improves detection performance for dense small targets while reducing the parameter count and computational cost, providing an efficient solution for real-time onboard detection. The complete code is available at: https://github.com/tson122556/LHCB-Net/tree/master.
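The idea behind a scale-adaptive IoU loss can be illustrated with a minimal sketch. The abstract does not give the loss formula, so everything below is an assumption made for illustration only: the function names, the exponential weighting, the `alpha` parameter, and the 32×32-pixel reference area (borrowed from the common COCO convention for "small" objects) are hypothetical and are not taken from LHCB-Net. The sketch only shows the general pattern of scaling an IoU regression loss by a weight that grows as the ground-truth box shrinks.

```python
import math


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale_weight(target, ref_area=32.0 * 32.0, alpha=1.0):
    """Hypothetical scale weight (an assumption, not the paper's formula).

    Tends to 1 + alpha for very small targets and to 1 for large ones.
    """
    area = (target[2] - target[0]) * (target[3] - target[1])
    return 1.0 + alpha * math.exp(-area / ref_area)


def scale_adaptive_iou_loss(pred, target, ref_area=32.0 * 32.0, alpha=1.0):
    """Plain IoU loss (1 - IoU) multiplied by the scale-dependent weight."""
    return scale_weight(target, ref_area, alpha) * (1.0 - iou(pred, target))


# Same relative misalignment (IoU = 0.6) on a small and a large target.
small_loss = scale_adaptive_iou_loss((2, 0, 10, 8), (0, 0, 8, 8))
large_loss = scale_adaptive_iou_loss((20, 0, 100, 80), (0, 0, 80, 80))
```

In this sketch both predictions have the same IoU with their targets, yet `small_loss` is larger than `large_loss`, so the regression error on the small target contributes more to training. Any monotonically decreasing function of target area could replace the exponential; the choice here is purely illustrative.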