Dual-Branch Two-Stage Detection for Small Objects in UAV Images

doi:10.16451/j.cnki.issn1003-6059.202601002


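Abstract: The excessively small size of objects in images is one of the main challenges in UAV object detection. In particular, when the UAV flies at high altitude and the imaging resolution is low, small-object features easily dissipate in the deep features of deep neural networks. To address this, a two-stage detection method for small objects in UAV images based on dual parallel tasks is proposed, in which the parallel tasks are a small object detection task and a super-resolution reconstruction task. In the super-resolution reconstruction branch, a spatial prior module and a window attention guidance module are constructed. The small object detection branch is built on Swin Transformer, and the spatial prior module and the window attention guidance module perform super-resolution reconstruction of the spatial information in shallow features and of the attention in deep features, respectively. The two-stage method consists of a training stage and an inference stage. In the training stage, the super-resolution reconstruction branches use high-resolution features as labels, which strengthens the ability of the small object detection branch to extract detail features. In the inference stage, only the small object detection branch is retained, which improves inference speed and reduces resource overhead. Experiments on the public dataset VisDrone and the self-built UAV dataset JZ-UAV show that the proposed method achieves high recognition accuracy.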
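The abstract states that the super-resolution branches are supervised with high-resolution features as labels but does not give the loss. The following minimal sketch shows one plausible form of such a feature-level super-resolution loss, assuming (this is not stated in the source) that a low-resolution feature map is upsampled 2x and compared by mean squared error with the feature map computed from the high-resolution image. The nearest-neighbor upsampling is a hypothetical stand-in for a learned module such as the spatial prior module, and the function names are illustrative only.

```python
from typing import List

# A single-channel feature map stored as rows of floats (illustrative only).
Matrix = List[List[float]]


def upsample_nearest_2x(feat: Matrix) -> Matrix:
    """Double height and width by repeating each value in a 2x2 block.

    Hypothetical stand-in for a learned super-resolution module (e.g. SPM).
    """
    out: Matrix = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice (as a copy)
    return out


def mse(pred: Matrix, target: Matrix) -> float:
    """Mean squared error between two feature maps of identical shape."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("feature maps must have the same shape")
    sq = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(sq) / len(sq)


def sr_feature_loss(lr_feat: Matrix, hr_feat: Matrix) -> float:
    """Score the reconstructed map against the high-resolution feature label."""
    return mse(upsample_nearest_2x(lr_feat), hr_feat)


# Toy example: the label differs from the upsampled map in one of 16 cells.
lr = [[1.0, 2.0],
      [3.0, 4.0]]
hr = [[1.0, 1.0, 2.0, 2.0],
      [1.0, 1.0, 2.0, 2.0],
      [3.0, 3.0, 4.0, 4.0],
      [3.0, 3.0, 4.0, 5.0]]
print(sr_feature_loss(lr, hr))  # 0.0625
```

In a real network the upsampling would be a trainable layer and the loss would be backpropagated into the detection backbone; the sketch only shows how a high-resolution feature map can act as the label.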

Authors: YANG Yi, ZHU Jiangrui, WANG Keping, ZHANG Gaopeng, QIAN Wei, WANG Tian

Keywords: Unmanned Aerial Vehicle (UAV), Swin Transformer, Small Object Detection, Super-Resolution Reconstruction

Abstract: The excessively small size of objects in images is a major challenge for drone-based object detection. Particularly when drones operate at high altitudes with low imaging resolution, the features of small objects tend to dissipate in the deep layers of deep neural networks. To address this issue, a dual-branch two-stage detection method for small objects in unmanned aerial vehicle (UAV) images (DB-TS) is proposed. The two parallel tasks are a small object detection task and a super-resolution reconstruction task. In the super-resolution reconstruction branch, a spatial prior module (SPM) and a window attention module (WAM) are constructed. The small object detection branch is built upon the Swin Transformer backbone, and SPM and WAM perform super-resolution reconstruction of the spatial information in shallow features and of the attention in deep features, respectively. The two-stage detection framework consists of a training phase and an inference phase. During the training phase, the super-resolution reconstruction branch uses high-resolution features as ground truth, which strengthens the ability of the small object detection branch to extract fine-grained details. During the inference phase, only the small object detection branch is retained, which significantly improves inference speed and reduces computational resource consumption. Experiments on the VisDrone and JZ-UAV datasets demonstrate that the proposed method achieves higher recognition accuracy than the baseline models and outperforms the compared state-of-the-art methods.
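The training/inference split described in the abstract can be illustrated with a toy model in plain Python. The class and parameter names (`DualBranchDetector`, `sr_heads`, `lambda_sr`), the dictionary keys `"spm"` and `"wam"`, and the weighted-sum loss are all assumptions made for illustration; the source only states that the super-resolution branch is supervised during training and dropped at inference. This is a sketch of the general auxiliary-branch pattern, not the authors' implementation.

```python
from typing import Callable, Dict, List

Features = List[float]


class DualBranchDetector:
    """Toy dual-branch model (hypothetical structure, not the authors' code).

    `sr_heads` map shared backbone features to an auxiliary SR loss; they are
    used in training only and skipped entirely at inference.
    """

    def __init__(
        self,
        backbone: Callable[[Features], Features],
        det_head: Callable[[Features], List[float]],
        sr_heads: Dict[str, Callable[[Features], float]],
        lambda_sr: float = 0.5,  # assumed weight of each SR loss term
    ) -> None:
        self.backbone = backbone
        self.det_head = det_head
        self.sr_heads = sr_heads
        self.lambda_sr = lambda_sr
        self.sr_calls = 0  # counts SR head evaluations, for illustration

    def training_loss(self, image: Features,
                      det_loss_fn: Callable[[List[float]], float]) -> float:
        """Detection loss plus weighted SR losses on the shared features."""
        feats = self.backbone(image)
        loss = det_loss_fn(self.det_head(feats))
        for head in self.sr_heads.values():
            self.sr_calls += 1
            loss += self.lambda_sr * head(feats)
        return loss

    def infer(self, image: Features) -> List[float]:
        """Inference path: only the detection branch runs."""
        return self.det_head(self.backbone(image))


def backbone(img: Features) -> Features:
    return [2.0 * x for x in img]


def det_head(feats: Features) -> List[float]:
    return [sum(feats)]


def det_loss(preds: List[float]) -> float:
    return abs(preds[0] - 10.0)


# Constant SR losses keep the arithmetic easy to follow.
heads = {"spm": lambda f: 0.25, "wam": lambda f: 0.5}
model = DualBranchDetector(backbone, det_head, heads, lambda_sr=0.5)

train_loss = model.training_loss([1.0, 2.0], det_loss)
calls_after_training = model.sr_calls
preds = model.infer([1.0, 2.0])
print(train_loss, preds, model.sr_calls)  # 4.375 [6.0] 2
```

Because `infer` never touches `sr_heads`, the auxiliary heads add training cost but no inference cost, which is the property the abstract relies on for faster inference and lower resource consumption.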

Keywords: Unmanned Aerial Vehicle (UAV), Swin Transformer, Small Object Detection, Super-Resolution Reconstruction

Received: 2025-11-27

CLC number: TP391.4

Funding: Supported by the National Natural Science Foundation of China (No. 92467108).

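Corresponding author: WANG Keping, Ph.D., associate professor. Main research interests include image clarity enhancement, object detection and deep learning. E-mail: wangkp@hpu.edu.cn.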

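About the authors: YANG Yi, Ph.D., associate professor. Main research interests include deep learning, reinforcement learning and intelligent control. E-mail: yangyi@hpu.edu.cn.
ZHU Jiangrui, master's student. Main research interests are image recognition, object detection and deep learning. E-mail: zhujiangrui@home.hpu.edu.cn.
ZHANG Gaopeng, Ph.D., associate professor. Main research interests include signal processing, spacecraft imaging and object perception. E-mail: zhanggaopeng@opt.ac.cn.
QIAN Wei, Ph.D., professor. Main research interests include time-delay systems, stochastic systems and networked control systems. E-mail: qwei@hpu.edu.cn.
WANG Tian, Ph.D., professor. Main research interests include artificial intelligence, computer vision and pattern recognition. E-mail: wangtian@buaa.edu.cn.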

Cite this article:

YANG Yi, ZHU Jiangrui, WANG Keping, ZHANG Gaopeng, QIAN Wei, WANG Tian. Dual-Branch Two-Stage Detection for Small Objects in UAV Images. Pattern Recognition and Artificial Intelligence, 2026, 39(1): 31-51.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202601002 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2026/V39/I1/31
