Depth-Reshaping Based Aerial Object Detection Enhanced Network
FU Tianyi1,2, YANG Benyi3,4, DONG Hongbin1,2, DENG Baosong3,4
1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001; 2. National Engineering Laboratory for Modeling and Emulation in E-Government, Harbin Engineering University, Harbin 150001; 3. Defense Innovation Institute (DII), Academy of Military Science, Beijing 100071; 4. Intelligent Game and Decision Laboratory, Academy of Military Science, Beijing 100071
Abstract To address complex background interference, the loss of fine detail in small objects, and the high demand for detection efficiency in aerial image object detection, a depth-reshaping enhanced network (DR-ENet) is proposed. First, traditional downsampling is replaced by spatial depth-reshaping to reduce information loss during feature extraction and improve the network's ability to capture detail. Second, a deformable spatial pyramid pooling method is designed to improve the network's adaptability to variations in object shape and its recognition ability against complex backgrounds. In addition, an attention decoupling detection head is proposed to make learning more effective for the different detection tasks. Finally, a small-scale aerial dataset, PORT, is constructed that features both dense small objects and complex backgrounds. Experiments on three public aerial datasets and the PORT dataset show that DR-ENet improves detection performance, demonstrating its effectiveness and efficiency in aerial image object detection.
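As an illustration of the depth-reshaping idea, the sketch below shows a generic space-to-depth rearrangement in plain Python. The abstract does not give the exact operation used in DR-ENet, so the function name `space_to_depth`, the block size of 2, and the single-channel list-of-lists input are assumptions made for this example only. The point it shows is that the spatial size is halved while every input value is kept in the channel dimension, whereas stride-2 sampling discards three of every four values.

```python
def space_to_depth(fmap, block=2):
    """Rearrange each block x block spatial patch into a channel vector.

    fmap is a single-channel feature map given as a list of rows
    (hypothetical input format chosen for illustration).
    Returns a (H/block) x (W/block) grid whose cells hold block*block values.
    """
    height, width = len(fmap), len(fmap[0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, height, block):
        row = []
        for j in range(0, width, block):
            # Collect the patch in row-major order; no value is dropped.
            patch = [fmap[i + di][j + dj]
                     for di in range(block)
                     for dj in range(block)]
            row.append(patch)
        out.append(row)
    return out


# A 4x4 toy feature map.
fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

reshaped = space_to_depth(fmap, block=2)   # 2x2 grid, 4 channels per cell
strided = [r[::2] for r in fmap[::2]]      # stride-2 sampling for comparison

kept_by_reshape = sum(len(cell) for r in reshaped for cell in r)
kept_by_stride = sum(len(r) for r in strided)
print(kept_by_reshape, kept_by_stride)
```

In a real network the rearranged tensor would typically be followed by a learned convolution that mixes the enlarged channel dimension; that step is omitted here.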
Received: 19 April 2024
Fund: Supported by the National Natural Science Foundation of China (No. 61472095, 62303486, 42201501, 61902423) and the Natural Science Foundation of Heilongjiang Province (No. KY10600200048)
Corresponding Author:
DONG Hongbin, Ph.D., professor. His research interests include artificial intelligence and multi-agent systems.
About the authors: FU Tianyi, Ph.D. candidate. Her research interests include deep learning and computer vision. YANG Benyi, Ph.D., assistant professor. Her research interests include computer vision. DENG Baosong, Ph.D., professor. His research interests include unmanned systems technology and applications.