Feature Pyramid Object Detection Network Based on Function Maintenance
XU Chengqi1,2,3, HONG Xuehai1,4
1. Strategy Research Center of Information Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
3. R&D Center, Institute of Big Data, Cloud Computing Center of the Chinese Academy of Sciences, Shangrao 334000
4. Center of Information Development Strategy and Evaluation, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
Abstract: To address the limitations of feature pyramid networks in multi-scale and small object detection, a feature pyramid object detection network based on function maintenance is proposed. Firstly, feature maps are selected from the backbone convolutional architecture to build the feature pyramid. These feature maps of different scales are then fused from top to bottom with low loss by a function maintenance fusion module. As a result, the strong high-level semantic information is maintained more effectively, and the ability of the low-level feature maps to represent small objects is greatly enhanced. The network describes objects with two-stage features, which improves detection precision. Finally, context information is fully utilized to further enhance the ability to distinguish multi-scale objects. Experiments on the public PASCAL VOC dataset show that the proposed network achieves satisfactory detection results. Moreover, it also achieves better results when objects are occluded or blurred.
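The top-down fusion described in the abstract can be illustrated with a minimal sketch. It is not the function maintenance fusion module, whose internals the abstract does not give. It assumes a generic feature-pyramid-style pathway in which each coarser map is upsampled by nearest-neighbour interpolation and added element-wise to the next finer map. The names `upsample2x`, `fuse`, `top_down_pyramid`, and the toy single-channel maps `c3`, `c4`, `c5` are hypothetical and chosen only for illustration.

```python
from typing import List

# Assumption: a feature map is a single-channel 2D grid of floats.
FeatureMap = List[List[float]]


def upsample2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    out: FeatureMap = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def fuse(low: FeatureMap, high_up: FeatureMap) -> FeatureMap:
    """Element-wise sum; a placeholder for the paper's fusion module."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(low, high_up)]


def top_down_pyramid(features: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse maps ordered fine -> coarse, each level half the size of the previous."""
    pyramid = [features[-1]]                       # the coarsest level is kept as-is
    for low in reversed(features[:-1]):
        pyramid.insert(0, fuse(low, upsample2x(pyramid[0])))
    return pyramid


# Toy backbone outputs at three scales (4x4, 2x2, 1x1).
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[2.0] * 2 for _ in range(2)]
c5 = [[3.0]]
p3, p4, p5 = top_down_pyramid([c3, c4, c5])
```

In this toy example the finest output `p3` accumulates contributions from all three levels, which is the sense in which high-level semantic information reaches the low-level maps. A real detector would operate on multi-channel tensors with learned convolutions rather than a plain sum.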