Region Proposal Network Based on Effective Receptive Field
ZHANG Shengyu1,2, DONG Shifeng2, JIAO Lin2, WANG Qijin2, WANG Hongqiang2
1. Institutes of Physical Science and Information Technology, Anhui University, Hefei 230039; 2. Special Robot Laboratory, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031
Abstract: Object detection methods based on convolutional neural networks (CNNs) optimize region proposals to achieve higher detection accuracy. To this end, a region proposal network based on the effective receptive field (eRF) is proposed. An eRF-based sample matching method is introduced into the region proposal network, improving on the sample matching rule based on intersection over union (IoU). The representation ability of feature information in the region proposal generation stage is enhanced, the numbers of region proposals and anchor boxes are greatly reduced, and the parameter settings of the anchor boxes are simplified. Combined with the Fast R-CNN detector, the proposed method improves detection accuracy on the Pascal VOC datasets, which verifies its effectiveness.
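For context, the conventional IoU-based sample matching rule that the abstract refers to can be sketched as follows. This is a minimal illustration of the baseline rule only, not the proposed eRF-based method, whose details are not given in the abstract. The function names `iou` and `match_anchors` are hypothetical, and the thresholds 0.7 and 0.3 are assumed defaults taken from the common Faster R-CNN setting.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor: 1 = positive, 0 = negative, -1 = ignored.

    Thresholds are assumed defaults, not values from the paper.
    """
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    # Each ground-truth box also claims its best-overlapping anchor as positive,
    # so that no object is left without a training sample.
    for gt in gt_boxes:
        if anchors:
            best_idx = max(range(len(anchors)), key=lambda i: iou(anchors[i], gt))
            labels[best_idx] = 1
    return labels


anchors = [(0, 0, 10, 10), (0, 0, 5, 10), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 10)]
print(match_anchors(anchors, gt_boxes))
```

Because this rule depends only on box overlap, it typically requires many anchor scales and aspect ratios to cover objects of different sizes, which is the kind of anchor parameter setting the abstract says the eRF-based matching simplifies.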