Abstract: Insufficient feature information in object detection leads to low detection accuracy for small and occluded targets. Therefore, a multi-layer context convolutional neural network (MLC-CNN) is proposed, in which contextual information from multiple layers is extracted and combined with the local features of objects. MLC-CNN consists of a region proposal network (RPN) sub-network and a multi-layer context (MLC) sub-network. The RPN sub-network captures fixed-length feature vectors as object features, and the MLC sub-network obtains the corresponding contextual information from different feature maps. Finally, the two kinds of information are fused. In addition, hard example training is employed to address data imbalance. Experiments on the PASCAL VOC2007 and PASCAL VOC2012 datasets indicate that the mean average precision (mAP) is improved.
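The abstract does not state how proposal features and multi-layer context are pooled or fused, so the following Python sketch shows only one plausible reading under explicit assumptions. It assumes that each feature map is a single-channel 2D list with a known stride, that fixed-length vectors come from RoI max pooling, that the context region of a proposal is the proposal box enlarged about its centre by a hypothetical factor `scale`, that fusion is plain concatenation, and that hard example training keeps the `k` highest-loss RoIs in the style of online hard example mining. All function names are hypothetical; this is not the MLC-CNN implementation.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels
FeatureMap = List[List[float]]           # single-channel H x W map (simplification)


def roi_max_pool(fmap: FeatureMap, stride: int, box: Box, out: int = 2) -> List[float]:
    """RoI max pooling: project `box` onto `fmap` and max-pool it into out x out bins."""
    h, w = len(fmap), len(fmap[0])
    # Project image coordinates onto the feature grid; keep at least one cell.
    x1 = min(max(int(math.floor(box[0] / stride)), 0), w - 1)
    y1 = min(max(int(math.floor(box[1] / stride)), 0), h - 1)
    x2 = min(max(int(math.ceil(box[2] / stride)), x1 + 1), w)
    y2 = min(max(int(math.ceil(box[3] / stride)), y1 + 1), h)
    pooled = []
    for i in range(out):
        ys = y1 + (y2 - y1) * i // out
        ye = max(y1 + (y2 - y1) * (i + 1) // out, ys + 1)
        for j in range(out):
            xs = x1 + (x2 - x1) * j // out
            xe = max(x1 + (x2 - x1) * (j + 1) // out, xs + 1)
            pooled.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
    return pooled  # always out * out values, whatever the box size


def context_box(box: Box, scale: float, img_w: float, img_h: float) -> Box:
    """Assumed context region: `box` enlarged about its centre by `scale`, clipped to the image."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w, half_h = (box[2] - box[0]) * scale / 2, (box[3] - box[1]) * scale / 2
    return (max(cx - half_w, 0.0), max(cy - half_h, 0.0),
            min(cx + half_w, img_w), min(cy + half_h, img_h))


def fuse_local_and_context(fmaps: Sequence[FeatureMap], strides: Sequence[int], box: Box,
                           img_size: Tuple[float, float], scale: float = 2.0,
                           out: int = 2) -> List[float]:
    """Concatenate the local RoI feature (deepest map) with context features from every map.

    Concatenation as the fusion step is an assumption made for illustration.
    """
    local = roi_max_pool(fmaps[-1], strides[-1], box, out)
    ctx = context_box(box, scale, *img_size)
    context: List[float] = []
    for fmap, stride in zip(fmaps, strides):
        context.extend(roi_max_pool(fmap, stride, ctx, out))
    return local + context


def select_hard_examples(losses: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-loss RoIs (online hard example mining style)."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(order[:k])
```

Because every pooled vector has `out * out` entries, the fused vector has `out * out * (1 + number of layers)` entries for any proposal size, which is what allows a downstream classifier with a fixed input width to consume it; only the hard examples returned by `select_hard_examples` would then contribute to the backward pass.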