Survey of Object Detection Based on Deep Convolutional Network

doi:10.16451/j.cnki.issn1003-6059.201804005

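Abstract

After the region-based convolutional neural network (R-CNN) was proposed, deep convolutional networks became widespread in object detection, and Faster R-CNN integrated the whole detection process into a single unified deep network framework. Subsequently, detection frameworks such as YOLO and SSD further improved the efficiency of object detection. This paper systematically summarizes object detection methods based on deep networks and groups them into two categories: candidate-window (proposal) based frameworks and regression based frameworks. A proposal based framework first generates many candidate windows on the input image and then judges each of them. This judgment consists of classifying the object contained in the window (with background as one of the classes) and regressing the position of the window. Regression based methods treat object detection in an image as a regression process. On this basis, the mainstream methods of both categories are compared on mainstream datasets such as PASCAL_VOC and COCO, and the respective advantages of the two categories are analyzed. Finally, based on current trends in deep-network object detection, reasonable predictions are made about future research hotspots.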
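
The judgment step of a proposal based framework, as described in the abstract, can be illustrated with a minimal sketch. It assumes that a network has already produced per-proposal class scores (index 0 standing for background) and four regression offsets. The offset parameterization shown (center shifts scaled by the window size, log-scale width and height) is the convention commonly used in the R-CNN family and is an assumption here, since the abstract does not specify one. The function names `decode_box` and `judge_proposal` are hypothetical, and a single set of offsets per proposal is used for simplicity.

```python
import math


def decode_box(proposal, deltas):
    """Refine a candidate window (x1, y1, x2, y2) with offsets (tx, ty, tw, th).

    Assumed parameterization: tx, ty shift the center in units of the
    window's width and height; tw, th rescale width and height in log space.
    """
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    tx, ty, tw, th = deltas
    new_cx = cx + tx * w
    new_cy = cy + ty * h
    new_w = w * math.exp(tw)
    new_h = h * math.exp(th)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def judge_proposal(proposal, class_scores, deltas, labels):
    """Classify one candidate window and regress its position.

    Returns None when the best-scoring class is background (index 0),
    otherwise (label, score, refined_box).
    """
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    if best == 0:
        return None  # window judged to contain only background
    return labels[best], class_scores[best], decode_box(proposal, deltas)


if __name__ == "__main__":
    labels = ["background", "car", "person"]
    window = (0, 0, 10, 10)
    # Hypothetical network outputs for this window.
    print(judge_proposal(window, [0.1, 0.8, 0.1], (0.1, 0, 0, 0), labels))
    print(judge_proposal(window, [0.7, 0.2, 0.1], (0.1, 0, 0, 0), labels))
```

A regression based framework would skip the candidate-window stage and predict boxes and class scores directly from the image in a single pass; the decoding arithmetic above is only meant to make the "classify, then regress the window position" step concrete.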

Keywords: deep convolutional network, object detection, candidate proposals, region of interest (ROI) pooling

Abstract:

Deep convolutional networks are prevalent in object detection. The region-based convolutional neural network (RCNN) bridges the gap between image classification with deep convolutional networks and the object detection task. Faster-RCNN then aggregates the whole object detection process into a unified deep framework. You only look once (YOLO) and single shot multibox detector (SSD) further improve the efficiency of object detection. Different deep object detection frameworks are comprehensively analyzed and divided into two categories: the proposal based framework and the regression based framework. The proposal based framework generates thousands of candidate proposals and then performs classification and bounding box regression on them. The regression based framework treats detection as a regression problem and outputs the bounding box positions directly. Furthermore, the advantages of each kind of framework are demonstrated through experiments on mainstream datasets such as PASCAL_VOC and COCO. Finally, the development direction of object detection is discussed.

Key words: Deep Convolutional Network, Object Detection, Candidate Proposals, Region of Interest (ROI) Pooling

Received: 2018-01-15

Chinese Library Classification (CLC) number: TP 391.4

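About the authors: WU Shuai, Ph.D. candidate; main research interests: pattern recognition and deep learning. E-mail: 949766996@qq.com. XU Yong, Ph.D., professor; main research interests: pattern recognition, artificial intelligence, and image processing. E-mail: yongxu@ymail.com. ZHAO Dongning, Ph.D.; main research interests: multimedia information processing, big data technology, and artificial intelligence. E-mail: 582101@qq.com.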

Cite this article:

WU Shuai, XU Yong, ZHAO Dongning. Survey of Object Detection Based on Deep Convolutional Network[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2018, 31(4): 335-346.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201804005 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2018/V31/I4/335
