Abstract: Existing deep-learning-based instance-level object detection algorithms perform poorly on occluded objects. To address this problem, an improved adversarially generated region-based fully convolutional network (AGR-FCN), trained with an adversarial learning strategy, is proposed. The original region-based fully convolutional network (R-FCN) serves as the baseline framework, and an adversarial mask dropout network (AMDN) is built on top of the trained R-FCN to generate occluded features for the training samples. Adversarial training between R-FCN and AMDN strengthens the ability of R-FCN to learn the features of occluded objects and thereby improves its overall instance-level object detection performance. Experiments on the GMU Kitchen and BHGI datasets show that AGR-FCN achieves good detection accuracy in complex, changeable, unstructured environments with random variations in illumination, scale, focal ratio, viewing angle, pose, and occlusion.
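The abstract does not say how AMDN decides which features to drop, so the following is only a minimal sketch of the adversarial idea under stated assumptions, not the authors' method. A toy head (global average pooling plus a linear layer) stands in for the trained R-FCN classifier. The "adversary" exhaustively searches for the square block of an RoI feature map whose zeroing maximizes the classification loss (the "max" side), and the "detector" then takes one gradient step on that occluded feature (the "min" side). The real AMDN is described as a learned network, not an exhaustive search, and every name here (`hardest_mask`, `detector_step`, `logits_of`, etc.) is hypothetical.

```python
import math
from typing import List, Tuple

# Toy RoI feature map, indexed as [channel][row][col] (assumed layout).
Feature = List[List[List[float]]]
Weights = List[List[float]]  # [class][channel] for the hypothetical linear head


def pool(feat: Feature) -> List[float]:
    """Global average pooling over the spatial dimensions of each channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def logits_of(feat: Feature, weights: Weights) -> List[float]:
    """Hypothetical stand-in for the trained R-FCN classification head."""
    pooled = pool(feat)
    return [sum(w * p for w, p in zip(w_row, pooled)) for w_row in weights]


def cross_entropy(logits: List[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def apply_mask(feat: Feature, top: int, left: int, size: int) -> Feature:
    """Return a copy of feat with a size x size block zeroed in every channel."""
    out = [[row[:] for row in ch] for ch in feat]
    for ch in out:
        for r in range(top, top + size):
            for c in range(left, left + size):
                ch[r][c] = 0.0
    return out


def hardest_mask(feat: Feature, weights: Weights, label: int,
                 size: int) -> Tuple[int, int, float]:
    """Adversary ('max' side): block position whose removal maximizes the loss."""
    h, w = len(feat[0]), len(feat[0][0])
    best = (0, 0, -math.inf)
    for top in range(h - size + 1):
        for left in range(w - size + 1):
            masked = apply_mask(feat, top, left, size)
            loss = cross_entropy(logits_of(masked, weights), label)
            if loss > best[2]:
                best = (top, left, loss)
    return best


def detector_step(feat: Feature, weights: Weights, label: int,
                  lr: float) -> Weights:
    """Detector ('min' side): one SGD step of the toy head on an occluded feature."""
    pooled = pool(feat)
    logits = logits_of(feat, weights)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # For softmax cross-entropy, d(loss)/d(W[k][j]) = (p_k - y_k) * pooled_j.
    return [[w - lr * (probs[k] - (1.0 if k == label else 0.0)) * pooled[j]
             for j, w in enumerate(w_row)]
            for k, w_row in enumerate(weights)]


if __name__ == "__main__":
    feat = [[[4.0, 0.0], [0.0, 0.0]],   # channel 0: evidence for class 0
            [[1.0, 1.0], [1.0, 1.0]]]   # channel 1: evidence for class 1
    weights = [[1.0, 0.0], [0.0, 1.0]]
    top, left, loss = hardest_mask(feat, weights, label=0, size=1)
    occluded = apply_mask(feat, top, left, 1)
    new_weights = detector_step(occluded, weights, label=0, lr=0.5)
    print("adversarial block:", (top, left), "loss:", round(loss, 4))
    print("loss after detector step:",
          round(cross_entropy(logits_of(occluded, new_weights), 0), 4))
```

In the toy example the adversary removes the single cell that carries most of the evidence for the true class, and the detector's update shifts weight toward the evidence that survives the occlusion. That shift is the intuition behind training a detector against generated occlusions.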