Occluded Pedestrian Detection Algorithm Based on Improved Network Structure of YOLOv3
LIU Li1,2, ZHENG Yang1,2,3, FU Dongmei1,2
1. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083
2. Beijing Engineering Research Center of Industrial Spectrum Imaging, University of Science and Technology Beijing, Beijing 100083
3. Shunde Graduate School, University of Science and Technology Beijing, Foshan 528399
Abstract: To address the high missed-detection rate of YOLOv3 for occluded pedestrians in surveillance video, an occluded pedestrian detection method based on an improved YOLOv3 network structure is proposed. Firstly, a spatial pyramid pooling network is introduced into the fully connected layer to enhance the multi-scale feature fusion capability of the network. Secondly, network structure pruning is employed to eliminate structural redundancy, which avoids the network degradation and overfitting caused by deepening the network and reduces the number of parameters. Finally, multi-scale training is performed on a corridor pedestrian dataset to obtain the best weight model. Experimental results indicate that the proposed algorithm improves both average accuracy and detection speed.
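To make the multi-scale fusion idea concrete, the following is a minimal sketch of a spatial pyramid pooling block in the stride-1, size-preserving form often paired with YOLOv3. It is an illustration under stated assumptions, not the configuration used in this work: the kernel sizes (5, 9, 13) and the helper names `max_pool_same` and `spp_block` are hypothetical, and feature maps are represented as plain nested lists so that the example has no dependencies.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with 'same' padding on one 2D feature map.

    Windows are clipped at the border, which is equivalent to padding
    with negative infinity, so the output has the same height and width.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                fmap[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_block(channels, kernel_sizes=(5, 9, 13)):
    """Concatenate the input channels with pooled copies of themselves.

    `channels` is a list of 2D feature maps. The kernel sizes are an
    assumption for illustration; the abstract does not state them.
    """
    out = list(channels)  # identity branch
    for k in kernel_sizes:
        out.extend(max_pool_same(c, k) for c in channels)
    return out


if __name__ == "__main__":
    feature = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # one 3x3 channel
    fused = spp_block(feature, kernel_sizes=(3,))
    print(len(fused), fused[1])
```

Each branch sees a different receptive field over the same feature map, and concatenating the branches along the channel axis lets later layers combine local and more global context. This is one plausible reading of how such pooling can help with partially occluded targets.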