Abstract: Relying exclusively on sequential convolution operations in deep networks causes the feature layers to lose fine target detail and global characteristics, which reduces detection accuracy, particularly for small objects. This paper proposes a deep network detection algorithm, built on the residual network structure, that fuses a multiple dilated convolution (MDC) operator with multi-level features. The MDC convolution kernel combines 5 different receptive fields and can generate 8 different semantic feature maps. The MDC operator is introduced into the feature extraction block to build a new feature layer. Transposed convolution is employed to increase the dimension of the detection layer, and the multi-level feature layers are concatenated, so that the newly generated detection layer retains the original features of the targets to the greatest extent. Finally, non-maximum suppression is applied to construct the detection model. Experimental results show that the proposed model, with multi-level features and the MDC operator, effectively improves the mean average precision and the detection performance for small targets.
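To make the idea of a kernel that covers several receptive fields concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the paper's MDC operator: the 3-tap kernel, the dilation rates 1 to 5, and the function names are all assumptions chosen for illustration, since the abstract does not state them. The only fact relied on is the standard relation for dilated convolution, where a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
# Illustrative sketch only: kernel size and dilation rates are assumptions,
# not values taken from the paper.
HYPOTHETICAL_DILATIONS = (1, 2, 3, 4, 5)  # five branches, one per receptive field


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form, stride 1)."""
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart in the input.
        outputs.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return outputs


def multi_dilated_conv1d(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates; one output per rate."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    for d in HYPOTHETICAL_DILATIONS:
        # A 3-tap kernel spans 3, 5, 7, 9 and 11 samples for d = 1..5.
        print(d, effective_kernel_size(3, d))
    branches = multi_dilated_conv1d(list(range(12)), [1, 1, 1], HYPOTHETICAL_DILATIONS)
    print({d: len(out) for d, out in branches.items()})
```

Each branch sees a wider context with the same number of weights, which is why parallel dilated branches are a common way to gather both detail and larger-scale context. How the branches are fused into the 8 feature maps mentioned above is not specified in the abstract and is left out of this sketch.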