Abstract: Manually designing neural network structures is costly, classification and regression based on anchor boxes are computationally expensive, and detection of small targets is weak. To address these problems, a real-time road element detection model based on keypoint estimation is proposed. EfficientNet-B3, an architecture obtained by neural architecture search (NAS), is employed as the feature extraction network. An improved bi-directional feature pyramid network (BiFPN) serves as the feature fusion network. Instead of anchor boxes, keypoint estimation is used for the classification and regression tasks. Experiments on the BDD100K dataset show that the proposed model achieves good precision in real-time detection and high precision for small objects.
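To illustrate how keypoint estimation can replace anchor boxes, the following minimal sketch decodes a center-point heatmap into bounding boxes. It is not the model described here: the abstract gives no decoding details, so the sketch assumes a center-point formulation in which each object is a local peak of a per-class heatmap and a separate head predicts its width and height. The function name `decode_keypoints`, the 3x3 peak test, and the `threshold` value are illustrative assumptions.

```python
def decode_keypoints(heatmap, sizes, threshold=0.3):
    """Turn a center-point heatmap into boxes without anchor boxes.

    heatmap: 2-D list of scores in [0, 1] for one class (hypothetical input).
    sizes:   2-D list of (width, height) predictions, same shape as heatmap.
    Returns a list of (score, x1, y1, x2, y2) in heatmap coordinates,
    sorted by descending score.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    boxes = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Keep the cell only if it is the maximum of its 3x3 neighbourhood.
            # This local-peak test plays the role of non-maximum suppression.
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
            ]
            if score < max(neighbours):
                continue
            # The box is regressed directly at the keypoint: center plus size.
            w, h = sizes[y][x]
            boxes.append((score, x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return sorted(boxes, reverse=True)


# Toy example: two peaks, every cell predicting a 2x2 box (assumed values).
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.0],
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.6],
]
sizes = [[(2.0, 2.0)] * 4 for _ in range(4)]
boxes = decode_keypoints(heatmap, sizes)
```

In this sketch every heatmap cell is examined once and no candidate boxes are enumerated, which is the source of the computational saving over anchor-based heads that the abstract refers to.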