Abstract: With the continued development of computer vision and artificial intelligence, crowd counting algorithms based on intelligent video analysis have made considerable progress. However, their counting accuracy and robustness remain far from satisfactory. To address the problems of multi-scale features and background interference in the crowd counting task, an anti-background-interference crowd counting network based on multi-scale feature fusion (AntiNet-MFF) is proposed. Built on the U-Net architecture, AntiNet-MFF integrates a hierarchical feature split block to extract multi-scale crowd features, drawing on the representation capability of deep learning. To focus the counting model on crowd regions and reduce the interference of background noise, a background segmentation attention map (B-Seg attention map) is generated in the decoding stage. The B-Seg attention map then serves as attention that guides the counting model to focus on head regions, improving the quality of the crowd density map. Experiments on several typical crowd counting datasets show that AntiNet-MFF achieves promising accuracy and robustness compared with existing algorithms.
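The role of the attention map can be illustrated with a toy sketch. It is not the AntiNet-MFF implementation: it assumes the attention is applied as an element-wise product between a sigmoid-activated segmentation map and a raw density map, and that the count is the sum of the density map. The function names (`apply_attention`, `crowd_count`) and the 3x3 values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a segmentation logit to an attention weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_attention(density, seg_logits):
    """Gate each density value by its attention weight.

    Assumption: the gating is an element-wise product. The abstract only
    states that the attention map guides the model toward head regions.
    """
    return [
        [d * sigmoid(s) for d, s in zip(d_row, s_row)]
        for d_row, s_row in zip(density, seg_logits)
    ]


def crowd_count(density):
    """Estimate the count as the sum over the density map."""
    return sum(sum(row) for row in density)


# Hypothetical raw density map: the right column is background clutter
# that a counting model has wrongly activated.
raw_density = [
    [0.5, 0.5, 0.3],
    [0.5, 0.5, 0.3],
    [0.0, 0.0, 0.3],
]

# Hypothetical segmentation logits: large positive values mark crowd
# pixels, large negative values mark background pixels.
seg_logits = [
    [10.0, 10.0, -10.0],
    [10.0, 10.0, -10.0],
    [-10.0, -10.0, -10.0],
]

refined_density = apply_attention(raw_density, seg_logits)

print("count without attention:", round(crowd_count(raw_density), 2))
print("count with attention:   ", round(crowd_count(refined_density), 2))
```

In this toy case the background column contributes almost nothing after gating, so the estimated count moves toward the contribution of the crowd region alone.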