Abstract: In complex crowd scenes with uneven density distributions and large scale variations, existing Transformer-based methods typically overlook spatial and channel information when handling cross-scale contextual features. To address this limitation, a crowd counting method based on regional context awareness (RCA) is proposed. First, a region guidance module is designed to adaptively assign an attention region to each feature location, which introduces region-level context and better accommodates non-uniform density distributions. Second, a spatial-channel context awareness module is designed to enable feature interaction across the spatial and channel dimensions, which builds cross-dimensional regional dependencies and sharpens the discrimination between foreground and background regions. Finally, a distribution-level constraint is introduced during training to improve the consistency between the predicted density distribution and the ground-truth distribution. Experimental results on the JHU-Crowd++, ShanghaiTech A, and ShanghaiTech B datasets validate the robustness and generalization capability of RCA in complex scenes.
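The abstract does not specify the form of the distribution-level constraint. As a purely illustrative sketch, one common way to compare a predicted density map with the ground truth at the distribution level is to normalize both maps by their total counts and measure the total variation distance between them. The function name `distribution_consistency_loss`, the choice of total variation distance, and the toy 4×4 maps below are all assumptions made for illustration, not the formulation used by RCA.

```python
import numpy as np


def distribution_consistency_loss(pred, gt, eps=1e-8):
    """Hypothetical distribution-level constraint (an assumption, not RCA's loss):
    total variation distance between count-normalized density maps."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Normalize each map by its total count so that it sums to (almost) 1.
    p = pred / (pred.sum() + eps)
    q = gt / (gt.sum() + eps)
    # Total variation distance: half the L1 distance, bounded in [0, 1].
    return 0.5 * np.abs(p - q).sum()


# Toy ground truth: two annotated heads on a 4x4 grid.
gt = np.zeros((4, 4))
gt[1, 1] = 1.0
gt[2, 3] = 1.0

# Same spatial layout as the ground truth, but a 10% count error.
pred_same_layout = gt * 1.1
# Correct total count (2.0), but mass spread uniformly over the grid.
pred_wrong_layout = np.full((4, 4), 2.0 / 16)

loss_same_layout = distribution_consistency_loss(pred_same_layout, gt)
loss_wrong_layout = distribution_consistency_loss(pred_wrong_layout, gt)
print(loss_same_layout, loss_wrong_layout)
```

In this sketch the normalized loss ignores the count error and penalizes only the misplaced density, which is why a term of this kind would normally be paired with a separate counting loss during training.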