1. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168; 2. Liaoning Province Big Data Management and Analysis Laboratory of Urban Construction, Shenyang Jianzhu University, Shenyang 110168; 3. Shenyang Branch of National Special Computer Engineering Technology Research Center, Shenyang Jianzhu University, Shenyang 110168; 4. School of Electrical and Control Engineering, Shenyang Jianzhu University, Shenyang 110168
Abstract: Current Siamese network based target tracking algorithms have high computational complexity in the candidate box generation stage, which results in poor real-time performance and reduced accuracy in complex scenarios. To address these issues, an adaptive Siamese network for object tracking based on anchor-free RepPoints and an attention mechanism is proposed. First, a large-kernel convolutional attention module is introduced into the backbone of the Siamese subnetwork to extract global features of the target, improving the precision and generalization ability of the model. Second, an anchor-free multi-RepPoints module learns multiple RepPoints of the target, and an adaptive weight coefficient learning module then filters out the more accurate RepPoints, further improving precision and robustness. Finally, the RepPoints are transformed into predicted boxes, which eliminates the need for predefined candidate boxes, reduces computational complexity, and improves real-time tracking performance. Experiments show that the proposed method achieves significant improvements in precision and success rate on four datasets.
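The last two steps of the pipeline (filtering RepPoints by weight, then turning the remaining points into a predicted box) can be illustrated with a minimal sketch. The abstract does not state how the points are converted or how the weights are applied, so two assumptions are made here: the conversion is a plain min-max over the point coordinates, and the learned adaptive weighting is replaced by a simple top-k selection over given scores. The names `select_points`, `points_to_box`, and `keep` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]              # (x, y) in image coordinates
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def select_points(points: Sequence[Point],
                  weights: Sequence[float],
                  keep: int) -> List[Point]:
    """Keep the `keep` highest-weighted points.

    Assumption: a top-k filter stands in for the learned adaptive
    weight coefficient module, whose details the abstract omits.
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    if not 1 <= keep <= len(points):
        raise ValueError("keep must be between 1 and len(points)")
    # Sort by weight, highest first; ties keep their original order.
    ranked = sorted(zip(weights, points), key=lambda wp: wp[0], reverse=True)
    return [point for _, point in ranked[:keep]]


def points_to_box(points: Sequence[Point]) -> Box:
    """Min-max conversion: the tightest axis-aligned box around the points.

    Assumption: the paper may use a different conversion function.
    """
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Usage: the low-weight outlier at (90, 90) is dropped before conversion.
points = [(10.0, 20.0), (30.0, 25.0), (22.0, 60.0), (90.0, 90.0)]
weights = [0.9, 0.8, 0.7, 0.1]
box = points_to_box(select_points(points, weights, keep=3))
print(box)  # (10.0, 20.0, 30.0, 60.0)
```

Because the box is derived directly from the selected points, no predefined candidate boxes are enumerated, which is the source of the complexity reduction the abstract describes.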