Abstract: RGB-Thermal (RGB-T) object tracking aims to achieve robust object tracking by exploiting the complementarity of RGB information and thermal infrared data. Although many cutting-edge deep-learning-based RGB-T trackers have emerged, systematic and comprehensive review literature is still lacking. In this paper, the challenges faced by RGB-T object tracking are elaborated, and the current mainstream deep-learning-based RGB-T tracking algorithms are analyzed and summarized. Specifically, the existing RGB-T trackers are divided, according to their baselines, into tracking methods based on multi-domain networks (MDNet), tracking methods based on Siamese networks, and tracking methods based on discriminative correlation filters (DCF). Then, the datasets and evaluation metrics commonly used in RGB-T object tracking are introduced, and the existing algorithms are compared on the commonly used datasets. Finally, possible future research directions are pointed out.
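To make the Siamese-network category concrete: such trackers compare a target template against a search region by cross-correlation, and RGB-T variants additionally fuse the response maps (or features) of the two modalities. The following is a minimal NumPy sketch of this idea, not any specific tracker from the survey; the fixed fusion weights `w_rgb`/`w_tir` are a placeholder for the learned quality-aware weighting that actual methods employ.

```python
import numpy as np

def cross_correlate(template, search):
    """Slide the template over the search feature map and return
    a response map of similarity scores (valid-mode correlation)."""
    th, tw = template.shape
    sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw])
    return out

def fused_response(rgb_t, rgb_s, tir_t, tir_s, w_rgb=0.5, w_tir=0.5):
    """Weighted fusion of the per-modality response maps; the fixed
    weights stand in for a learned modality-reliability estimate."""
    return (w_rgb * cross_correlate(rgb_t, rgb_s)
            + w_tir * cross_correlate(tir_t, tir_s))

# Toy feature maps: the target pattern sits at offset (3, 4) in both modalities.
rng = np.random.default_rng(0)
rgb_search = rng.standard_normal((16, 16))
tir_search = rng.standard_normal((16, 16))
rgb_tmpl = rgb_search[3:9, 4:10].copy()
tir_tmpl = tir_search[3:9, 4:10].copy()

resp = fused_response(rgb_tmpl, rgb_search, tir_tmpl, tir_search)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)  # the response peak recovers the true offset (3, 4)
```

The peak of the fused response map gives the target location; real Siamese RGB-T trackers replace the raw patches with deep features and learn the fusion weights end to end.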
ZHANG Tianlu, ZHANG Qiang. A Survey of RGB-T Object Tracking Technologies Based on Deep Learning. Pattern Recognition and Artificial Intelligence, 2023, 36(4): 327-353.