Unsupervised Cross-Modality Person Re-identification Based on Semantic Pseudo-Label and Dual Feature Memory Banks
SUN Rui1,2, YU Yiheng1,2, ZHANG Lei1,2, ZHANG Xudong1,2
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; 2. Anhui Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230009
Abstract: Existing supervised visible-infrared person re-identification methods require substantial human effort to label data manually, and because they are constrained by the scenes covered in the labeled data, they generalize poorly to real, changing application scenarios. This paper therefore proposes an unsupervised cross-modality person re-identification method based on semantic pseudo-labels and dual feature memory banks. First, a pre-training method built on a contrastive learning framework is proposed; it trains on visible-light pedestrian images together with auxiliary grayscale images generated from them, yielding a semantic feature extraction network that is robust to color changes. Then, semantic pseudo-labels are generated with the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering method. Compared with existing pseudo-label generation methods, the proposed semantic pseudo-labels make full use of the structural information among cross-modality data during generation, which reduces the modality discrepancy caused by color changes across modalities. In addition, an instance-level hard-sample feature memory bank and a centroid-level cluster feature memory bank are constructed; by exploiting hard-sample features and cluster features, they make the model more robust to noisy pseudo-labels. Experiments on two cross-modality datasets, SYSU-MM01 and RegDB, demonstrate the effectiveness of the proposed method.
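The pipeline outlined in the abstract (auxiliary grayscale images, DBSCAN pseudo-labels, and two memory banks) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names, the BT.601 grayscale weights, the values of eps, min_samples, the momentum m and the temperature tau, and the choice of "least similar to the centroid" as the hard-sample criterion are assumptions made for illustration. Plain Python lists stand in for the feature tensors a real system would use.

```python
import math

def to_gray(rgb):
    # Auxiliary grayscale value of one pixel (BT.601 luma weights, an assumption).
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def dbscan(feats, eps, min_samples):
    # Plain DBSCAN over Euclidean distance; returns one pseudo-label per
    # feature, with -1 marking noise points that join no cluster.
    n = len(feats)
    neighbors = [[j for j in range(n) if math.dist(feats[i], feats[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
    return labels

def build_banks(feats, labels):
    # Centroid-level bank: normalized mean feature of each cluster.
    # Instance-level bank: per cluster, the member least similar to its
    # centroid (a hypothetical stand-in for a "hard sample").
    clusters = {}
    for feat, label in zip(feats, labels):
        if label != -1:
            clusters.setdefault(label, []).append(feat)
    centroid_bank, hard_bank = {}, {}
    for label, members in clusters.items():
        mean = [sum(col) / len(members) for col in zip(*members)]
        centroid_bank[label] = l2_normalize(mean)
        hard_bank[label] = min(members, key=lambda f: dot(f, centroid_bank[label]))
    return centroid_bank, hard_bank

def momentum_update(bank, label, feat, m=0.2):
    # Blend the stored entry with a new feature, then re-normalize.
    mixed = [m * old + (1 - m) * new for old, new in zip(bank[label], feat)]
    bank[label] = l2_normalize(mixed)

def contrastive_loss(query, bank, label, tau=0.05):
    # InfoNCE-style loss of a query feature against every entry in a bank.
    logits = {l: dot(query, entry) / tau for l, entry in bank.items()}
    peak = max(logits.values())
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return log_sum - logits[label]

# Toy features: two well-separated groups standing in for two identities.
raw = [(1.0, 0.0), (1.0, 0.05), (1.0, 0.1), (0.0, 1.0), (0.05, 1.0), (0.1, 1.0)]
feats = [l2_normalize(v) for v in raw]
labels = dbscan(feats, eps=0.2, min_samples=2)
centroid_bank, hard_bank = build_banks(feats, labels)
loss_same = contrastive_loss(feats[0], centroid_bank, labels[0])
loss_other = contrastive_loss(feats[0], centroid_bank, labels[3])
momentum_update(centroid_bank, labels[0], feats[0])
```

In this toy setup the loss is small when a feature is compared against its own cluster centroid and large against the other cluster's centroid, which is the signal a centroid-level bank provides during training. The same contrastive_loss function can be applied to hard_bank to obtain an instance-level term.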