Abstract: Existing cross-modal person re-identification methods reduce modal differences by aligning the features or pixel distributions of different modalities, but they ignore the discriminative fine-grained information of pedestrians. To obtain more discriminative pedestrian features that are independent of modal differences, a cross-modal person re-identification method based on modal-invariant feature learning and consistent fine-grained information mining is proposed. The method consists mainly of two modules, modal-invariant feature learning and semantically consistent fine-grained information mining, which jointly drive the feature extraction network to learn discriminative features. Specifically, the modal-invariant feature learning module removes modal information from the feature maps to reduce modal differences. The semantically consistent fine-grained information mining module applies channel grouping and horizontal segmentation to the person feature maps, which achieves semantic alignment and fully mines the discriminative fine-grained information. Experimental results show that the proposed method significantly outperforms state-of-the-art cross-modal person re-identification methods.
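The channel grouping and horizontal segmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's module: the function name `part_features`, the `(C, H, W)` layout, the equal-sized contiguous groups and stripes, and the use of average pooling are all assumptions made for illustration only.

```python
import numpy as np

def part_features(feature_map, num_groups, num_stripes):
    """Split a (C, H, W) feature map into channel groups and horizontal
    stripes, then average-pool each (group, stripe) cell into one vector.

    Hypothetical helper: equal-sized splits and average pooling are
    assumptions, not details taken from the paper.
    """
    c, h, w = feature_map.shape
    if c % num_groups or h % num_stripes:
        raise ValueError("C and H must be divisible by num_groups and num_stripes")
    # Shape (G, C/G, P, H/P, W): axis 0 indexes channel groups,
    # axis 2 indexes horizontal stripes.
    cells = feature_map.reshape(
        num_groups, c // num_groups, num_stripes, h // num_stripes, w
    )
    # Average over the spatial extent of each stripe, giving (G, P, C/G).
    return cells.mean(axis=(3, 4)).transpose(0, 2, 1)

# Toy feature map with 8 channels, height 6, width 4.
fmap = np.arange(8 * 6 * 4, dtype=np.float64).reshape(8, 6, 4)
parts = part_features(fmap, num_groups=2, num_stripes=3)
print(parts.shape)  # (2, 3, 4)
```

Each of the `num_groups * num_stripes` output vectors describes one channel group within one horizontal body region, which is the kind of part-level feature that can be compared across modalities.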