Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Pattern Recognition and Artificial Intelligence  2024, Vol. 37 Issue (11): 947-959    DOI: 10.16451/j.cnki.issn1003-6059.202411001
Section: Object Recognition and Tracking Orienting Computer Vision
Scene Graph Knowledge Based Text-to-Image Person Re-identification
WANG Jinxi¹, LU Mingming¹
1. School of Computer Science and Engineering, Central South University, Changsha 410083

Abstract: Most existing text-to-image person re-identification methods fine-tune a vision-language model such as contrastive language-image pretraining (CLIP) to adapt it to the re-identification task while inheriting the strong joint vision-language representations of the pre-trained model. These methods address only task adaptation for downstream re-identification. They ignore the data adaptation required by the differences between the downstream data and the pre-training data, and they still struggle to capture structured knowledge such as object attributes and the relationships between objects. To solve these problems, a scene graph knowledge based text-to-image person re-identification method is proposed, trained with a two-stage strategy. In the first stage, the image encoder and the text encoder of the CLIP model are frozen, and prompt learning is used to optimize learnable prompt tokens so that the downstream data domain adapts to the original training data domain of CLIP, which addresses the domain adaptation problem. In the second stage, the CLIP model is fine-tuned and two modules are introduced: semantic negative sampling and a scene graph encoder. First, hard negative samples with similar semantics are generated from the scene graph, and a triplet loss is added as an additional optimization objective. Second, the scene graph encoder takes the scene graph as input, enhancing the ability of CLIP to acquire structured knowledge. The effectiveness of the proposed method is verified on three widely used datasets.
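The semantic negative sampling step in the second stage can be illustrated with a minimal sketch. The abstract states only that hard negatives are generated from the scene graph and trained with a triplet loss; everything else here is an assumption made for illustration: the triple representation of a scene graph, the `ATTRIBUTE_SWAPS` vocabulary, the choice to swap a single attribute value, the use of cosine similarity, and the margin of 0.2. None of the names below come from the paper.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed representation: a scene graph is a list of
# (subject, relation, object_or_attribute_value) triples.
Triple = Tuple[str, str, str]

# Hypothetical swap vocabulary: attribute values that can be exchanged
# to produce a caption that is semantically close but wrong.
ATTRIBUTE_SWAPS: Dict[str, List[str]] = {
    "red": ["blue", "green"],
    "black": ["white", "grey"],
    "long": ["short"],
}


def make_hard_negative(graph: List[Triple], rng: random.Random) -> Optional[List[Triple]]:
    """Return a copy of the graph with one attribute value swapped,
    or None if no triple contains a swappable value."""
    candidates = [i for i, (_, _, value) in enumerate(graph) if value in ATTRIBUTE_SWAPS]
    if not candidates:
        return None
    i = rng.choice(candidates)
    subject, relation, value = graph[i]
    negative = list(graph)
    negative[i] = (subject, relation, rng.choice(ATTRIBUTE_SWAPS[value]))
    return negative


def graph_to_text(graph: List[Triple]) -> str:
    """Flatten a scene graph into a caption-like string for a text encoder."""
    return ", ".join(f"{s} {r} {o}" for s, r, o in graph)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Hinge triplet loss on cosine similarity:
    max(0, margin - sim(anchor, positive) + sim(anchor, negative))."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))
```

In such a setup the anchor would be an image embedding, the positive the embedding of the original caption, and the negative the embedding of `graph_to_text(make_hard_negative(...))`. The loss is zero once the true caption is closer to the image than the perturbed caption by at least the margin.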
Key words: Scene Graph; Prompt Learning; Text-to-Image Person Re-identification (T2IReID); Contrastive Language-Image Pretraining (CLIP)
Received: 15 July 2024     
Chinese Library Classification (ZTFLH): TP 391.41
Fund: Supported by the National Natural Science Foundation of China (No. U20A20182)
Corresponding Author: LU Mingming, Ph.D., associate professor. His research interests include pattern recognition, deep learning and computer vision.
About author: WANG Jinxi, Master's student. His research interests include deep learning and computer vision.
Cite this article:   
WANG Jinxi,LU Mingming. Scene Graph Knowledge Based Text-to-Image Person Re-identification[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(11): 947-959.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202411001      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2024/V37/I11/947