模式识别与人工智能
Pattern Recognition and Artificial Intelligence
2024 Vol. 37 Issue 11, Published 2024-11-25

Object Recognition and Tracking Orienting Computer Vision
947 Scene Graph Knowledge Based Text-to-Image Person Re-identification
WANG Jinxi, LU Mingming
Most existing text-to-image person re-identification methods fine-tune visual language models such as contrastive language-image pretraining (CLIP) to adapt them to the re-identification task and to inherit the strong joint visual-language representations of the pre-trained model. These methods consider only task adaptation for the downstream re-identification task. They ignore the data adaptation required by differences between datasets, and they still struggle to capture structured knowledge such as object attributes and relationships between objects. To solve these problems, a scene graph knowledge based text-to-image person re-identification method is proposed, trained with a two-stage strategy. In the first stage, the image encoder and the text encoder of the CLIP model are frozen, and prompt learning is used to optimize learnable prompt tokens so that the downstream data domain adapts to the original training data domain of CLIP, which addresses the domain adaptation problem. In the second stage, semantic negative sampling and a scene graph encoder module are introduced while the CLIP model is fine-tuned. First, hard samples with similar semantics are generated from the scene graph, and a triplet loss is added as an additional optimization objective. Second, the scene graph encoder takes the scene graph as input, enhancing CLIP's ability to acquire structured knowledge. The effectiveness of the proposed method is verified on three widely used datasets.
2024 Vol. 37 (11): 947-959
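The abstract above adds a triplet loss over hard negatives with similar semantics. As a minimal sketch of what such a loss computes, and not the authors' implementation, the following uses cosine distance on toy embeddings. The function names, the margin value, and the example vectors are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss.

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (an assumed hyperparameter).
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (made up): an image, its matching caption, and a
# semantically similar but wrong caption acting as a hard negative.
image_embedding = [1.0, 0.0]
matching_text = [1.0, 0.0]
hard_negative_text = [0.0, 1.0]

loss = triplet_loss(image_embedding, matching_text, hard_negative_text)
```

In this toy case the negative is already far from the anchor, so the loss is zero. A harder negative, one closer to the anchor, would produce a positive loss and a gradient signal.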
960 Clothes-Changing Person Re-identification Method Based on Text-Image Mutual Learning
GE Bin, LU Yang, XIA Chenxing, GUAN Junming
To address the low recognition accuracy of person re-identification (Re-ID) when clothing changes, a clothes-changing person re-identification method based on text-image mutual learning (TIML) is proposed. It leverages the ability of contrastive language-image pre-training to generate pseudo-texts. In the first training phase, a pseudo-text generator is designed to increase text diversity by swapping pixel information among samples within the same batch. Additionally, a semantic alignment loss L_SA is introduced to ensure consistency in the text feature representation. In the second training phase, a global and local fusion network is devised to strengthen the discriminative power of visual features by fusing local and global features, guided by the textual information obtained in the first phase. Experiments on the PRCC, Celeb-ReID, Celeb-Light and VC-Clothes datasets demonstrate that the proposed model significantly improves recognition accuracy on datasets with few samples.
2024 Vol. 37 (11): 960-973
974 Anchor-Free RepPoints and Attention Mechanism Based Adaptive Siamese Network for Object Tracking
YUAN Shuai, DOU Huize, GENG Jinyu, LUAN Fangjun
Current Siamese network based object tracking algorithms incur high computational complexity in the candidate box generation stage, which results in poor real-time performance and reduced accuracy in complex scenarios. To address these issues, an anchor-free RepPoints and attention mechanism based adaptive Siamese network for object tracking is proposed. First, a large-kernel convolutional attention module is introduced in the backbone network of the Siamese subnetwork to extract global features of the target, enhancing the precision and generalization ability of the model. Second, an anchor-free multi-RepPoints module learns multiple RepPoints of the target, and an adaptive learning weight coefficient module then selects the more accurate RepPoints, further improving model precision and robustness. Finally, the RepPoints are transformed into predicted boxes, which eliminates the need for predefined candidate boxes, reduces computational complexity and improves real-time tracking performance. Experiments indicate that the proposed method achieves significant improvements in precision and success rate on four datasets.
2024 Vol. 37 (11): 974-985
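The last two steps in the abstract above, selecting the more reliable RepPoints and turning them into a predicted box, can be illustrated with a short sketch. It assumes a simple top-k selection by weight and a min-max conversion from points to a box. The abstract does not specify either rule, so both, along with the function names and sample values, are hypothetical.

```python
def select_points(points, weights, keep):
    """Keep the `keep` points with the largest weights (assumed selection rule)."""
    ranked = sorted(zip(weights, points), key=lambda pair: pair[0], reverse=True)
    return [point for _, point in ranked[:keep]]


def points_to_box(points):
    """Convert (x, y) points to an axis-aligned box (x_min, y_min, x_max, y_max).

    Min-max conversion is one common choice; it is an assumption here.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up representative points with made-up reliability weights.
rep_points = [(2, 3), (5, 1), (4, 6), (9, 9)]
point_weights = [0.9, 0.8, 0.7, 0.1]

kept = select_points(rep_points, point_weights, keep=3)
box = points_to_box(kept)
```

Because the box is derived directly from the points, no predefined candidate boxes are needed. Dropping the low-weight outlier at (9, 9) also keeps it from inflating the box.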
986 Involutional Capsule Network for Dermoscopy Image Recognition
WANG Lingxiang, ZHANG Li
Dermoscopy image recognition can distinguish skin lesions and supports the early diagnosis of skin cancer. To improve the efficiency of dermoscopy image recognition, an involutional capsule network (InvCNet) is proposed. InvCNet combines an involution operation with a global attention mechanism (GAM) and removes the reconstruction part. The involution operation provides rich minutiae that enhance dermoscopy image features by fusing information of feature maps across channels. Meanwhile, GAM is employed to mitigate the loss of spatial information induced by the convolution and pooling operations and to amplify cross-dimensional interactions. Experiments on four public datasets demonstrate that InvCNet significantly reduces the number of network parameters while achieving superior performance on most datasets.
2024 Vol. 37 (11): 986-998
Researches and Applications
999 Domain Machine Translation Method with Dynamic Incorporation of k-Nearest Neighbor Knowledge
HUANG Yuxin, SHEN Tao, JIANG Shuting, ZENG Hao, LAI Hua
Domain machine translation methods based on k-nearest neighbor retrieval improve translation quality by incorporating translation knowledge retrieved from a translation knowledge base. Existing methods enhance translation performance by fusing the decoder prediction distribution with k-nearest neighbor knowledge. However, inaccurate retrieved k-nearest neighbor knowledge may interfere with the prediction results of the model. To address this issue, a domain machine translation method with dynamic incorporation of k-nearest neighbor knowledge is proposed. The confidence of the decoder output distribution is first assessed. Combined with a gating mechanism, the proposed method dynamically decides whether to incorporate the k-nearest neighbor retrieval results, thereby flexibly adjusting the degree to which k-nearest neighbor knowledge is incorporated. An adaptive k-value module is introduced to reduce the interference caused by incorrect k-nearest neighbor knowledge. In addition, a distribution-guided loss is designed to steer the model output gradually toward the target distribution. The proposed method achieves improvements on four domain-specific German-English machine translation datasets.
2024 Vol. 37 (11): 999-1009
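The idea of deciding per prediction whether to mix in retrieved knowledge can be sketched as follows. This is a simplified illustration, not the proposed method. It assumes a hard confidence threshold in place of the learned gate, a fixed interpolation weight, and toy distributions over a three-word vocabulary. All names and values are hypothetical.

```python
def interpolate(p_model, p_knn, lam):
    """Linear interpolation of the decoder and k-nearest neighbor distributions."""
    return [(1.0 - lam) * m + lam * k for m, k in zip(p_model, p_knn)]


def gated_prediction(p_model, p_knn, threshold=0.8, lam=0.5):
    """Use retrieval only when the decoder is not confident.

    Confidence is taken to be the largest decoder probability, and the
    threshold and `lam` are assumed constants; a learned gate would
    replace both in a real system.
    """
    confidence = max(p_model)
    if confidence >= threshold:
        return list(p_model)  # trust the decoder, skip retrieved knowledge
    return interpolate(p_model, p_knn, lam)


# Toy distributions over a three-word vocabulary (made up).
confident_decoder = [0.9, 0.05, 0.05]
uncertain_decoder = [0.4, 0.4, 0.2]
retrieved = [0.0, 1.0, 0.0]

kept_as_is = gated_prediction(confident_decoder, retrieved)
mixed = gated_prediction(uncertain_decoder, retrieved)
```

When the decoder is confident, possibly noisy retrieval results are ignored. When it is uncertain, the retrieved distribution breaks the tie in favor of the second word.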
1010 Diffusion Models Based Unconditional Counterfactual Explanations Generation
ZHONG Zhi, WANG Yu, ZHU Ziye, LI Yun
Counterfactual explanations alter the model output by applying minimal and interpretable modifications to input data, revealing key factors that influence model decisions. Existing counterfactual explanation methods based on diffusion models rely on conditional generation, which requires additional semantic information related to classification. However, ensuring the quality of this semantic information is challenging, and it increases computational costs. To address these issues, an unconditional counterfactual explanation generation method based on the denoising diffusion implicit model (DDIM) is proposed. By leveraging the consistency exhibited by DDIM during the reverse denoising process, noisy images are treated as latent variables to control the generated outputs, making the diffusion model suitable for unconditional counterfactual explanation generation workflows. The advantages of DDIM in filtering high-frequency noise and out-of-distribution perturbations are then exploited to reconstruct the unconditional counterfactual explanation workflow so that it generates semantically interpretable modifications. Extensive experiments on different datasets demonstrate that the proposed method achieves superior results across multiple metrics.
2024 Vol. 37 (11): 1010-1021
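The consistency property mentioned above comes from the fact that the DDIM reverse update is deterministic when no extra noise is injected, so the same noisy image always maps to the same output. The sketch below shows one such update on a flat list of pixel values. The noise prediction is passed in as an argument because the trained network is outside the scope of this illustration, and the schedule values in the example are made up.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM reverse step (eta = 0).

    x_t            : current noisy image as a flat list of floats
    eps_pred       : noise predicted for x_t (stand-in for a trained network)
    alpha_bar_t    : cumulative noise-schedule product at the current step
    alpha_bar_prev : cumulative noise-schedule product at the previous step
    """
    sqrt_ab_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_ab_t = math.sqrt(1.0 - alpha_bar_t)
    # Clean-image estimate implied by the current noisy image.
    x0_pred = [(x - sqrt_one_minus_ab_t * e) / sqrt_ab_t
               for x, e in zip(x_t, eps_pred)]
    sqrt_ab_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_ab_prev = math.sqrt(1.0 - alpha_bar_prev)
    # Re-noise the estimate to the previous noise level along the same direction.
    return [sqrt_ab_prev * x0 + sqrt_one_minus_ab_prev * e
            for x0, e in zip(x0_pred, eps_pred)]


# Toy check with made-up values: noise a clean signal, then step back
# using the true noise as the prediction.
clean = [0.5, -1.0]
noise = [0.3, 0.2]
alpha_bar = 0.64
noisy = [math.sqrt(alpha_bar) * c + math.sqrt(1.0 - alpha_bar) * n
         for c, n in zip(clean, noise)]
recovered = ddim_step(noisy, noise, alpha_bar, 1.0)
```

With a perfect noise prediction and a final noise level of zero (`alpha_bar_prev = 1.0`), the step returns the clean signal, and repeating the call gives the same result because the update contains no random term.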
1022 Offline Reinforcement Learning Algorithm Based on Selection of High-Quality Samples
HOU Yonghong, DING Wang, REN Yi, DONG Hongwei, YANG Songling
To address the over-reliance of offline reinforcement learning algorithms on the quality of dataset samples, an offline reinforcement learning algorithm based on selection of high-quality samples (SHS) is proposed. In the policy evaluation stage, higher update weights are assigned to the samples with advantage values, and a policy entropy term is added to quickly identify high-quality action samples with high probability within the data distribution, thereby screening out more valuable action samples. In the policy optimization stage, SHS aims to maximize the normalized advantage function while maintaining the policy constraints on the actions within the dataset. Consequently, high-quality samples can be used efficiently even when the sample quality of the dataset is low, improving the learning efficiency and performance of the policy. Experiments show that SHS performs well on the D4RL offline datasets in the MuJoCo-Gym environment and successfully screens out more valuable samples, which verifies its effectiveness.
2024 Vol. 37 (11): 1022-1032
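As a rough illustration of weighting samples by advantage, the sketch below normalizes a batch of advantage values and gives a larger update weight to samples whose normalized advantage is positive. This is not the SHS update rule, which the abstract does not spell out. The two-level weighting, the weight values, and the function names are assumptions chosen to keep the example small.

```python
import math


def normalize(advantages):
    """Standardize advantages to zero mean and unit variance."""
    mean = sum(advantages) / len(advantages)
    variance = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero for constant batches
    return [(a - mean) / std for a in advantages]


def sample_weights(advantages, high=2.0, low=1.0):
    """Assign `high` to samples with positive normalized advantage, else `low`.

    The two weight levels are hypothetical constants for illustration.
    """
    return [high if a > 0 else low for a in normalize(advantages)]


# Made-up advantage estimates for four dataset transitions.
batch_advantages = [1.0, -1.0, 3.0, -3.0]
weights = sample_weights(batch_advantages)
```

Transitions with above-average advantage contribute more to the update, which is one simple way to lean on the better samples in a low-quality dataset.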
Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press
 
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax: 0551-65591176 Email: bjb@iim.ac.cn