Pattern Recognition and Artificial Intelligence

模式识别与人工智能

Wednesday, Aug. 13, 2025


	2024 Vol. 37 Issue 5, Published 2024-05-25


Object Detection, Recognition and Adversarial Defense

	383	Domain Adaptive Person Re-identification Model Based on Camera Perception
		YANG Zhangjing, WU Shuli, HUANG Pu, YANG Guowei
		To address the low performance of person re-identification in corruption scenarios, caused by large distribution differences between the training and testing sets, high background complexity and numerous noise types, a domain adaptive person re-identification model based on camera perception is proposed. During the training phase, the model introduces and fully utilizes camera information to align the image distributions of different cameras. During the testing phase, temporal information is employed for ranking optimization, reducing the impact of distribution differences between the training and testing sets and addressing background complexity and noise. The model mitigates the impact of damaged images from the perspective of dataset processing and improves performance in corruption scenarios through quadratic weighting in the ranking optimization. Experiments on the Market-1501, DukeMTMC-reID and CUHK03 datasets demonstrate the effectiveness of the proposed model.
		2024 Vol. 37 (5): 383-397

	398	Multi-consistency Constrained Semi-supervised Video Action Detection Based on Feature Enhancement and Residual Reshaping
		HU Zhengping, ZHANG Qiming, WANG Yulu, ZHANG Hehao, DI Jirui
		In consistency-regularized semi-supervised video action detection, the feature representations of the original data and the augmented data tend to induce a discriminative domain bias between the two types of data, resulting in inadequate fitting of the discriminative results. To address this issue, a multi-consistency constrained semi-supervised video action detection method based on feature enhancement and residual reshaping is proposed in this paper. Firstly, the basic action feature descriptors are continuously enhanced and encoded in the spatiotemporal dimension to obtain contextual information that is crucial for video action understanding. Subsequently, a residual feature reshaping module is employed to obtain multi-scale residual information while reshaping the features. To reduce the discriminative bias between the two types of data, multiple consistency constraints are applied to the original data and the augmented data from the perspectives of classification features and action localization features, so that the discriminative results and feature representations of the augmented data match those of the original data. Experimental results on the JHMDB-21 and UCF101-24 datasets demonstrate that the proposed method improves video action detection accuracy when labeled samples are limited and remains strongly competitive.
		2024 Vol. 37 (5): 398-409
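The consistency constraints described in the abstract above can be illustrated with a minimal sketch. It assumes a squared-error penalty between the class probabilities and between the localization maps predicted for an original clip and its augmented view. The function names, the squared-error form and the `loc_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_squared_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(cls_orig, cls_aug, loc_orig, loc_aug, loc_weight=1.0):
    """Hypothetical two-term consistency loss.

    cls_*: class logits for the original and augmented views.
    loc_*: flattened localization maps for the two views.
    """
    cls_term = mean_squared_error(softmax(cls_orig), softmax(cls_aug))
    loc_term = mean_squared_error(loc_orig, loc_aug)
    return cls_term + loc_weight * loc_term
```

The loss is zero when both views produce identical predictions and grows as the predictions diverge, which is the behaviour a consistency regularizer needs on unlabeled clips.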

	410	Spatio-Temporal IoU Constraints Based Adversarial Defense Method for Object Tracking
		SHENG Jingjing, ZHANG Dawei, CAI Tingyi, XIAO Xin, ZHENG Zhonglong, JIANG Yunliang
		With the wide application of deep learning in visual tracking, adversarial attacks have become one of the key factors affecting model performance. However, research on defense methods against adversarial attacks is still in its initial stage. Therefore, a spatio-temporal intersection over union (IoU) constraints based adversarial defense method for object tracking is proposed. In this method, Gaussian noise constraints are first added to the adversarial examples. Then, along the tangent direction of the noise contour, the tangential constraint with the same noise level and the highest spatio-temporal IoU score is selected. The normal constraint is utilized to update the defense target towards the original image, and the normal and tangential constraints are orthogonally combined and optimized. Finally, the combined vector with the highest spatio-temporal IoU score and the lowest noise level is selected as the best constraint, added to the adversarial example image, and transferred to the next frame, thereby realizing temporal defense. Experiments on the VOT2018, OTB100, GOT-10k and LaSOT tracking datasets verify the validity of the proposed method.
		2024 Vol. 37 (5): 410-423
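The IoU score that the method above uses for selection builds on the standard intersection over union of two bounding boxes. The sketch below shows only that standard box IoU for axis-aligned boxes given as `(x1, y1, x2, y2)`. How the paper combines scores across frames into a spatio-temporal IoU is not specified in the abstract and is not modeled here.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Degenerate boxes with zero area give an IoU of 0 by convention.
    return inter / union if union > 0 else 0.0
```

A candidate constraint could then be ranked by the IoU between the box tracked on the defended image and a reference box, although the choice of reference is an assumption of this sketch.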

	424	Multi-level Fusion Based Weakly Supervised Object Detection Network
		CAO Huan, CHEN Zengping
		Due to the lack of precise bounding box annotations, weakly supervised object detectors rely on a pretrained image classification model to classify candidate regions. However, the pretrained model often produces high responses for discriminative regions rather than complete objects, resulting in part domination, missing instances and loosely fitting boxes. To address these issues, a multi-level fusion based weakly supervised object detection network is proposed. Detection performance is improved by enhancing the learning of weakly discriminative spatial features, enriching intra-class sample features and weighting reliable pseudo-labels. Firstly, the power pooling layer uses a power function to weight and fuse the activation values within a neighborhood, reducing the information loss of weakly discriminative features. Secondly, the feature mixing method randomly fuses the feature vectors of candidate regions to enrich the diversity of training sample features. Finally, the confidence-based sample re-weighting strategy fuses the confidence of predictions and pseudo-labels to adjust the influence of pseudo-labels on training. Experiments on three benchmarks demonstrate the superiority of the proposed network.
		2024 Vol. 37 (5): 424-434
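One way to read the power pooling step above is as a weighted average in which each activation is weighted by a power of itself. The sketch below assumes that form, with a hypothetical exponent `p` and non-negative activations. The exact layer used in the paper is not given in the abstract, so this is an illustration of the general idea only.

```python
def power_pool(values, p=1.0):
    """Pool non-negative activations with weights proportional to value ** p.

    p = 0 reduces to average pooling; larger p moves the result towards
    max pooling while still letting weaker activations contribute.
    This weighting form is an assumption, not the paper's definition.
    """
    if any(v < 0 for v in values):
        raise ValueError("power_pool expects non-negative activations")
    weights = [v ** p for v in values]
    total = sum(weights)
    if total == 0:
        # All activations are zero (and p > 0): the pooled value is zero.
        return 0.0
    return sum(w * v for w, v in zip(weights, values)) / total
```

Because the output interpolates between the mean and the maximum of the neighborhood, weakly activated positions are not discarded outright as they would be under max pooling.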

Papers and Reports

	435	Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios
		FANG Baofu, YU Tingting, WANG Hao, WANG Zaijun
		Multi-agent task scenarios often involve a large and diverse state space. In some cases, the reward information provided by the external environment is extremely limited, exhibiting sparse reward characteristics. Most existing multi-agent reinforcement learning algorithms have limited effectiveness in such sparse reward scenarios, as relying only on accidentally discovered reward sequences leads to a slow and inefficient learning process. To address this issue, a multi-agent reinforcement learning algorithm based on state space exploration (MASSE) in sparse reward scenarios is proposed. MASSE constructs a subset space of states, maps one state from this subset, and takes it as an intrinsic goal, enabling agents to utilize the state space more fully and reduce unnecessary exploration. The agent states are decomposed into self-states and environmental states, and intrinsic rewards based on mutual information are generated by combining these two types of states with the intrinsic goals. In this way, states close to the target states and states that improve understanding of the environment are rewarded appropriately. Consequently, agents are motivated to move more actively towards the goal while enhancing their understanding of the environment, which guides them to adapt flexibly to sparse reward scenarios. Experimental results indicate that MASSE achieves superior performance in multi-agent collaborative scenarios with varying degrees of sparsity.
		2024 Vol. 37 (5): 435-446
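The intrinsic rewards above are based on mutual information. As a rough illustration of that quantity only, the sketch below estimates the mutual information between two discrete variables from paired samples, for example a discretized state and a goal label. The pairing of states with goals and the use of a plug-in estimator are assumptions of this sketch. MASSE's actual reward construction is not described in enough detail in the abstract to reproduce.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in nats from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi
```

The estimate is zero when the two variables are independent in the sample and reaches log 2 for two perfectly correlated binary variables, so a reward derived from it favours states that are informative about the goal.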

	447	Personalized Federated Learning Based on Sparsity Regularized Bi-level Optimization
		LIU Xi, LIU Bo, JI Fanfan, YUAN Xiaotong
		Personalized federated learning focuses on providing a personalized model for each client, aiming to improve performance on statistically heterogeneous data. However, most existing personalized federated learning algorithms enhance the performance of personalized models at the cost of more client parameters and more complex computation. To address this issue, a personalized federated learning algorithm based on sparsity regularized bi-level optimization (pFedSRB) is proposed in this paper. The l₁-norm sparse regularization is introduced into the personalized update of each client to enhance the sparsity of the personalized model, avoid unnecessary parameter updates on clients, and reduce model complexity. The personalized federated learning problem is formulated as a bi-level optimization problem, and the inner-level optimization of pFedSRB is solved by the alternating direction method of multipliers to improve the learning speed. Experiments on four federated learning benchmark datasets demonstrate that pFedSRB performs well on heterogeneous data, improving model performance while reducing the time and memory costs required for training.
		2024 Vol. 37 (5): 447-458
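When an l₁-norm penalty is handled by the alternating direction method of multipliers, the sparse variable is typically updated with the soft-thresholding operator, the proximal operator of the l₁ norm. The sketch below shows that standard operator only. The threshold `lam` is a hypothetical parameter name, and the full pFedSRB update rules are not reproduced here.

```python
def soft_threshold(values, lam):
    """Proximal operator of lam * ||v||_1 applied elementwise.

    Entries with magnitude at most lam are set to exactly zero; the rest
    are shrunk towards zero by lam. This is what produces sparsity.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    result = []
    for x in values:
        if x > lam:
            result.append(x - lam)
        elif x < -lam:
            result.append(x + lam)
        else:
            result.append(0.0)
    return result
```

For example, `soft_threshold([3.0, -0.5, 0.2, -2.0], 1.0)` zeroes the two small entries and shrinks the two large ones, so small parameter updates are dropped rather than stored or transmitted.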

Researches and Applications

	459	Cross-Modal Multi-level Fusion Sentiment Analysis Method Based on Visual Language Model
		XIE Runfeng, ZHANG Bochao, DU Yongping
		Image-text multimodal sentiment analysis aims to predict sentiment polarity by integrating the visual and textual modalities. The key to the task is obtaining high-quality representations of both modalities and fusing them efficiently. Therefore, a cross-modal multi-level fusion sentiment analysis method based on visual language model (MFVL) is proposed. Firstly, based on the pre-trained visual language model, high-quality multimodal representations and modality bridge representations are generated by freezing its parameters and fine-tuning the large language model with a low-rank adaptation method. Secondly, a cross-modal multi-head co-attention fusion module is designed to perform interactive weighted fusion of the visual and textual modality representations. Finally, a mixture of experts module is designed to deeply fuse the visual, textual and modality bridge representations to achieve multimodal sentiment analysis. Experimental results indicate that MFVL achieves state-of-the-art performance on the public evaluation datasets MVSA-Single and HFM.
		2024 Vol. 37 (5): 459-468
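Low-rank adaptation, as mentioned above, keeps a pretrained weight matrix frozen and learns a low-rank update on top of it. The sketch below shows the usual formulation, W + (alpha / r) · B · A, using plain Python lists. The function names and the `alpha` scaling parameter follow common LoRA conventions and are assumptions here, since the abstract does not state how MFVL configures its adapters.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
```

Only `a` and `b` are trained, which is r · (d_in + d_out) values instead of d_in · d_out. When `b` is initialized to zeros, the effective weight equals the frozen weight at the start of fine-tuning.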

	469	Knowledge Discovery and Rule Extraction Based on Heterogeneous Network Linguistic Formal Context
		SHA Liwei, YANG Zheng, LIU Hongping, ZOU Li
		How to handle data with complex relationships in uncertain environments is a research hotspot. The network formal context combines complex network analysis and formal concept analysis to provide an effective mathematical tool for knowledge discovery in complex relational data. In this paper, the heterogeneous network linguistic formal context is first proposed based on the heterogeneity of network structure. The heterogeneous network contains a subjective network given by experts and an objective network mined from the features of objects. Then, global and local heterogeneous network linguistic concepts are obtained by considering the connectivity of the network, and algorithms for global and local connectivity knowledge discovery in heterogeneous networks are provided. Finally, an association rule extraction model is constructed based on the heterogeneous network linguistic formal context, and the rationality and effectiveness of knowledge discovery and rule extraction are verified by examples.
		2024 Vol. 37 (5): 469-478
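The method above extends formal concept analysis, whose basic building blocks are the two derivation operators on a formal context. The sketch below covers only classical formal concept analysis, with the context stored as a dictionary from each object to its attribute set. The network structure and linguistic values introduced in the paper are not modeled, and the function names are hypothetical.

```python
def common_attributes(context, objects):
    """Attributes shared by every object in `objects` (the intent)."""
    objects = list(objects)
    if not objects:
        # By convention the empty object set shares all attributes.
        return set().union(*context.values())
    return set.intersection(*(set(context[o]) for o in objects))


def objects_with(context, attributes):
    """Objects that have every attribute in `attributes` (the extent)."""
    wanted = set(attributes)
    return {o for o, owned in context.items() if wanted <= set(owned)}


def is_formal_concept(context, objects, attributes):
    """A pair is a formal concept when each side derives the other."""
    return (
        common_attributes(context, objects) == set(attributes)
        and objects_with(context, attributes) == set(objects)
    )
```

With the context `{"a": {"x", "y"}, "b": {"y"}, "c": {"y", "z"}}`, the pair of all three objects and the attribute set `{"y"}` is a formal concept, while the pair of `{"a", "c"}` and `{"y"}` is not, because object `"b"` also has `"y"`.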


Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press

Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax:0551-65591176 Email: bjb@iim.ac.cn