模式识别与人工智能
Pattern Recognition and Artificial Intelligence
 
 
2026 Vol. 39, Issue 2, Published 2026-02-25

   
Papers and Reports
97 Adversarial Example Generation Method Based on Semantic-Guided Local Perturbation Diffusion Model
ZHAO Hong, XU Mingting, LIU Ze
To address the limitations of the diffusion-based adversarial example generation method DiffAttack in semantic guidance, salient-region targeting, and image naturalness, an adversarial example generation method based on a semantic-guided local perturbation diffusion model is proposed in this paper. First, a text embedding module is designed to iteratively optimize the text embedding before the denoising process of the diffusion model. The adversarial text embeddings used to guide semantic shifts are generated and adopted as the conditions for denoising. Second, a local mask fusion module is incorporated into the denoising process, injecting local perturbations into salient regions of the latent space to enhance the attack effectiveness of the adversarial examples. Finally, a multi-level joint perceptual loss function is employed to jointly constrain perceptual differences at both the image and latent-space levels, enhancing image naturalness while the attack effectiveness of the adversarial examples is maintained. Adversarial examples are generated on the ImageNet-Compatible subset using Inception as a proxy model and evaluated across three different model architectures. The results show that, compared with DiffAttack, the proposed method reduces the average Top-1 accuracy by 2.8% while improving the FID (Fréchet Inception Distance) score by 0.4. These results demonstrate that the proposed method generates adversarial examples with both stronger attack effectiveness and enhanced image naturalness. The method can better expose security and robustness issues of target models, exhibiting strong practical value.
2026 Vol. 39 (2): 97-111 [Abstract] ( 33 ) [HTML 1KB] [ PDF 1568KB] ( 30 )
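The local mask fusion idea described in the abstract above can be illustrated as follows. This is a minimal sketch, not the paper's implementation: the function name, the binary-threshold saliency mask, and the additive blending are all illustrative assumptions.

```python
import numpy as np

def inject_local_perturbation(latent, saliency, delta, threshold=0.5):
    """Blend an adversarial perturbation into a latent tensor, but only
    inside salient regions (saliency above threshold); elsewhere the
    latent is left untouched."""
    mask = (saliency > threshold).astype(latent.dtype)  # binary salient-region mask
    return latent + mask * delta                        # perturb salient region only

# Toy example: 4x4 latent, saliency high only in the top-left 2x2 block.
latent = np.zeros((4, 4))
saliency = np.zeros((4, 4))
saliency[:2, :2] = 1.0
delta = np.full((4, 4), 0.1)
out = inject_local_perturbation(latent, saliency, delta)
```

Restricting the perturbation support in this way is what lets the attack stay effective while leaving non-salient content, and hence perceived naturalness, largely intact.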
112 Discriminative Representation and Adaptive Calibrated Inference for Cross-Domain Few-Shot Named Entity Recognition
QIU Quanan, HUANG Qi, TONG Zirong, LUO Wenbing, YI Jie, WANG Mingwen
To address the challenges of boundary ambiguity and error accumulation caused by feature distribution shifts between source and target domains in few-shot named entity recognition (NER), a model of cross-domain few-shot NER via discriminative representation and adaptive calibrated inference (DR-ACI) is proposed. First, the span detection space is reshaped through an asymmetric boundary contrastive (ABC) loss. An entity-centric asymmetric constraint strategy is adopted: entity boundaries are explicitly sharpened while the semantic diversity of the background is preserved. Simultaneously, an adaptive gated enhancement (AGE) module is introduced to dynamically calibrate sparse prototypes through multi-level semantic fusion, thereby mitigating the representation uncertainty and bias resulting from support set sparsity. Subsequently, a scenario-aware adaptive calibrated inference mechanism is designed to tackle the bottlenecks of feature norm drift and support set bias. By leveraging feature normalization and a reliability-aware dual-mode gated strategy, this mechanism dynamically reconstructs decision boundaries to suppress transfer noise. Experimental results demonstrate that DR-ACI maintains competitive performance on the Few-NERD dataset and is superior to the baseline models on cross-domain datasets. These results verify the effectiveness of the synergistic optimization of discriminative representation and adaptive inference.
2026 Vol. 39 (2): 112-126 [Abstract] ( 21 ) [HTML 1KB] [ PDF 1042KB] ( 14 )
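The asymmetry in the ABC loss above can be sketched numerically. The toy objective below is an assumption about the general shape of such a loss, not the paper's formula: entity features are pulled tightly toward their centroid, while background features are only pushed past a margin with a smaller weight, preserving background diversity.

```python
import numpy as np

def asymmetric_boundary_contrastive(entity_feats, background_feats,
                                    margin=1.0, bg_weight=0.1):
    """Toy asymmetric contrastive objective: a strong pull term tightens
    entity features around their centroid, while a weaker hinge term only
    pushes background features that fall inside the margin."""
    center = entity_feats.mean(axis=0)
    pull = np.mean(np.sum((entity_feats - center) ** 2, axis=1))   # tighten entities
    dist = np.sqrt(np.sum((background_feats - center) ** 2, axis=1))
    push = np.mean(np.maximum(0.0, margin - dist) ** 2)            # hinge push-away
    return pull + bg_weight * push                                 # asymmetric weighting

ent = np.array([[1.0, 0.0], [1.2, 0.1]])
bg = np.array([[3.0, 3.0], [0.9, 0.0]])   # one far, one near the entity centroid
loss = asymmetric_boundary_contrastive(ent, bg)
```

Because `bg_weight < 1`, background tokens are constrained far more loosely than entity tokens, which is the entity-centric asymmetry the abstract describes.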
127 Deepfake Detection Method Combining Multi-stage Feature Disentanglement and Frequency-Domain Information
LIN Liwei, LI Yang, ZHU Hengliang, WANG Mengqiang, HUANG Chuan, CHEN Jianwei, ZHANG Jing, CHEN Bixia
Deepfake detection faces significant challenges from the limited generalization capability of existing detectors and their poor adaptability to unseen forgery techniques. To address these issues, a deepfake detection method combining multi-stage feature disentanglement and frequency-domain information (MFD-FD) is proposed. First, a hierarchical feature disentanglement strategy is designed. By introducing a forgery suppression loss and a reconstruction loss, content features are progressively separated from artifact features from shallow to deep layers. Thus, the coupling between the two feature types is effectively reduced with critical information preserved, and the model can focus on purer artifact representations. Next, frequency-domain information is introduced to compensate for the deficiency of spatial features in spectral information, thereby enhancing the detection stability of the model against perturbations such as image compression. Finally, a frequency-domain fusion data augmentation method based on a cosine transition mask is presented to enhance model robustness by synthesizing diverse forged samples. Extensive experiments demonstrate that MFD-FD outperforms state-of-the-art methods in both generalization and robustness.
2026 Vol. 39 (2): 127-140 [Abstract] ( 18 ) [HTML 1KB] [ PDF 2287KB] ( 16 )
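The cosine-transition-mask fusion above can be sketched with a 2-D FFT. The radii `r_lo`/`r_hi` and the exact blending rule below are illustrative assumptions; the sketch only shows the mechanism of splicing two images' spectra with a smooth radial boundary.

```python
import numpy as np

def cosine_transition_mask(h, w, r_lo=0.1, r_hi=0.4):
    """Radial mask over centered FFT coordinates: 1 inside radius r_lo,
    0 outside r_hi, with a smooth half-cosine transition in between."""
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)   # normalized radius
    t = np.clip((r - r_lo) / (r_hi - r_lo), 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))             # cosine falloff

def frequency_fuse(img_a, img_b, mask):
    """Keep img_a's low frequencies and splice in img_b's high
    frequencies, with the cosine mask smoothing the spectral boundary."""
    fa = np.fft.fftshift(np.fft.fft2(img_a))
    fb = np.fft.fftshift(np.fft.fft2(img_b))
    fused = mask * fa + (1.0 - mask) * fb
    return np.real(np.fft.ifft2(np.fft.ifftshift(fused)))

a = np.random.default_rng(0).random((32, 32))
b = np.random.default_rng(1).random((32, 32))
m = cosine_transition_mask(32, 32)
out = frequency_fuse(a, b, m)
```

The smooth transition avoids the ringing artifacts a hard spectral cutoff would introduce, so the synthesized samples stay plausible as training data.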
Researches and Applications
141 Beat-Aware Dance Generation Model Integrating Mamba-Transformer
HU Zhengping, XU Chuanxin, DONG Xiaoyun, WU Yifan
To address the challenge of simultaneously balancing dance motion quality and beat alignment in audio-driven dance generation tasks, a beat-aware dance generation model integrating Mamba-Transformer (BeatDG) is proposed. First, an upper- and lower-limb motion feature encoding network is designed to autonomously learn a codebook of meaningful dance units in an unsupervised manner. Second, a beat feature extraction module is designed to effectively enhance music beat extraction capability, ensuring computational efficiency while the temporal relationship between music beats and dance motions is taken into account. On this basis, a rhythm-gated temporal causal attention module is constructed to facilitate information interaction between music signals and upper- and lower-limb features. Finally, a hybrid generative architecture based on Dance Mamba and Transformer layers is designed to simultaneously consider continuous inter-frame features and global context. In this architecture, body and music information are fused and dance motions conforming to spatial norms and paradigms are generated. Experiments on the AIST++ dataset demonstrate that BeatDG effectively improves the alignment between music beats and dance motions and ensures the quality of the generated dance.
2026 Vol. 39 (2): 141-156 [Abstract] ( 27 ) [HTML 1KB] [ PDF 2958KB] ( 18 )
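Beat alignment of the kind evaluated above is commonly scored by matching each music beat to the nearest motion beat under a Gaussian kernel. The sketch below is a simplified, assumed form of such a metric (the kernel width `sigma` and the exact formula are illustrative, not taken from the paper).

```python
import numpy as np

def beat_alignment_score(music_beats, motion_beats, sigma=0.1):
    """Mean Gaussian-kernel similarity between each music beat (seconds)
    and its nearest motion beat; 1.0 means perfect alignment."""
    scores = []
    for t in music_beats:
        nearest = min(abs(t - m) for m in motion_beats)   # closest motion beat
        scores.append(np.exp(-nearest ** 2 / (2 * sigma ** 2)))
    return float(np.mean(scores))

aligned = beat_alignment_score([0.5, 1.0, 1.5], [0.5, 1.0, 1.5])   # perfect match
off = beat_alignment_score([0.5, 1.0, 1.5], [0.8, 1.3, 1.8])       # 0.3 s late
```

A model that improves beat alignment raises this score by moving its generated motion beats closer to the musical ones.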
157 Toward Extreme Image Compression with One-Step Diffusion and Quantization Semantics
ZHANG Zhouhong, QIAO Xin, LI Zhiyuan, AN Ning, KONG He
Diffusion-based extreme image compression methods exhibit significant performance advantages under extremely low bitrate scenarios. However, existing methods typically decode the image after multiple sampling steps due to their reliance on the step-by-step denoising strategy of diffusion models, resulting in a trade-off between reconstruction fidelity and inference efficiency. To address this issue, an extreme image compression method with one-step diffusion and quantization semantics is proposed. A one-step diffusion strategy is designed: it starts from compressed latent features rather than pure noise, and high-quality image reconstruction is achieved with only a single sampling step. Moreover, quantized contrastive language-image pretraining (CLIP) features are introduced to replace text as semantic conditions, providing more fine-grained and reliable semantic guidance for reconstruction. Finally, a pixel-level loss is added to the training to alleviate the distribution discrepancy caused by optimization in the latent feature space, further improving reconstruction quality. Extensive experiments demonstrate that the proposed method achieves superior reconstruction quality with only a single sampling step.
2026 Vol. 39 (2): 157-169 [Abstract] ( 17 ) [HTML 1KB] [ PDF 3057KB] ( 19 )
170 Conversational Question Answering Based on Knowledge Graph and Coreference Resolution
WANG Jiahui, ZHAO Linchao, YIN Zhaorui, YUE Kun, CHEN Xingtong, DUAN Liang
Two urgent challenges in conversational question answering remain to be addressed. One is how coreference and long-range dependencies can be resolved to effectively utilize dependency information. The other is how contextual query subgraphs can be maintained to avoid the risk of excessive expansion, thereby enabling more precise answer retrieval within them. In this paper, a model of conversational question answering based on knowledge graph and coreference resolution is proposed. First, coreference resolution is employed to obtain coreference clusters, and an index replacement algorithm is introduced to enhance the semantic information of questions. Additionally, two types of dependency information, word coreference structure and character semantics, are proposed to guide the expansion of the contextual query subgraph and answer retrieval. The contextual query subgraph is expanded based on dependency information to obtain an accurate query subgraph while avoiding overgrowth. Then, a reward-and-punishment mechanism is designed based on the number of dialogue rounds and the size of the query subgraph to effectively prevent the subgraph from overgrowing. Finally, dependency information is utilized to effectively improve the accuracy of answer retrieval. Experiments on the ConvQuestions dataset verify the effectiveness of the proposed method.
2026 Vol. 39 (2): 170-182 [Abstract] ( 19 ) [HTML 1KB] [ PDF 912KB] ( 24 )
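The index-replacement step above can be sketched as a token substitution over coreference clusters. The cluster data structure and function below are illustrative assumptions about how such a step might look, not the paper's algorithm.

```python
def resolve_coreference(question, clusters):
    """Toy index-replacement step: substitute each pronoun token with the
    representative mention of its coreference cluster, so the question is
    self-contained before query-subgraph retrieval."""
    tokens = question.split()
    for cluster in clusters:
        rep = cluster["mention"]            # representative entity mention
        for i in cluster["indices"]:        # token positions referring to it
            tokens[i] = rep
    return " ".join(tokens)

q = "When was she born ?"
clusters = [{"mention": "Marie Curie", "indices": [2]}]
resolved = resolve_coreference(q, clusters)  # "When was Marie Curie born ?"
```

After this rewrite, the question names the entity explicitly, so subgraph expansion can anchor on "Marie Curie" instead of an unresolvable pronoun.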
183 Visible-Light Remote Sensing Image Super-Resolution Network Based on Space-Frequency Alternating Self-Attention
LIU Jie, CHENG Liming
Super-resolution reconstruction of visible-light remote sensing images requires collaborative optimization of local texture recovery and long-range structural consistency. Although traditional Transformer networks can model long-range dependencies, they lack sufficient sensitivity to high-frequency textures. To address this issue, a visible-light remote sensing image super-resolution network based on space-frequency alternating self-attention (SFASR) is proposed. Local textures and cross-regional long-range dependencies are modeled respectively through serially alternating frequency-domain and spatial-domain self-attention. Specifically, a phase-aware frequency self-attention mechanism is designed to enable frequency-domain self-attention computation, explicitly modeling phase differences for enhanced high-frequency texture reconstruction. Furthermore, a channel-enhanced permutation self-attention mechanism is constructed to implement spatial-domain self-attention computation. By incorporating channel attention, the mechanism strengthens feature representation and global structural consistency. Experimental results show that SFASR effectively addresses the issues of high-frequency information loss and structural breakage, and improves image reconstruction quality.
2026 Vol. 39 (2): 183-192 [Abstract] ( 26 ) [HTML 1KB] [ PDF 1623KB] ( 21 )
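The phase-aware frequency processing above starts from an amplitude/phase decomposition of the image spectrum. The sketch below shows only that decomposition and its exact invertibility; the attention computed on top of it in SFASR is not reproduced here, and the function names are illustrative.

```python
import numpy as np

def split_amp_phase(x):
    """Decompose an image's 2-D spectrum into amplitude and phase; phase
    carries most structural (edge/texture location) information."""
    f = np.fft.fft2(x)
    return np.abs(f), np.angle(f)

def recombine(amp, phase):
    """Invert the decomposition: amplitude * e^(i*phase), then inverse FFT."""
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))

x = np.random.default_rng(0).random((16, 16))
amp, phase = split_amp_phase(x)
x_rec = recombine(amp, phase)
```

Because the decomposition is lossless, a network can attend over amplitude and phase separately, then recombine without any reconstruction penalty from the representation itself.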
 

Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press
 
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax:0551-65591176 Email: bjb@iim.ac.cn
Supported by Beijing Magtech  Email:support@magtech.com.cn