Pattern Recognition and Artificial Intelligence

模式识别与人工智能


	Chinese Association of Automation

	National Research Center for Intelligent Computing System

	Institute of Intelligent Machines, Chinese Academy of Sciences


	2024 Vol.37 Issue.9, Published 2024-09-25


Papers and Reports

	755	Enhanced Target Domain Representation Based Unsupervised Cross-Domain Medical Image Segmentation
		LIU Kai, LU Runuo, ZHENG Xiaorou, DONG Shoubin
		Medical images produced by different imaging modality devices exhibit varying degrees of distribution differences. Unsupervised domain adaptation methods typically aim to generalize models trained in the source domain to the unlabeled target domain by minimizing these distribution differences and using shared features between the source and target domains for result prediction. However, they often neglect the private features of the target domain. To address this issue, a method for enhanced target domain representation based unsupervised cross-domain medical image segmentation (TreUCMIS) is proposed in this paper. First, TreUCMIS acquires common features through shared feature learning, and a target domain feature encoder is trained through image reconstruction to capture the complete features of the target domain. Second, unsupervised self-training of the target domain strengthens the shared characteristics of deep and shallow features. Finally, the predicted results obtained from the shared and complete features are aligned, enabling the model to utilize the complete features of the target domain for segmentation and thus improving the generalization in the target domain. Experiments on two medical image segmentation datasets involving bidirectional domain adaptation tasks with CT and MRI (abdominal and cardiac datasets) demonstrate the effectiveness and superiority of TreUCMIS.
		2024 Vol. 37 (9): 755-769

	770	Traffic Scene Semantic Segmentation Algorithm with Knowledge Distillation of Multi-level Features Guided by Boundary Perception
		XIE Xinlin, DUAN Zeyun, LUO Chenyan, XIE Gang
		To address the loss of object detail and the large parameter counts of segmentation models for traffic scenes, a traffic scene semantic segmentation algorithm with knowledge distillation of multi-level features guided by boundary perception is proposed. The proposed algorithm produces smooth object segmentation boundaries with fewer parameters. First, an adaptive multi-level feature fusion module is constructed to integrate deep semantic information with shallow spatial information, selectively highlighting object boundary information and object body information. Second, an interactive attention fusion module is proposed to model long-range dependencies in the spatial and channel dimensions, enhancing information interaction between the two dimensions. Finally, a boundary loss function based on candidate boundaries is proposed to construct a detail-aware boundary knowledge distillation network that transfers boundary information from a complex teacher network. Experiments on the traffic scene datasets Cityscapes and CamVid demonstrate that the proposed algorithm yields a lightweight model while retaining good segmentation performance, with significant advantages on small and slender objects.
		2024 Vol. 37 (9): 770-785
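
The abstract above relies on transferring knowledge from a teacher network to a lightweight student. As a minimal sketch of that general idea only, the following Python code computes the widely used temperature-scaled distillation loss for the class scores of a single pixel. It is not the boundary loss described in the abstract; the function names and the temperature value are assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the conventional rescaling that keeps gradient
    magnitudes comparable across temperatures. Hypothetical helper, not
    taken from the paper.
    """
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return kl * temperature ** 2


# Per-pixel class scores for, say, road / car / pole (made-up numbers).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, -0.5]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's scores and grows as the two softened distributions diverge. A boundary-aware variant would, presumably, restrict or reweight this term to pixels near object boundaries.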

	786	Face Forgery Detection Combined with Deep Forgery Features Comparison
		LI Zhaowei, GAO Xinjian, DA Zikai, GAO Jun
		With the continuous development of artificial intelligence-generated content technology, the diversity of forgery techniques presents significant challenges to existing detection methods. Most current detection methods are based on facial forgery features extracted by different advanced convolutional neural networks. However, these methods are trained on datasets containing known forgery techniques, and their generalization capabilities are inadequate to handle images forged by unknown methods. Therefore, a face forgery detection method combined with deep forgery features comparison is proposed, and it exhibits excellent adaptability to unknown forgery techniques. The proposed approach consists of two stages. First, similar features of different forgery techniques are explored, and a meta-learning-based similar feature fusion network is introduced. This network leverages the learning capabilities of meta-learning to capture the similar features among different forgery methods. Second, unique forgery features specific to individual tasks are taken into account, and a task-specific uniqueness fine-tuning method is proposed to enhance the adaptability of the model to unknown forgery techniques. Cross-manipulation testing demonstrates that the proposed method achieves superior detection capability against attacks from unknown forgery techniques.
		2024 Vol. 37 (9): 786-797

Researches and Applications

	798	Soft Prompt Learning with Internal Knowledge Expansion for Clickbait Detection
		DONG Bingbing, WU Xindong
		The main purpose of clickbait is to increase page views and advertising revenues by enticing users to click on bait links. The content of clickbait is often characterized by low-quality, misleading or false information, which can have negative effects on users. Existing prompt learning methods based on pre-trained language models rely on external open knowledge bases to detect clickbait. As a result, model performance is limited by the quality and availability of the external knowledge bases, and delays in queries and responses are inevitable. To address this issue, a soft prompt learning method with internal knowledge expansion for clickbait detection (SPCD_IE) is proposed in this paper. Expansion words are extracted from the training dataset, and hierarchical clustering and optimization strategies are employed to fine-tune the obtained expansion words in prompt learning, so that knowledge retrieval from external knowledge bases is no longer needed. Moreover, soft prompt learning is utilized to obtain the best prompts suitable for specific text types, preventing biases introduced by manual templates. Although SPCD_IE expands solely based on internal knowledge in few-shot scenarios, experimental results show it achieves better detection performance on three public clickbait datasets in less time.
		2024 Vol. 37 (9): 798-810

	811	Parkinson's Disease Detection Model Based on Hierarchical Fusion of Multi-type Speech Information
		WU Di, JI Wei, ZHENG Huifen, LI Yun
		Speech data for Parkinson's disease detection typically includes sustained vowels, repeated syllables and contextual dialogues. Most existing models adopt a single type of speech data as input, which makes them susceptible to noise interference and less robust. The current challenge of Parkinson's disease detection is effectively integrating different types of speech data and extracting critical pathological information. In this paper, a Parkinson's disease detection method based on hierarchical fusion of multi-type speech information is proposed, aiming to extract rich and comprehensive pathological information and achieve better detection performance. First, various acoustic features are extracted for different types of Parkinson's disease speech data. Then, a representation learning scheme is designed to mine deep information from multiple types of acoustic features. The underlying pathological information in acoustic features is reflected more accurately by extracting articulation and rhythm information. Furthermore, a decoupled representation learning space is designed for these two types of information to extract their respective private features while learning their shared representation simultaneously. Finally, a cross-type attention hierarchical fusion module is designed to progressively fuse shared and private representations using cross-attention mechanisms at different granularities, aiming to enhance Parkinson's disease detection performance. Experiments on a publicly available Italian Parkinson's disease speech dataset and a self-collected Chinese Parkinson's disease speech dataset demonstrate the accuracy improvement of the proposed approach.
		2024 Vol. 37 (9): 811-823
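
The fusion step in the abstract above is built on cross-attention. The following Python sketch shows plain scaled dot-product cross-attention between two feature sequences, with no learned projections. Treating articulation features as queries and rhythm features as keys and values is an assumption made here for illustration, as are the function name and the toy numbers; the abstract does not specify these details.

```python
import math


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention over lists of feature vectors.

    Each query attends over all keys; the output for a query is the
    softmax-weighted average of the corresponding values.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


# Hypothetical 2-dimensional features: two articulation frames attend
# over three rhythm frames.
articulation = [[1.0, 0.0], [0.0, 1.0]]
rhythm = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(articulation, rhythm, rhythm)
```

A hierarchical variant would, presumably, apply this operation several times at different granularities, for example frame level and utterance level, and combine the results.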

	824	Secure and Efficient Federated Learning for Multi-domain Data Scenarios
		JIN Chunhua, LI Lulu, WANG Jiahao, JI Ling, LIU Xinying, CHEN Liqing, ZHANG Hao, WENG Jian
		To tackle the challenges of poor generalization, catastrophic forgetting and privacy attacks that federated learning faces in multi-domain data training, a scheme for secure and efficient federated learning for multi-domain scenarios (SEFL-MDS) is proposed. In the local training phase, knowledge distillation is employed to prevent catastrophic forgetting during multi-domain data training, while accelerating knowledge transfer across domains to improve training efficiency. In the uploading phase, Gaussian noise is added to locally updated gradients and generalization differences across domains using the Gaussian differential privacy mechanism to ensure secure data uploads and enhance the confidentiality of the training process. In the aggregation phase, a dynamic generalization-weighted algorithm is utilized to reduce generalization differences across domains, thereby enhancing the generalization capability. Theoretical analysis demonstrates the high robustness of the proposed scheme. Experiments on the PACS and Office-Home datasets show that the proposed scheme achieves higher accuracy with reduced training time.
		2024 Vol. 37 (9): 824-838
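
The uploading and aggregation phases described in the abstract above can be illustrated with a generic sketch: each client clips its gradient, adds Gaussian noise, and the server takes a weighted average. The clipping step, the noise calibration (standard deviation equal to a noise multiplier times the clipping norm) and all function names are assumptions borrowed from common differentially private training practice, not details confirmed by the abstract. How SEFL-MDS derives its dynamic weights is not stated, so the weights are simply passed in.

```python
import math
import random


def clip_by_l2_norm(vector, max_norm):
    """Scale the vector down so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0 or norm <= max_norm:
        return list(vector)
    return [x * max_norm / norm for x in vector]


def add_gaussian_noise(vector, sigma, rng):
    """Add independent N(0, sigma^2) noise to every coordinate."""
    return [x + rng.gauss(0.0, sigma) for x in vector]


def privatize_update(gradient, clip_norm, noise_multiplier, rng):
    """Clip, then perturb: the usual Gaussian-mechanism recipe (assumed)."""
    clipped = clip_by_l2_norm(gradient, clip_norm)
    return add_gaussian_noise(clipped, noise_multiplier * clip_norm, rng)


def weighted_average(updates, weights):
    """Aggregate client updates with weights normalised to sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    size = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) / total
            for i in range(size)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
client_gradients = [[3.0, 4.0], [0.3, -0.2], [-1.0, 2.0]]  # made-up values
noisy = [privatize_update(g, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
         for g in client_gradients]
# Hypothetical per-domain weights; a real scheme would compute them.
global_update = weighted_average(noisy, weights=[0.5, 0.2, 0.3])
```

Clipping bounds the influence of any single client before noise is added, which is what lets the noise scale be tied to a fixed sensitivity. The actual privacy guarantee depends on the noise multiplier, the number of rounds and the accounting method, none of which this sketch addresses.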

	839	Semantic Topological Maps-Based Reasoning for Vision-and-Language Navigation in Continuous Environments
		XIE Zilong, XU Ming
		To address the inadequate reasoning ability of existing vision-and-language navigation methods in continuous environments, a method for semantic topological maps-based reasoning for vision-and-language navigation in continuous environments is proposed. First, regions and objects in the navigation environment are identified through scene understanding auxiliary tasks, and a knowledge base of spatial proximity is constructed. Second, the agent interacts with the environment in real time during navigation, collecting location information, encoding visual features and predicting semantic labels of regions and objects, so that a semantic topological map is gradually generated. On this basis, an auxiliary reasoning localization strategy is designed. A self-attention mechanism is employed to extract object and region information from navigation instructions, and the spatial proximity knowledge base is combined with the semantic topological map to infer and localize objects and regions. This assists navigation decisions and ensures that the agent's navigation trajectory aligns with the instructions. Experimental results on the public datasets R2R-CE and RxR-CE demonstrate that the proposed method achieves a higher navigation success rate.
		2024 Vol. 37 (9): 839-849
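
To make the notion of a semantic topological map in the abstract above more concrete, the following Python sketch stores visited locations as graph nodes tagged with a region label and a set of object labels, then localizes a target by label and plans a path by breadth-first search. The class, its methods and the example labels are hypothetical; the abstract does not describe the paper's data structures, and the learned components (visual encoding, label prediction, the proximity knowledge base) are omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MapNode:
    position: tuple                      # (x, y) location of the waypoint
    region: str                          # predicted region label
    objects: set = field(default_factory=set)  # predicted object labels


class SemanticTopoMap:
    """Toy semantic topological map: labelled nodes plus undirected edges."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}

    def add_node(self, node_id, position, region, objects=()):
        self.nodes[node_id] = MapNode(position, region, set(objects))
        self.edges.setdefault(node_id, set())

    def connect(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def find(self, region=None, obj=None):
        """Return ids of nodes matching the given region and/or object."""
        return [nid for nid, node in self.nodes.items()
                if (region is None or node.region == region)
                and (obj is None or obj in node.objects)]

    def shortest_path(self, start, goal):
        """Breadth-first search; returns a list of node ids or None."""
        previous = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                while current is not None:
                    path.append(current)
                    current = previous[current]
                return path[::-1]
            for neighbour in sorted(self.edges[current]):
                if neighbour not in previous:
                    previous[neighbour] = current
                    queue.append(neighbour)
        return None


topo = SemanticTopoMap()
topo.add_node(0, (0.0, 0.0), "hallway")
topo.add_node(1, (2.0, 0.0), "kitchen", {"sink"})
topo.add_node(2, (0.0, 3.0), "bedroom", {"bed"})
topo.connect(0, 1)
topo.connect(0, 2)
# Instruction fragment "go to the sink": localize, then plan from node 2.
target = topo.find(obj="sink")[0]
route = topo.shortest_path(2, target)
```

In this toy map the route from the bedroom to the sink passes through the hallway. In the setting the abstract describes, nodes would be added incrementally as the agent explores, and a spatial proximity knowledge base could suggest likely regions for objects that have not been observed yet.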


Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press

Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No. 350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax: 0551-65591176 Email: bjb@iim.ac.cn