Pattern Recognition and Artificial Intelligence

模式识别与人工智能

	Chinese Association of Automation

	National Research Center for Intelligent Computing System

	Institute of Intelligent Machines, Chinese Academy of Sciences


	2025 Vol.38 Issue.8, Published 2025-08-25

Papers and Reports

	669	Self-Distillation Multi-task Learning Integrating Multi-dimensional Perception for Multimodal Sequential Recommendation
		TANG Zhe, PANG Jifang, XIE Yu, WANG Zhiqiang
		As an important application scenario of recommendation systems, multimodal sequential recommendation is a research focus in both industry and academia. However, existing multi-task learning approaches for multimodal sequential recommendation do not fully consider the high-order relationships within modalities or the enhancing effect of users' short-term sequences. Consequently, their semantic representations and interest modeling are weak, and the resulting recommendations are poorly personalized. To address this issue, a self-distillation multi-task learning approach integrating multi-dimensional perception for multimodal sequential recommendation (SD-MTMP) is proposed. First, topics are extracted from user reviews, and high-order semantic correlations within user groups and item collections are modeled by constructing user-topic and item-topic hypergraphs, respectively. Topic-aware node representations are generated through hypergraph convolution. In parallel, a weighted bipartite graph is built from the user-item rating matrix to generate rating-aware node representations. Second, a cross-modal self-distillation auxiliary task is designed to achieve semantic alignment by transferring knowledge from the topic-aware representations to the rating-aware representations. Additionally, a dual-aware attention mechanism that accounts for the effects of both user ratings and time intervals on short-term sequences is established to model users' short-term interests accurately. On this basis, a multi-task learning strategy for multimodal sequential recommendation is proposed. It jointly optimizes the recommendation loss and the self-distillation loss, thereby further enhancing the semantic expressiveness of the representations and improving recommendation performance. Finally, experiments on three public datasets demonstrate the effectiveness of SD-MTMP.
		2025 Vol. 38 (8): 669-683
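The hypergraph convolution step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the SD-MTMP implementation: it assumes a simplified, parameter-free form of hypergraph convolution (mean aggregation from nodes to hyperedges, then from hyperedges back to nodes), and the function name `hypergraph_conv` and the toy user-topic incidence matrix are hypothetical.

```python
def hypergraph_conv(incidence, features):
    """One simplified, weight-free hypergraph convolution step.

    incidence[v][e] is 1 if node v (e.g. a user) belongs to hyperedge e
    (e.g. a review topic), else 0. features[v] is the feature vector of node v.
    Assumption: plain mean aggregation, with no learned weights or nonlinearity.
    """
    n_nodes, n_edges, dim = len(incidence), len(incidence[0]), len(features[0])

    # Stage 1: each hyperedge takes the mean of its member nodes' features.
    edge_feats = []
    for e in range(n_edges):
        members = [v for v in range(n_nodes) if incidence[v][e]]
        if members:
            edge_feats.append(
                [sum(features[v][d] for v in members) / len(members) for d in range(dim)]
            )
        else:
            edge_feats.append([0.0] * dim)  # empty hyperedge contributes nothing

    # Stage 2: each node takes the mean of the hyperedges it belongs to.
    out = []
    for v in range(n_nodes):
        edges = [e for e in range(n_edges) if incidence[v][e]]
        if edges:
            out.append(
                [sum(edge_feats[e][d] for e in edges) / len(edges) for d in range(dim)]
            )
        else:
            out.append(list(features[v]))  # isolated node keeps its own features
    return out


# Toy example: three users, two topics; user 1 wrote about both topics.
user_topic = [[1, 0], [1, 1], [0, 1]]
user_feats = [[1.0], [3.0], [5.0]]
topic_aware = hypergraph_conv(user_topic, user_feats)
```

Because a hyperedge can connect any number of nodes, users who share a topic influence one another in a single step, which is the sense in which a hypergraph captures high-order correlations.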

	684	Swin Transformer-Based Skin Disease Segmentation Network via Dynamic Agent Bottleneck and Multi-scale Dilated Attention
		SUN Lin, XUE Hongke, LÜ Juan
		Accurate segmentation of skin lesion areas is critical for the diagnosis and treatment of dermatological diseases. Existing networks are challenged by diverse lesion morphologies, high similarity between lesions and surrounding tissues, and blurred boundaries. To address these challenges, a Swin Transformer-based skin disease segmentation network via dynamic agent bottleneck and multi-scale dilated attention (STNDA) is proposed. First, a Swin Transformer-based backbone network is constructed to overcome the limitations of traditional convolutions in capturing global context. By leveraging the hierarchical architecture of the network, multi-scale feature fusion is achieved, and long-range dependencies are established to enhance the network's ability to extract semantic features from skin lesions with varying morphologies. Second, to improve the feature expression ability of STNDA, a dynamic agent bottleneck module is designed. The module adaptively generates agent vectors and positional biases from the input features, allowing the network to dynamically adjust its focus on local receptive fields. Segmentation errors caused by interference from highly similar skin tissues are thereby further mitigated. Finally, a multi-scale dilated attention fusion module is proposed to enhance the network's edge perception. It combines a multi-branch parallel architecture of multi-scale dilated convolutions with spatial-channel attention mechanisms to improve the network's sensitivity to lesion boundaries. Experiments on the ISIC2017, PH2 and ISIC2018 datasets demonstrate that STNDA achieves superior performance, confirming its effectiveness.
		2025 Vol. 38 (8): 684-698
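The idea of parallel dilated convolutions at several scales, as described in the abstract above, can be sketched in one dimension. This is an illustrative simplification, not the STNDA module: it omits the spatial-channel attention, uses a fixed edge-detecting kernel instead of learned weights, and fuses branches by simple averaging (an assumption). The names `dilated_conv1d` and `multi_scale_response` are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D convolution with zero padding and a given dilation rate.

    With dilation d, neighbouring kernel taps are d samples apart, so the
    receptive field grows without adding kernel weights.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - center) * dilation
            if 0 <= j < len(signal):  # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def multi_scale_response(signal, kernel, dilations=(1, 2, 4)):
    """Run one branch per dilation rate in parallel and average the branches."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


# A step edge (background 0, lesion 1) probed with a difference kernel.
profile = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
edge_kernel = [-1, 0, 1]
fine = dilated_conv1d(profile, edge_kernel, dilation=1)
coarse = dilated_conv1d(profile, edge_kernel, dilation=2)
fused = multi_scale_response(profile, edge_kernel, dilations=(1, 2))
```

The coarse branch responds to the edge over a wider neighbourhood than the fine branch, which is why combining several dilation rates can help with blurred boundaries.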

	699	Certified Pseudo-Label Enhanced Active Learning Framework for Pattern Interest Evaluation
		WANG Tian, WANG Lu, XIE Wenbo, WANG Xin
		Frequent pattern mining (FPM) is one of the key tasks of graph data mining. The objective of FPM is to extract patterns with support values higher than predefined thresholds from large-scale graph data. However, because traditional FPM methods rely on single-dimensional evaluation metrics and neglect subjective preferences, their mining results often fail to match the expectations of users. To address this issue, a certified pseudo-label enhanced active learning framework for pattern interest evaluation (CPALF) is proposed. CPALF is designed to accurately predict the subjective pattern preferences of users with minimal human interaction. An active learning strategy is employed to efficiently collect user preferences via human-computer interaction. CPALF also incorporates semi-supervised learning to generate high-confidence pseudo-labeled training samples from unlabeled data, thereby significantly improving prediction performance while reducing annotation dependency. Experiments demonstrate that CPALF effectively captures user preferences with high prediction accuracy under limited labeled data.
		2025 Vol. 38 (8): 699-713
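The interplay of pseudo-labeling and active learning described in the abstract above can be sketched as follows. The abstract does not say how CPALF certifies pseudo-labels, so this sketch substitutes a plain confidence threshold and uncertainty sampling; the function `split_unlabeled`, the threshold value, and the pattern identifiers are all hypothetical.

```python
def split_unlabeled(probs, confidence=0.9):
    """Split unlabeled patterns into pseudo-labeled ones and one query for the user.

    probs maps a pattern id to the model's predicted probability that the user
    finds the pattern interesting. Assumption: a simple confidence threshold
    stands in for the certification step, which the abstract does not describe.
    """
    pseudo_labels = {}
    uncertain = []
    for pattern_id, p in probs.items():
        if p >= confidence:
            pseudo_labels[pattern_id] = 1  # confidently interesting
        elif p <= 1 - confidence:
            pseudo_labels[pattern_id] = 0  # confidently uninteresting
        else:
            uncertain.append(pattern_id)

    # Uncertainty sampling: ask the user about the pattern closest to 0.5.
    query = min(uncertain, key=lambda pid: abs(probs[pid] - 0.5)) if uncertain else None
    return pseudo_labels, query


predicted = {"p1": 0.97, "p2": 0.55, "p3": 0.04, "p4": 0.70}
pseudo_labels, next_query = split_unlabeled(predicted)
```

Confident predictions become training samples at no annotation cost, while the single most ambiguous pattern is sent to the user, which keeps human interaction small.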

Researches and Applications

	714	Risk Identification in Driving Scenarios via Fusion of Scene Graph Embeddings and Optical Flow Features
		XIAO Yao, YANG Yijian, GOU Chao
		The spatiotemporal and behavioral interactions of multimodal traffic participants are complex and difficult to recognize accurately, which makes driving risk identification harder. To address this issue, a virtual traffic scene graph dataset, CARLA_242, is constructed for collision risk assessment. The dataset contains seven types of traffic participants and sixteen types of scene graph relations. A risk identification method via fusion of scene graph embeddings and optical flow features is then proposed. The method consists of three core modules. In the spatial modeling module, node features and relation information are first jointly encoded by a multi-relational graph convolutional network, and scene graph embeddings are then obtained through graph pooling and readout operations. In the optical flow extraction module, optical flow is estimated from video sequences, and optical flow features representing dynamic motion are extracted. In the spatiotemporal modeling module, the fused representations of scene graph embeddings and optical flow features are processed by a temporal transformer encoder to achieve driving risk identification. Experiments demonstrate the superior performance of the proposed method on three scene graph datasets. The results validate the effectiveness of fusing scene graph and optical flow features for driving risk identification.
		2025 Vol. 38 (8): 714-726

	727	Adversarial Attack Algorithm for Object Detection Based on Local-Attribute Generative Adversarial Networks
		XU Jianuo, SHAO Wei, ZHANG Daoqiang
		The practical effectiveness of existing adversarial attack methods for object detection in medicine is limited by the difficulty of achieving both high attack success rates and strong stealthiness of adversarial examples. To address this issue, an adversarial attack algorithm for object detection based on local-attribute generative adversarial networks is proposed. It is intended to optimize the quality of adversarial examples and improve attack performance. First, an image is partitioned into patches to construct its graph structure, and a local attribute discrepancy loss derived from the graph is proposed to enhance the visual stealthiness of adversarial examples. Second, a target mislocalization loss is introduced to mislead the detector into producing inaccurate object localizations, thereby amplifying the adversarial impact. Finally, these two loss functions are integrated, and the generative adversarial network is updated through backpropagation. Experiments on two publicly available blood cell datasets, BCCD and LISC, demonstrate that the adversarial examples generated by the proposed method against the Faster R-CNN model outperform those of existing algorithms in terms of attack success rate and stealthiness. Moreover, the generated adversarial examples exhibit strong attack transferability.
		2025 Vol. 38 (8): 727-739

	740	Adaptive Granular-Ball and Pure Cluster Splitting for Open Intent Classification
		WANG Jingkai, LI Yanhua, LIU Jiafen, WANG Xiangkun, YANG Xin
		Open intent classification is a critical task in building intelligent dialogue systems. It aims to detect unknown intents accurately while recognizing the known ones. However, existing methods are limited in modeling complex semantic structures and fail to represent the diversity within intent classes, resulting in inter-class confusion. To address this issue, a method of adaptive granular-ball and pure cluster splitting for open intent classification (AGPCS-OIC) is proposed. First, adaptive granular-ball clustering is applied to construct multi-center subclass structures reflecting the true data distribution, so that intra-class heterogeneity is captured more effectively. Then, a structural sparsity-based pure cluster splitting strategy is introduced to further divide loosely bounded but high-purity granular-balls, which enhances the expressiveness of decision boundaries and the ability to reject unknown intents. Additionally, a granular-ball-aware contrastive learning mechanism is incorporated. Structural-level semantic pairs are built with granular-ball centers serving as anchors, guiding the model to improve intra-class compactness and inter-class separability in the feature space. Experiments show that AGPCS-OIC achieves strong performance on multiple open intent classification datasets.
		2025 Vol. 38 (8): 740-751
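How granular-balls can act as decision boundaries that reject unknown intents, as outlined in the abstract above, is sketched below. This is not the AGPCS-OIC algorithm: the adaptive clustering, pure cluster splitting, and contrastive learning are omitted, and the radius is taken as the maximum member-to-center distance (an assumption; granular-ball methods also use the mean distance). The names `make_ball` and `classify` and the 2D toy embeddings are hypothetical.

```python
import math


def make_ball(points, label):
    """Summarize a group of same-intent embeddings as a center and a radius."""
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Assumption: radius is the distance from the center to the farthest member.
    radius = max(math.dist(p, center) for p in points)
    return {"center": center, "radius": radius, "label": label}


def classify(x, balls):
    """Return the label of the ball containing x, or "unknown" if none does."""
    # Pick the ball whose surface is nearest to x.
    best = min(balls, key=lambda b: math.dist(x, b["center"]) - b["radius"])
    if math.dist(x, best["center"]) <= best["radius"]:
        return best["label"]
    return "unknown"  # outside every ball: treated as an open (unseen) intent


# One intent may own several balls (multi-center subclasses).
balls = [
    make_ball([(0.0, 0.0), (2.0, 0.0)], "book_flight"),
    make_ball([(0.0, 6.0), (0.0, 8.0)], "book_flight"),
    make_ball([(10.0, 10.0), (10.0, 12.0)], "play_music"),
]
```

For example, `classify((1.0, 0.5), balls)` falls inside the first ball, whereas a point such as `(5.0, 5.0)` lies outside all three and is rejected as unknown. Splitting a loose ball into tighter ones shrinks the covered region, which is what makes rejection of unknown intents stricter.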

	752	Risk Prediction of Polycystic Ovary Syndrome Using Stamen-Pistil Convolutional Network
		LUO Lie, WANG Tao, CHEN Bingjing, WU Xiaoyuan, LIN Zhongyan
		In polycystic ovary syndrome (PCOS) clinical data analysis, many models suffer from underfitting, while convolutional neural networks are limited by their receptive fields. To address these issues, a stamen-pistil convolutional network (SPCNet) is proposed. The design of SPCNet is inspired by the collaborative structure of stamens and pistils in flowers. The pistil convolution passes vertically through the main information stream to extract global features, while the stamen convolution performs supplementary sampling from the horizontal neighborhood. The structure of floral reproductive organs is thus adopted in the design of the convolutional operators to improve feature extraction. By expanding the sampling space in both longitudinal and transverse directions, SPCNet effectively compensates for the limitations of existing PCOS processing approaches. Experiments on two public PCOS datasets show that SPCNet improves computational efficiency and achieves higher prediction accuracy with a lightweight structure, thereby better meeting the need for instant analysis of complex PCOS clinical data.
		2025 Vol. 38 (8): 752-763

Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press

Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No. 350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax: 0551-65591176 Email: bjb@iim.ac.cn