模式识别与人工智能
Pattern Recognition and Artificial Intelligence
2025 Vol. 38, Issue 12, Published 2025-12-25

Papers and Reports
1057 Deep Contrastive Multi-view Clustering with Transformer Fusion
LI Shunyong, YUAN Zhiying, ZHAO Xingwang
As an important task in unsupervised learning, multi-view clustering aims to fuse heterogeneous view information to mine a consistent clustering structure. In existing methods, the low-level features extracted by autoencoders lack cross-view semantic consistency, and simple fusion strategies cannot dynamically assess view quality. Moreover, multi-level contrastive constraints and local-global label alignment mechanisms are absent. To address these issues, a deep contrastive multi-view clustering algorithm with Transformer fusion (DCMCTF) is proposed. First, cross-view alignment of low-level feature distributions is achieved under an alternating adversarial learning mechanism, and instance-level and cluster-level dual contrastive learning mechanisms are introduced to enhance cross-view consistency and feature discriminability. Second, a Transformer-based adaptive fusion module dynamically learns view relationships, and robust consensus representations are generated by combining quality-aware scoring. The global labels obtained from the consensus representations are aligned with the local labels of individual views. Experiments on nine datasets demonstrate that DCMCTF achieves excellent clustering performance.
2025 Vol. 38 (12): 1057-1074
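The abstract above names two concrete mechanisms: Transformer-based fusion of per-view features and instance-level contrastive alignment. The following is a minimal, hedged PyTorch sketch of how such components are commonly wired together; all dimensions, module names and the mean-pooled consensus are illustrative assumptions, not the authors' implementation of DCMCTF.

```python
# Minimal sketch: per-view encoders, Transformer fusion over view tokens,
# and an instance-level contrastive loss. Sizes are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewEncoder(nn.Module):
    def __init__(self, in_dim, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))
    def forward(self, x):
        return self.net(x)

class TransformerFusion(nn.Module):
    """Treat each view's feature as a token; self-attention learns view relations."""
    def __init__(self, feat_dim=128, n_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
    def forward(self, view_feats):                # list of (B, D) tensors
        tokens = torch.stack(view_feats, dim=1)   # (B, V, D)
        return self.encoder(tokens).mean(dim=1)   # consensus representation

def instance_contrastive_loss(z1, z2, tau=0.5):
    """Pull the same sample's embeddings from two views together."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                    # (B, B) similarity
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

# toy usage with two random views
x1, x2 = torch.randn(32, 50), torch.randn(32, 40)
z1, z2 = ViewEncoder(50)(x1), ViewEncoder(40)(x2)
consensus = TransformerFusion()([z1, z2])
print(consensus.shape, instance_contrastive_loss(z1, z2).item())
```

The cluster-level contrastive term, adversarial distribution alignment and quality-aware scoring described in the abstract would add further losses on top of this skeleton.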
1075 Set-Valued Fuzzy Granular Ball Rough Set Model and Attribute Reduction Algorithm for Set-Valued Decision Systems
LUO Zhongtuan, TAN Anhui, GU Shenming, WU Weizhi
As an effective multi-granularity data processing paradigm, granular ball computing (GBC) exhibits significant potential in complex data analysis and knowledge reduction. However, existing GBC models fail to fully characterize the fuzziness and uncertainty inherent in set-valued attributes, restricting their application in set-valued decision systems. To address this limitation, a set-valued fuzzy granular ball rough set model is proposed and a corresponding attribute reduction algorithm is designed. First, an adaptive granular ball generation algorithm is proposed to enable dynamic granulation of the data space based on the characteristics of set-valued data. Second, fuzzy granular ball tolerance relations and approximation operators are introduced, and their mathematical properties are systematically analyzed and proven. Furthermore, a forward greedy attribute reduction algorithm based on dependency degree is constructed to achieve efficient attribute reduction. Finally, experimental results demonstrate that the proposed algorithm not only obtains more compact attribute reducts but also achieves higher average classification accuracy with CART and SVM classifiers, validating its effectiveness and superiority in set-valued decision systems.
2025 Vol. 38 (12): 1075-1090
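For readers unfamiliar with dependency-degree-driven reduction, the toy sketch below illustrates the forward greedy scheme the abstract describes, using a crisp set-intersection tolerance relation on set-valued attributes. The tiny table, the tolerance rule and the purity-based dependency are simplifying assumptions; the paper's fuzzy granular ball construction is not reproduced here.

```python
# Toy forward greedy attribute reduction on a set-valued decision table.
# Each attribute value is a set; the last column is the decision label.
data = [
    ({1, 2}, {0},    0),
    ({2},    {0, 1}, 0),
    ({3},    {1},    1),
    ({3, 4}, {2},    1),
]

def tolerant(x, y, attrs):
    """x, y are tolerant on attrs if their value sets overlap on every attribute."""
    return all(x[a] & y[a] for a in attrs)

def dependency(attrs):
    """Fraction of objects whose tolerance class is pure in the decision."""
    if not attrs:
        return 0.0
    pos = 0
    for x in data:
        cls = [y for y in data if tolerant(x, y, attrs)]
        if all(y[-1] == x[-1] for y in cls):
            pos += 1
    return pos / len(data)

def greedy_reduct(all_attrs):
    """Add the attribute with the largest dependency gain until the target is met."""
    reduct, best = [], 0.0
    target = dependency(all_attrs)
    while best < target:
        best, a = max((dependency(reduct + [a]), a)
                      for a in all_attrs if a not in reduct)
        reduct.append(a)
    return reduct

print(greedy_reduct([0, 1]))   # [0]: attribute 0 alone separates the labels here
```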
1091 Multi-view Contrastive Learning for Hypergraph Alignment
NIU Changyu, ZHANG Haifeng, ZHANG Xiaoming
Network alignment aims to mine node correspondences between different networks and is crucial for integrating information across diverse domains. However, most existing methods focus on ordinary graphs and overlook the high-order group interactions prevalent in real-world systems, while the few existing hypergraph alignment methods rely solely on shallow local topological information and fail to capture the deep structural semantics of nodes. To address this issue, a multi-view contrastive learning method for hypergraph alignment, named hypergraph-clique expansion contrastive learning (HCCL), is proposed. A multi-view contrastive learning framework is constructed in which node structural features are collaboratively captured from complementary higher-order and lower-order perspectives, leveraging the rich structural information of hypergraphs more effectively for alignment. The original hypergraph and its clique expansion graph are utilized as dual views: the original hypergraph learns higher-order group relationships through hypergraph neural networks, while the clique expansion graph captures lower-order pairwise relationships via graph convolutional networks. On this basis, a cross-view contrastive learning mechanism is introduced to extract more robust and intrinsic node features spanning different structural scales by maximizing the consistency of node embeddings across the two views. Extensive experiments on real-world datasets validate the effectiveness and robustness of HCCL.
2025 Vol. 38 (12): 1091-1107
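The dual-view design can be illustrated compactly: the same nodes are embedded once through hypergraph-style propagation via the incidence matrix and once through the clique-expansion graph, and a cross-view contrastive loss maximizes their agreement. The unnormalized propagation rules and toy sizes below are assumptions, not HCCL's exact architecture.

```python
# Dual views of the same 5 nodes: hypergraph propagation vs. clique expansion.
import torch
import torch.nn as nn
import torch.nn.functional as F

# incidence matrix H: 5 nodes x 3 hyperedges (1 = node belongs to hyperedge)
H = torch.tensor([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]], dtype=torch.float)
X = torch.randn(5, 16)                 # node features
W = nn.Linear(16, 16)                  # shared projection

# view 1: hypergraph propagation X' = H H^T X (unnormalized, for brevity)
Z_hyper = W(H @ H.t() @ X)

# view 2: clique expansion -- connect nodes that share any hyperedge
A = ((H @ H.t()) > 0).float()
Z_clique = W(A @ X)

def cross_view_loss(z1, z2, tau=0.5):
    """Maximize agreement of the same node's embedding across the two views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

print(cross_view_loss(Z_hyper, Z_clique).item())
```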
Researches and Applications
1108 Crowd Counting Based on Regional Context Awareness
HONG Zhiyuan, GAO Xinjian, REN Mengyan, WANG Xilin, GAO Jun
In complex crowd scenes with uneven density distributions and large scale variations, existing Transformer-based methods typically overlook spatial and channel information when handling cross-scale contextual features. Therefore, a crowd counting method based on regional context awareness (RCA) is proposed. First, a region guidance module is designed to adaptively assign an attention region to each feature location, thereby introducing region-level context and better accommodating non-uniform density distributions. Second, a spatial-channel context awareness module is designed to enable feature interaction across spatial and channel dimensions, constructing cross-dimensional regional dependencies and enhancing the discrimination between foreground and background regions. Finally, a distribution-level constraint is introduced during training to improve the consistency between the predicted density distribution and the ground-truth distribution. Experimental results on the JHU-Crowd++, ShanghaiTech A and ShanghaiTech B datasets validate the robustness and generalization capability of RCA in complex scenes.
2025 Vol. 38 (12): 1108-1120
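As a rough illustration of two ingredients named in the abstract, the sketch below pairs a simple spatial-channel attention block with a distribution-level loss that compares normalized density maps via KL divergence. Both are generic stand-ins chosen for illustration, not the paper's RCA modules.

```python
# Generic spatial-channel attention plus a distribution-level density loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialChannelAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())
    def forward(self, x):
        x = x * self.channel(x)        # reweight channels
        return x * self.spatial(x)     # reweight spatial locations

def distribution_loss(pred, gt, eps=1e-8):
    """KL divergence between predicted and ground-truth density distributions."""
    p = pred.flatten(1)
    p = p / (p.sum(1, keepdim=True) + eps)
    q = gt.flatten(1)
    q = q / (q.sum(1, keepdim=True) + eps)
    return F.kl_div((p + eps).log(), q, reduction="batchmean")

feat = torch.rand(2, 32, 64, 64)
density = SpatialChannelAttention(32)(feat).mean(1, keepdim=True)  # crude head
gt = torch.rand(2, 1, 64, 64)
print(distribution_loss(density, gt).item())
```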
1121 Multimodal Recommendation with User Semantic Embedding Refinement
XU Hao, XIA Hongbin, WANG Xiaofeng
Existing multimodal recommendation methods typically extract features from different modalities, such as images and text, separately and perform only shallow fusion during training, making it difficult to fully explore cross-modal semantics. Moreover, mainstream methods mostly adopt randomly initialized user representations, resulting in insufficient discriminability among users. To address these issues, a multimodal recommendation method with user semantic embedding refinement (USERec) is proposed, alleviating these problems from both the item and the user perspectives. On the item side, a multimodal large language model is utilized to guide visual feature extraction with textual information, achieving deep semantic fusion and yielding item representations better suited to recommendation tasks. On the user side, positional encoding is introduced into user representations to enhance the spectral diversity of the user index space. Personalized local graphs are then constructed through degree-sensitive pruning, and the global awareness of users is augmented via a randomly sampled attention mechanism, thereby improving the discriminability of user representations. Experiments on four real-world datasets verify the effectiveness of USERec.
2025 Vol. 38 (12): 1121-1134
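One user-side idea in the abstract, injecting positional encoding into user representations to diversify randomly initialized embeddings, can be sketched in a few lines. The sinusoidal encoding and dimensions below are assumptions; degree-sensitive pruning and the sampled attention mechanism are omitted.

```python
# Add sinusoidal positional encoding to user index embeddings so that
# randomly initialized users become more distinguishable.
import torch
import torch.nn as nn

def sinusoidal_encoding(n_users, dim):
    pos = torch.arange(n_users, dtype=torch.float).unsqueeze(1)   # (N, 1)
    idx = torch.arange(0, dim, 2, dtype=torch.float)              # even dims
    angle = pos / torch.pow(10000.0, idx / dim)                   # (N, dim/2)
    enc = torch.zeros(n_users, dim)
    enc[:, 0::2] = torch.sin(angle)
    enc[:, 1::2] = torch.cos(angle)
    return enc

n_users, dim = 1000, 64
user_emb = nn.Embedding(n_users, dim)          # random init, low diversity
pe = sinusoidal_encoding(n_users, dim)
refined = user_emb.weight + pe                 # position-aware user vectors
print(refined.shape)
```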
1135 UAV Target Detection Method Based on Multimodal Image Feature Fusion
XUE Wenhui, CHEN Zhongcheng, CHEN Jun, WANG Yong
Multimodal images exhibit significant complementarity at the perception level: infrared images provide stable target responses under low-light conditions and complex backgrounds, while visible images offer rich texture and detail. Fusing the two modalities effectively enhances the robustness and accuracy of unmanned aerial vehicle (UAV) target detection in complex environments. Therefore, a UAV target detection method based on multimodal image feature fusion (MIFF-UAVDet) is proposed. YOLOv7-tiny is employed as the backbone, and dual branches are constructed for the infrared and visible modalities, extracting features from each modality separately to provide complementary representations for subsequent fusion. Furthermore, a lightweight multi-scale spatial attention fusion module, integrating a channel compression module, multi-scale depthwise separable convolutions and a multi-scale spatial attention mechanism, is introduced to guide adaptive spatial-level cross-modal fusion and strengthen feature representation. Meanwhile, owing to the scale compression and shape distortion of targets under UAV aerial perspectives, the aspect-ratio penalty term in complete intersection over union (CIoU) loss tends to fail in practical regression, reducing localization accuracy. To address this issue, an improved height-width constrained loss function based on CIoU (HWCIoU) is proposed. Experimental results show that MIFF-UAVDet outperforms state-of-the-art methods in detection accuracy, localization precision and inference speed, and exhibits stronger robustness in scenarios with complex backgrounds, varying illumination and significant target scale variations.
2025 Vol. 38 (12): 1135-1148
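The abstract's loss modification, replacing CIoU's aspect-ratio penalty with explicit height and width constraints, resembles known variants such as EIoU. The sketch below implements that style of penalty as an assumption for illustration; the paper's exact HWCIoU formulation may differ.

```python
# IoU loss with DIoU-style center term and separate width/height penalties
# in place of CIoU's aspect-ratio term.
import torch

def hw_constrained_iou_loss(pred, gt, eps=1e-7):
    """Boxes given as (x1, y1, x2, y2), shape (N, 4)."""
    x1 = torch.max(pred[:, 0], gt[:, 0]); y1 = torch.max(pred[:, 1], gt[:, 1])
    x2 = torch.min(pred[:, 2], gt[:, 2]); y2 = torch.min(pred[:, 3], gt[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + eps)

    # smallest enclosing box
    cw = torch.max(pred[:, 2], gt[:, 2]) - torch.min(pred[:, 0], gt[:, 0])
    ch = torch.max(pred[:, 3], gt[:, 3]) - torch.min(pred[:, 1], gt[:, 1])

    # normalized center distance (as in DIoU/CIoU)
    px = (pred[:, 0] + pred[:, 2]) / 2; py = (pred[:, 1] + pred[:, 3]) / 2
    gx = (gt[:, 0] + gt[:, 2]) / 2;     gy = (gt[:, 1] + gt[:, 3]) / 2
    center = ((px - gx) ** 2 + (py - gy) ** 2) / (cw ** 2 + ch ** 2 + eps)

    # width/height penalties replacing the aspect-ratio term
    wp = ((pred[:, 2] - pred[:, 0]) - (gt[:, 2] - gt[:, 0])) ** 2 / (cw ** 2 + eps)
    hp = ((pred[:, 3] - pred[:, 1]) - (gt[:, 3] - gt[:, 1])) ** 2 / (ch ** 2 + eps)
    return (1 - iou + center + wp + hp).mean()

pred = torch.tensor([[0., 0., 4., 4.]])
gt = torch.tensor([[1., 1., 5., 5.]])
print(hw_constrained_iou_loss(pred, gt).item())
```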
 

Supervised by
China Association for Science and Technology
Sponsored by
Chinese Association of Automation
National Research Center for Intelligent Computing System
Institute of Intelligent Machines, Chinese Academy of Sciences
Published by
Science Press
 
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China  Tel: 0551-65591176  Fax: 0551-65591176  Email: bjb@iim.ac.cn