Abstract: Frequent pattern mining (FPM) is one of the key tasks in graph data mining. The objective of FPM is to extract patterns whose support values exceed a predefined threshold from large-scale graph data. However, constrained by single-dimensional evaluation metrics and the neglect of users' subjective preferences, traditional FPM methods often fail to align mining results with user expectations. To address this issue, a certified pseudo-label enhanced active learning framework for pattern interest evaluation (CPALF) is proposed. CPALF is designed to accurately predict users' subjective pattern preferences with minimal human interaction. An active learning strategy is employed to efficiently collect user preferences through human-computer interaction. In addition, CPALF incorporates semi-supervised learning to generate high-confidence pseudo-labeled training samples from unlabeled data, thereby significantly improving prediction performance while reducing dependence on manual annotation. Experiments demonstrate that CPALF effectively captures user preferences and achieves high prediction accuracy with limited labeled data.
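The workflow summarized above, querying the user on uncertain patterns while promoting high-confidence predictions on the remaining unlabeled patterns to pseudo-labels, can be illustrated with a minimal sketch. This is not the CPALF implementation: the logistic-regression scorer, the pattern feature matrix X, the simulated oracle, the confidence threshold tau, and the query budget are all assumed placeholders standing in for the framework's actual components.

```python
# Minimal sketch (assumed names throughout): uncertainty-based active learning
# combined with confidence-thresholded pseudo-labeling, as outlined in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression


def active_pseudo_label_loop(X, oracle, n_init=10, budget=50, tau=0.95, rounds=5, seed=0):
    """X: feature vectors of candidate patterns (one row per pattern).
    oracle: callable returning a user's 0/1 interest label for a pattern index,
    standing in for human-computer interaction. Assumes both classes appear
    among the initial labels."""
    rng = np.random.default_rng(seed)
    y = {int(i): oracle(int(i)) for i in rng.choice(len(X), size=n_init, replace=False)}
    clf = LogisticRegression(max_iter=1000)

    for _ in range(rounds):
        unlabeled = [i for i in range(len(X)) if i not in y]
        if not unlabeled or budget <= 0:
            break
        clf.fit(X[list(y)], [y[i] for i in y])

        # Active learning step: query the user on the least confident pattern.
        proba = clf.predict_proba(X[unlabeled])
        query = unlabeled[int(np.argmin(np.abs(proba[:, 1] - 0.5)))]
        y[query] = oracle(query)
        budget -= 1

        # Pseudo-labeling step: adopt predictions above the confidence
        # threshold as additional (unverified) training samples.
        pseudo = {unlabeled[j]: int(proba[j, 1] > 0.5)
                  for j in range(len(unlabeled))
                  if proba[j].max() >= tau and unlabeled[j] != query}
        X_train = np.vstack([X[list(y)], X[list(pseudo)]]) if pseudo else X[list(y)]
        y_train = [y[i] for i in y] + [pseudo[i] for i in pseudo]
        clf.fit(X_train, y_train)
    return clf
```

In CPALF itself, the fixed threshold tau would be replaced by the framework's certification of pseudo-labels and the raw feature matrix by a representation of the mined graph patterns; the sketch only conveys how the active querying and pseudo-labeling steps interact.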