Self-Distillation Multi-task Learning Integrating Multi-dimensional Perception for Multimodal Sequential Recommendation

doi:10.16451/j.cnki.issn1003-6059.202508001


Abstract Multimodal sequential recommendation is an important application scenario of recommender systems and a research focus in both industry and academia. However, existing multi-task learning approaches for multimodal sequential recommendation do not fully consider the high-order relationships within modalities or the enhancing effect of users' short-term sequences. As a result, their semantic representations and interest modeling are weak, and the degree of personalization is low. To address this issue, a self-distillation multi-task learning approach integrating multi-dimensional perception for multimodal sequential recommendation (SD-MTMP) is proposed. First, topics are extracted from user reviews, and user-topic and item-topic hypergraphs are constructed to model the high-order semantic correlations within user groups and item collections, respectively. Topic-aware node representations are generated through hypergraph convolution. In parallel, a weighted bipartite graph is built from the user-item rating matrix to generate rating-aware node representations. Second, a cross-modal self-distillation auxiliary task is designed to achieve semantic alignment by transferring knowledge from the topic-aware representations to the rating-aware representations. In addition, a dual-aware attention mechanism that accounts for the effects of both user ratings and time intervals on short-term sequences is established to model users' short-term interests accurately. On this basis, a multi-task learning strategy for multimodal sequential recommendation is proposed that jointly optimizes the recommendation loss and the self-distillation loss, further enhancing the semantic expressiveness of the representations and improving recommendation performance. Finally, experiments on three public datasets demonstrate the effectiveness of SD-MTMP.
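
The abstract names three components without giving formulas: dual-aware attention over the short-term sequence, a self-distillation loss from topic-aware to rating-aware representations, and a joint objective. The sketch below is only an illustration of how such pieces could fit together. Every function name, the linear scoring rule with a `decay` factor, the use of mean squared error for distillation, and the weight `lam` are assumptions made for this example, not the paper's definitions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_aware_weights(ratings, time_gaps, decay=0.1):
    """Attention weights for a short-term sequence (assumed scoring rule).

    A higher rating raises an item's score and a larger time gap lowers it.
    The linear form and the `decay` value are hypothetical.
    """
    scores = [r - decay * g for r, g in zip(ratings, time_gaps)]
    return softmax(scores)


def distillation_loss(teacher, student):
    """Mean squared error between a topic-aware (teacher) vector and a
    rating-aware (student) vector. MSE is an assumed choice; in a training
    framework the teacher side would typically be treated as a constant."""
    return sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher)


def bce_loss(prob, label):
    """Binary cross-entropy, used here as a stand-in recommendation loss."""
    eps = 1e-12
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1 - prob + eps))


def joint_loss(rec_loss, sd_loss, lam=0.5):
    """Joint objective: recommendation loss plus weighted distillation loss."""
    return rec_loss + lam * sd_loss


if __name__ == "__main__":
    # Two items with equal ratings; the first was interacted with more recently.
    weights = dual_aware_weights(ratings=[5.0, 5.0], time_gaps=[1.0, 10.0])
    sd = distillation_loss(teacher=[1.0, 0.0], student=[0.0, 0.0])
    rec = bce_loss(prob=0.9, label=1)
    print(weights, sd, joint_loss(rec, sd))
```

Under these assumptions, the more recent of two equally rated items receives the larger attention weight, and `lam` controls how strongly the auxiliary alignment task influences training relative to the recommendation task.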

Key words: Multi-task Learning; Multimodal Sequential Recommendation; Knowledge Transfer; Dual-Aware Attention; Hypergraph

Received: 16 July 2025

Chinese Library Classification (ZTFLH): TP181

Fund: National Natural Science Foundation of China (No. 62472270, 62272285, 72171137); Fundamental Research Program of Shanxi Province (No. 202403021221021)

Corresponding Author: PANG Jifang, Ph.D., associate professor. Her research interests include recommender systems and intelligent decision-making.

About the authors: TANG Zhe, master's student. Her research interests include recommender systems.
XIE Yu, Ph.D., associate professor. His research interests include machine learning.
WANG Zhiqiang, Ph.D., associate professor. His research interests include machine learning, data mining and network big data analysis.


Cite this article:

TANG Zhe, PANG Jifang, XIE Yu, et al. Self-Distillation Multi-task Learning Integrating Multi-dimensional Perception for Multimodal Sequential Recommendation[J]. Pattern Recognition and Artificial Intelligence, 2025, 38(8): 669-683.

URL:

http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202508001 OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2025/V38/I8/669
