Multimodal Recommendation Algorithm Based on Contrastive Learning and Semantic Enhancement
ZHANG Kaihan1, FENG Chenjiao2, YAO Kaixuan3, SONG Peng4, LIANG Jiye3
1. School of Computer Science and Technology, North University of China, Taiyuan 030051; 2. School of Applied Mathematics, Shanxi University of Finance and Economics, Taiyuan 030006; 3. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006; 4. School of Economics and Management, Shanxi University, Taiyuan 030031
Abstract: Multimodal item data is typically introduced into recommendation algorithms as auxiliary information to enrich the representations of users and items. A key research question is how to effectively integrate the interaction information of users and items with their multimodal information. Existing methods remain insufficient in feature fusion and in modeling semantic associations. Therefore, a multimodal recommendation algorithm based on contrastive learning and semantic enhancement is proposed from the perspective of feature fusion. First, a graph neural network and an attention mechanism are adopted to fully integrate collaborative features and multimodal features. Next, the semantic association structures within each modality are learned under the guidance of the interaction structure in the collaborative information. Meanwhile, the contrastive learning paradigm is employed to capture cross-modal representation dependencies. A reliability factor is introduced into the contrastive loss to adaptively adjust the constraint strength on the multimodal features, thereby suppressing the influence of data noise. Finally, these tasks are jointly optimized to generate recommendation results. Experimental results on four real-world datasets show that the proposed algorithm yields excellent performance.
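The idea of a reliability factor inside a contrastive loss can be illustrated with a minimal sketch. The abstract does not state the loss or how the reliability factor is computed, so everything below is an assumption for illustration only: an InfoNCE-style loss between two modality views of the same items, a per-item weight `reliability` supplied by the caller, and the hypothetical names `weighted_info_nce` and `temperature`. It is not the authors' formulation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def weighted_info_nce(view_a, view_b, reliability, temperature=0.2):
    """Illustrative reliability-weighted InfoNCE loss (assumed form, not from the paper).

    view_a, view_b: (n, d) embeddings of the same n items from two modalities.
    reliability:    (n,) weights in [0, 1]; a lower weight relaxes the
                    contrastive constraint for items whose features look noisy.
    """
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    logits = a @ b.T / temperature                 # (n, n) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_item = -np.diag(log_prob)                  # positives lie on the diagonal
    return float(np.mean(reliability * per_item))


# Usage: two nearly identical views of 8 items with 16-dimensional features.
rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 16))
textual = visual + 0.01 * rng.normal(size=(8, 16))
full_trust = np.ones(8)
loss_aligned = weighted_info_nce(visual, textual, full_trust)
```

In this sketch, setting every weight to zero switches the constraint off entirely, while intermediate weights scale each item's contribution linearly. How the weights are derived from the data is left open here, since the abstract does not describe it.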