Pattern Recognition and Artificial Intelligence
Pattern Recognition and Artificial Intelligence  2024, Vol. 37 Issue (5): 459-468    DOI: 10.16451/j.cnki.issn1003-6059.202405007
Researches and Applications
Cross-Modal Multi-level Fusion Sentiment Analysis Method Based on Visual Language Model
XIE Runfeng1, ZHANG Bochao1, DU Yongping1
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124

Abstract  

Image-text multimodal sentiment analysis aims to predict sentiment polarity by integrating the visual and textual modalities. The key to this task is obtaining high-quality representations of both modalities and fusing them efficiently. Therefore, a cross-modal multi-level fusion sentiment analysis method based on a visual language model (MFVL) is proposed. Firstly, building on a pre-trained visual language model with its parameters frozen, high-quality multimodal representations and modality bridge representations are generated by fine-tuning the large language model with low-rank adaptation (LoRA). Secondly, a cross-modal multi-head co-attention fusion module is designed to perform interactive weighted fusion of the visual and textual modality representations. Finally, a mixture of experts module is designed to deeply fuse the visual, textual and modality bridge representations for multimodal sentiment analysis. Experimental results indicate that MFVL achieves state-of-the-art performance on the public evaluation datasets MVSA-Single and HFM.
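The two fusion stages described above, cross-modal multi-head co-attention followed by a mixture-of-experts head over the text, image, and modality bridge representations, can be sketched in PyTorch. This is a minimal illustrative sketch, not the paper's implementation: all dimensions, module names, the mean-pooling step, and the three-class output are assumptions.

```python
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Cross-modal co-attention: each modality attends over the other."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, image):
        # Text queries attend to image keys/values, and vice versa.
        t, _ = self.txt2img(text, image, image)
        v, _ = self.img2txt(image, text, text)
        return t, v


class MoEFusion(nn.Module):
    """Mixture-of-experts head fusing text, image and bridge vectors."""

    def __init__(self, dim=64, n_experts=4, n_classes=3):
        super().__init__()
        self.gate = nn.Linear(3 * dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, n_classes))
            for _ in range(n_experts)
        )

    def forward(self, text, image, bridge):
        x = torch.cat([text, image, bridge], dim=-1)        # (B, 3*dim)
        w = torch.softmax(self.gate(x), dim=-1)             # (B, n_experts)
        out = torch.stack([e(x) for e in self.experts], 1)  # (B, n_experts, n_classes)
        return (w.unsqueeze(-1) * out).sum(dim=1)           # (B, n_classes)


# Toy usage: batch of 2, 8 text tokens, 5 image patches, hidden size 64.
text = torch.randn(2, 8, 64)
image = torch.randn(2, 5, 64)
bridge = torch.randn(2, 64)

t, v = CoAttentionFusion()(text, image)
logits = MoEFusion()(t.mean(dim=1), v.mean(dim=1), bridge)
```

Pooling each co-attended sequence before the expert head is one simple design choice; the paper's actual fusion and classification details should be taken from the full text.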

Key words: Visual Language Model; Multimodal Fusion; Multi-head Attention; Mixture of Experts Network; Sentiment Analysis
Received: 15 February 2024     
CLC number: TP391.1
Fund: National Key Research and Development Program of China (No. 2023YFB3308004), National Natural Science Foundation of China (No. 92267107)

Corresponding Author: DU Yongping, Ph.D., professor. Her research interests include information retrieval, information extraction and natural language processing.
About authors: XIE Runfeng, Master student. His research interests include natural language processing and multimodal sentiment analysis. ZHANG Bochao, Master student. His research interests include natural language processing.
Cite this article:   
XIE Runfeng, ZHANG Bochao, DU Yongping. Cross-Modal Multi-level Fusion Sentiment Analysis Method Based on Visual Language Model[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(5): 459-468.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202405007      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2024/V37/I5/459
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax:0551-65591176 Email: bjb@iim.ac.cn