模式识别与人工智能 (Pattern Recognition and Artificial Intelligence)
2021, Vol. 34, Issue (11): 1017-1027. DOI: 10.16451/j.cnki.issn1003-6059.202111005
Deep Learning Design and Application
Multi-modal and Multi-label Emotion Detection for Comics Based on Two-Stream Network
LIN Zhentao¹, ZENG Bi¹, PAN Zhihao¹, WEN Song¹
1. School of Computers, Guangdong University of Technology, Guangzhou 510006

Abstract: Comics are widely used on social media to satirize social phenomena through metaphor and to express emotions. To address label ambiguity in multi-modal, multi-label emotion detection for comic scenes, a multi-modal and multi-label emotion detection model for comics based on a two-stream network is proposed. Information across modalities is compared using cosine similarity, which is combined with a self-attention mechanism to fuse image features and text features. The backbone of the method is a two-stream structure: a Transformer model serves as the image backbone network to extract image features, and the pre-trained RoBERTa model serves as the text backbone network to extract text features. An improved cosine similarity is combined with the self-attention and multi-head self-attention mechanisms, yielding a cosine multi-head self-attention mechanism (COS-MHSA), to extract high-level image features. Finally, the high-level features and the multi-modal features from COS-MHSA are fused. Experiments on the EmoRecCom dataset verify the effectiveness of the proposed method, and the emotion detection results are visualized.
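The abstract does not specify the exact form of the improved cosine similarity, the projections inside COS-MHSA, or the fusion and classification heads. The following is a minimal pure-Python sketch of the general idea, not the authors' implementation. It assumes identity query/key/value projections, plain cosine similarity scaled by a hypothetical temperature `tau`, image and text tokens already projected to a shared dimension, mean pooling after attention over the joint image-text sequence, and an independent sigmoid per emotion label with a 0.5 threshold (the usual multi-label setup). All function names (`cos_mhsa`, `fuse`, `multilabel_predict`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_attention(tokens: List[Vector], tau: float = 0.1) -> List[Vector]:
    """One head: weights = softmax(cos(q_i, k_j) / tau), with Q = K = V = tokens
    (identity projections are an assumption made for brevity)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([cosine(q, k) / tau for k in tokens])
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def cos_mhsa(tokens: List[Vector], num_heads: int, tau: float = 0.1) -> List[Vector]:
    """Split features into equal chunks, apply cosine attention per head, concatenate."""
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = [
        cosine_attention([t[h * head_dim:(h + 1) * head_dim] for t in tokens], tau)
        for h in range(num_heads)
    ]
    # Concatenate the per-head outputs token by token.
    return [sum((heads[h][i] for h in range(num_heads)), []) for i in range(len(tokens))]


def fuse(image_tokens: List[Vector], text_tokens: List[Vector],
         num_heads: int = 2, tau: float = 0.1) -> Vector:
    """Attend over the joint sequence so weights compare tokens across modalities,
    then mean-pool into one fused vector (hypothetical fusion scheme)."""
    attended = cos_mhsa(image_tokens + text_tokens, num_heads, tau)
    n = len(attended)
    return [sum(t[d] for t in attended) / n for d in range(len(attended[0]))]


def multilabel_predict(logits: Vector, threshold: float = 0.5) -> List[int]:
    """Independent sigmoid per label, so several emotions can be active at once."""
    return [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in logits]


if __name__ == "__main__":
    img = [[1.0, 0.0, 0.5, 0.5], [0.9, 0.1, 0.4, 0.6]]  # toy image tokens
    txt = [[0.0, 1.0, 0.5, 0.5]]                        # toy text token
    fused = fuse(img, txt, num_heads=2)
    print(len(fused))                                 # 4
    print(multilabel_predict([2.0, -1.0, 0.3]))       # [1, 0, 1]
```

Because each head's softmax weights sum to 1, every attended token is a convex combination of the input tokens, so the fused vector stays within the range of the inputs. Using a sigmoid per label rather than a softmax over labels is what lets a single comic panel carry more than one emotion.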
Key words: Comic Emotion Detection, Cosine Similarity, Multi-head Self-Attention Mechanism, Multi-modal Fusion
Received: 05 July 2021     
CLC number: TP 391
Fund: National Natural Science Foundation of China (No. 61672169), Natural Science Foundation of Guangdong Province (No. 2021A1515012233)
Corresponding author: ZENG Bi, Ph.D., professor. Her research interests include machine learning and big data applications.
About the authors: LIN Zhentao, master's student. His research interests include multi-modal emotion analysis and pattern recognition.
PAN Zhihao, master's student. His research interests include natural language processing and emotion analysis.
WEN Song, master's student. His research interests include multi-modal fusion and big data.
Cite this article:
LIN Zhentao, ZENG Bi, PAN Zhihao, et al. Multi-modal and Multi-label Emotion Detection for Comics Based on Two-Stream Network[J]. Pattern Recognition and Artificial Intelligence, 2021, 34(11): 1017-1027.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202111005 or http://manu46.magtech.com.cn/Jweb_prai/EN/Y2021/V34/I11/1017
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence