Pattern Recognition and Artificial Intelligence
Pattern Recognition and Artificial Intelligence  2023, Vol. 36 Issue (12): 1127-1138    DOI: 10.16451/j.cnki.issn1003-6059.202312005
Adaptive Perception and Learning of Open-Environment
A Survey on Knowledge-Driven Multimodal Semantic Understanding
ZHENG Yihao1, GUO Yijun2, WU Lifang1, HUANG Yan3
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124;
2. Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, Beijing 100191;
3. State Key Laboratory for Multi-modal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100191

Abstract  Multimodal learning methods based on deep learning models achieve excellent semantic understanding performance in static, controllable, and simple scenarios. However, their generalization ability in dynamic, open, and other complex scenarios remains unsatisfactory. Recent research has introduced human-like knowledge into multimodal semantic understanding methods, yielding impressive results. To provide a deeper understanding of current research progress in knowledge-driven multimodal semantic understanding, this paper systematically investigates and analyzes the relevant methods and summarizes two main types of multimodal knowledge representation frameworks: relational and aligned. Several representative applications are discussed, including image-text matching, object detection, semantic segmentation, and vision-and-language navigation. In addition, the advantages and disadvantages of current methods are summarized, and possible future development trends are discussed.
Key words: Machine Learning; Deep Learning; Multimodal Semantic Understanding; Multimodal Knowledge Representation; Multimodal Semantic Analysis; Knowledge-Driven
Received: 10 October 2023     
Chinese Library Classification (CLC) number: TP 391
Fund: Supported by National Key Research and Development Program (No. 2018AAA0100400); National Natural Science Foundation of China (No. 62236010); National Natural Science Foundation of China (No. 62276261)
Corresponding Author: HUANG Yan, Ph.D., associate professor. His research interests include computer vision.
About the authors: ZHENG Yihao, Ph.D. candidate. His research interests include artificial intelligence.
GUO Yijun, master's degree, engineer. Her research interests include computer vision.
WU Lifang, Ph.D., professor. Her research interests include artificial intelligence.
Cite this article:   
ZHENG Yihao, GUO Yijun, WU Lifang, et al. A Survey on Knowledge-Driven Multimodal Semantic Understanding[J]. Pattern Recognition and Artificial Intelligence, 2023, 36(12): 1127-1138.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202312005      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2023/V36/I12/1127
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax:0551-65591176 Email: bjb@iim.ac.cn