Pattern Recognition and Artificial Intelligence
  2018, Vol. 31 Issue (1): 1-11    DOI: 10.16451/j.cnki.issn1003-6059.201801001
Interpretable Structured Multi-modal Deep Neural Network
XIONG Hongkai1, GAO Xing1, LI Shaohui1, XU Yuhui1, WANG Yongzhuang1, YU Haoyang2, LIU Xin2, ZHANG Yunfei3
1. Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240
2. Shenzhen Tencent Computer System Co., Ltd, Shenzhen 518000
3. Yulong Computer Communication Technology Co., Ltd, Shenzhen 518035

Abstract  

Deep learning methods achieve excellent performance in computer vision and natural language processing through end-to-end supervised training on large-scale labeled datasets. However, existing methods usually target single-modal data, ignore the inherent structure of the data, and lack theoretical support. This paper therefore discusses wavelet-theory-based deep convolutional networks, structured deep learning, and multi-modal deep learning to show how deep learning techniques can be combined with wavelet theory and structured prediction, and it explores a viable mechanism for extending these methods to multi-modal data.

Key words: Deep Learning, Filter Bank, Wavelet Theory, Structured Learning, Multi-modal Learning
Received: 26 September 2017     
About authors:
XIONG Hongkai (corresponding author), Ph.D., professor. His research interests include multimedia communication, signal processing, computer vision and machine learning.
GAO Xing, Ph.D. candidate. His research interests include unsupervised representation learning.
LI Shaohui, master student. His research interests include wavelet and scattering network.
XU Yuhui, Ph.D. candidate. His research interests include computer vision and machine learning.
WANG Yongzhuang, master student. His research interests include computer vision.
YU Haoyang, bachelor. His research interests include micro-service cloud platform research, development and operations.
LIU Xin, bachelor. His research interests include network communication and cloud computing.
ZHANG Yunfei, Ph.D., senior engineer. His research interests include multimedia communication, 5G, NGI and smart terminals.
Cite this article:   
XIONG Hongkai, GAO Xing, LI Shaohui, et al. Interpretable Structured Multi-modal Deep Neural Network[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(1): 1-11.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201801001      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2018/V31/I1/1
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence