Cross-Modal Retrieval via Dual Adversarial Autoencoders
WU Fei1, LUO Xiaokai1, HAN Lu2, ZHENG Xinhao1, XIAO Liang1, SHUAI Zizhen1, JING Xiaoyuan3
1. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210003;
2. School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210003;
3. School of Computer Science, Wuhan University, Wuhan 430072
How to preserve original features and reduce the distribution differences of multi-modal data more efficiently during autoencoder learning is an important research topic. A cross-modal retrieval approach via dual adversarial autoencoders (DAA) is proposed. A global adversarial network improves the data reconstruction process of the autoencoders: a min-max game is played so that original and reconstructed features become difficult to distinguish, and consequently the original features are better preserved. A hidden-layer adversarial network generates modality-invariant representations, making data from different modalities indistinguishable from each other and thereby effectively reducing the distribution differences of multi-modal data. Experimental results on the Wikipedia and NUS-WIDE-10k datasets demonstrate the effectiveness of DAA.
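The two adversarial games described above can be illustrated with a minimal, untrained sketch. This is not the paper's architecture: the single-layer encoders/decoders, logistic discriminators, toy dimensions, and all variable names are illustrative assumptions, chosen only to show how the global loss (original vs. reconstructed features) and the hidden-layer loss (image vs. text latent codes) are formed.

```python
import math
import random

random.seed(0)

def vec(n):              # random toy feature vector (stand-in for real image/text features)
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def mat(r, c):           # randomly initialised weight matrix (training updates omitted)
    return [vec(c) for _ in range(r)]

def matvec(W, x):        # computes x @ W for a vector x of length len(W)
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

def encode(x, W):        # one-layer encoder into the shared latent space
    return [math.tanh(v) for v in matvec(W, x)]

def disc(x, w):          # logistic discriminator: probability that x is "real"
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(x, w))))

d_img, d_txt, d_lat = 8, 6, 4          # toy dimensions, arbitrary choices
x_img, x_txt = vec(d_img), vec(d_txt)  # one image/text feature pair

W_img, V_img = mat(d_img, d_lat), mat(d_lat, d_img)  # image encoder/decoder
W_txt = mat(d_txt, d_lat)                            # text encoder
w_glob, w_mod = vec(d_img), vec(d_lat)               # discriminator weights

h_img, h_txt = encode(x_img, W_img), encode(x_txt, W_txt)
x_img_rec = matvec(V_img, h_img)       # decoder reconstructs the image feature

# Global adversarial loss: the discriminator separates original from
# reconstructed features; the autoencoder plays the opposite side of this
# min-max game, so fooling the discriminator preserves the original features.
loss_global_D = (-math.log(disc(x_img, w_glob))
                 - math.log(1.0 - disc(x_img_rec, w_glob)))

# Hidden-layer adversarial loss: a modality discriminator on the shared latent
# codes; fooling it pushes image and text codes toward modality-invariance.
loss_modal_D = (-math.log(disc(h_img, w_mod))
                - math.log(1.0 - disc(h_txt, w_mod)))

# Reconstruction loss keeps the autoencoder faithful to its input.
loss_rec = sum((a - b) ** 2 for a, b in zip(x_img, x_img_rec)) / d_img
```

In a full implementation each discriminator and the autoencoders would be updated alternately, with the autoencoders minimising the reconstruction loss while maximising the discriminators' errors.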