Adversarial Cross-Modal Retrieval Based on Association Constraint

doi:10.16451/j.cnki.issn1003-6059.202101007


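Abstract (translated from the Chinese): Existing cross-modal retrieval methods mainly use a single index constraint to obtain a subspace, so their retrieval results often differ. To improve the robustness of the common subspace, this paper proposes an adversarial cross-modal retrieval method based on association constraint. The adversarial constraint confuses the discriminator so that it cannot tell which modality a subspace feature comes from, which improves the consistency of features across modalities. The association constraint strengthens the degree of association in the projected subspace. The triplet constraint considers the structural information between samples of different modalities with the same semantics and samples of the same modality with different semantics. Experiments on datasets show that the retrieval performance of the proposed method is effectively improved.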


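Keywords (translated from the Chinese): cross-modal retrieval, multi-index constraint, adversarial constraint, association constraint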

Abstract: Existing cross-modal retrieval methods typically learn a common subspace under a single index constraint, such as distance or similarity, and retrieve in that subspace. Because subspaces learned under different index constraints differ, so do the retrieval results. To improve the robustness of the common subspace, an adversarial cross-modal retrieval method based on association constraint is proposed. The adversarial constraint improves the consistency of features across modalities by confusing the discriminator so that it cannot tell which modality a subspace feature comes from. The association constraint strengthens the association between the features of different modalities. The triplet loss constraint takes into account the structural information between sample pairs with the same semantics in different modalities and sample pairs with different semantics in the same modality. Experimental results on datasets show that the proposed method is more effective than other cross-modal retrieval methods.
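The abstract names the constraints but this page does not give their formulas. As a rough illustration only, the sketch below shows one common form of a triplet loss and of the binary cross-entropy a modality discriminator might be trained with. The function names `triplet_loss` and `modality_bce`, the squared Euclidean distance, and the `margin` value are assumptions made for this example, not the authors' definitions. The association constraint is not sketched because its form is not described here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss averaged over a batch of row vectors.

    Assumed setup: `anchor` and `positive` are subspace features with the
    same semantics (possibly from different modalities); `negative` has
    different semantics. Squared Euclidean distance is an assumption.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

def modality_bce(prob_image, is_image, eps=1e-12):
    """Binary cross-entropy of a modality discriminator.

    `prob_image` is the discriminator's predicted probability that each
    feature came from the image modality; `is_image` holds the true
    labels (1 for image, 0 for text). In an adversarial setup the
    discriminator minimises this value while the feature projectors try
    to increase it, pushing predictions towards 0.5.
    """
    p = np.clip(prob_image, eps, 1.0 - eps)
    return float(-np.mean(is_image * np.log(p) + (1.0 - is_image) * np.log(1.0 - p)))

if __name__ == "__main__":
    a = np.array([[0.0, 0.0]])
    pos = np.array([[1.0, 0.0]])
    neg = np.array([[0.0, 1.0]])
    print(triplet_loss(a, pos, neg))
    print(modality_bce(np.array([0.5, 0.5]), np.array([1.0, 0.0])))
```

When the discriminator outputs 0.5 for every feature, its loss equals ln 2 regardless of the labels, which is the "confused" state the adversarial constraint aims for.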

Key words: Cross-Modal Retrieval, Multi-index Constraint, Adversarial Constraint, Association Constraint

Received: 2020-10-15

Chinese Library Classification (ZTFLH): TP 391

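Funding: Supported by the National Natural Science Foundation of China (No. 61672332, 61802238, 61603228, 62006146, 61906115, F060308), the Key Research and Development Program of Shanxi Province (International Science and Technology Cooperation) (No. 201903D421003), the Shanxi Province Top-Notch Innovative Talents Support Program, the Sanjin Scholars Program of Shanxi Province, the Research Project for Returned Overseas Scholars of Shanxi Province (No. 2017023, 2018172, HGKY2019001), the Shanxi Province Youth Fund Project (No. 201901D211171, 201901D211169), and the Science and Technology Innovation Project of Higher Education Institutions of Shanxi Province (No. 2020L0036).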

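Corresponding author: QIAN Yuhua, Ph.D., professor. His research interests include pattern recognition, feature selection, rough set theory, granular computing, and artificial intelligence. E-mail: jinchengqyh@126.com.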

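About the authors: GUO Qian, Ph.D. candidate. Her research interests include deep learning, cross-modal retrieval, and logic learning. E-mail: czguoqian@163.com.
LIANG Xinyan, Ph.D. candidate. His research interests include multimodal data fusion and cross-modal retrieval. E-mail: liangxinyan48@163.com.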

Cite this article:

GUO Qian, QIAN Yuhua, LIANG Xinyan. Adversarial Cross-Modal Retrieval Based on Association Constraint[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2021, 34(1): 68-76.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202101007 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2021/V34/I1/68
