Generalized Zero-Shot Image Classification Based on Reconstruction Contrast
XU Rui1, SHAO Shuai2, CAO Weijia3, LIU Baodi1, TAO Dapeng4, LIU Weifeng1
1. College of Control Science and Engineering, China University of Petroleum (East China), Qingdao 266580; 2. Research Institute of Basic Theories, Zhejiang Laboratory, Hangzhou 311121; 3. National Engineering Research Center of Remote Sensing Satellite Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094; 4. School of Information Science and Engineering, Yunnan University, Kunming 650500
Abstract: In generalized zero-shot image classification, generative models are often exploited to reconstruct visual or semantic information for further learning. However, methods based on variational autoencoders underutilize the reconstructed samples, and their representation performance is therefore limited. To address this, a generalized zero-shot image classification model based on reconstruction contrast is proposed. Firstly, two variational autoencoders encode the visual information and the semantic information into low-dimensional latent vectors of the same dimension, and the latent vectors are then decoded back into both modalities. Next, projection modules project both the original visual information and the visual information reconstructed from the semantic latent vectors, and contrastive learning is performed on the projected features. The proposed method preserves the reconstruction ability of the encoders, enhances their discriminative ability, and improves the transferability of the pre-trained features to the generalized zero-shot task. The effectiveness of the proposed model is verified on four benchmark datasets.
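To make the described pipeline concrete, the following is a minimal PyTorch sketch of one training step, written from the abstract alone: two variational autoencoders map visual features and class semantics into a shared latent space, decode within and across modalities, and a contrastive loss is applied to the projected original visual features and the visual features reconstructed from the semantic latent vectors. All module and function names (VAE, ProjectionHead, info_nce), dimensions, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the reconstruction-contrast training step (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Encodes one modality into a shared latent space and decodes it back."""
    def __init__(self, in_dim, latent_dim, hidden=1024):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def encode(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return z, mu, logvar

class ProjectionHead(nn.Module):
    """Projects features into the space where the contrastive loss is computed."""
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, in_dim), nn.ReLU(),
                                 nn.Linear(in_dim, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def info_nce(q, k, tau=0.1):
    """InfoNCE loss: matched rows of q and k are treated as positive pairs."""
    logits = q @ k.t() / tau
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)

def training_step(x, a, vis_vae, sem_vae, proj, beta=1.0, lam=1.0):
    """x: visual features (e.g. CNN embeddings), a: class semantic/attribute vectors."""
    zx, mux, logvarx = vis_vae.encode(x)
    za, mua, logvara = sem_vae.encode(a)
    # within-modality and cross-modality reconstruction losses
    recon = (F.mse_loss(vis_vae.dec(zx), x) + F.mse_loss(sem_vae.dec(za), a)
             + F.mse_loss(vis_vae.dec(za), x) + F.mse_loss(sem_vae.dec(zx), a))
    # KL terms pulling both latent distributions toward the prior
    kld = (-0.5 * torch.mean(1 + logvarx - mux.pow(2) - logvarx.exp())
           - 0.5 * torch.mean(1 + logvara - mua.pow(2) - logvara.exp()))
    # contrast the projected original visual features against the visual
    # features reconstructed from the semantic latent vectors
    contrast = info_nce(proj(x), proj(vis_vae.dec(za)))
    return recon + beta * kld + lam * contrast
```

In this reading, the positive pair for each sample is its own original visual feature and its cross-modal reconstruction, which is one plausible way to keep the encoders' reconstruction quality while making the latent space more discriminative; the actual pairing strategy and loss weighting in the paper may differ.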