Generalized Zero-Shot Image Classification Based on Reconstruction Contrast
XU Rui1, SHAO Shuai2, CAO Weijia3, LIU Baodi1, TAO Dapeng4, LIU Weifeng1
1. College of Control Science and Engineering, China University of Petroleum (East China), Qingdao 266580; 2. Research Institute of Basic Theories, Zhejiang Laboratory, Hangzhou 311121; 3. National Engineering Research Center of Remote Sensing Satellite Applications, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094; 4. School of Information Science and Engineering, Yunnan University, Kunming 650500
Abstract: In generalized zero-shot image classification, generative models are often exploited to reconstruct visual or semantic information for further learning. However, methods based on variational autoencoders underutilize the reconstructed samples, and their representation performance is therefore limited. To address this, a generalized zero-shot image classification model based on reconstruction contrast is proposed. Firstly, two variational autoencoders encode the visual information and the semantic information into low-dimensional latent vectors of the same dimension, and the latent vectors are then decoded back into both modalities. Next, projection modules project both the original visual information and the visual information reconstructed from the semantic latent vectors, and contrastive learning is performed on the projected features. The proposed method preserves the reconstruction ability of the encoders, enhances their discriminative ability, and improves the transferability of the pre-trained features to the generalized zero-shot task. The effectiveness of the proposed model is verified on four benchmark datasets.
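To make the described pipeline concrete, the following is a minimal PyTorch sketch of one training step, written from the abstract alone: two variational autoencoders map visual features and class semantics into a shared latent space, decode within and across modalities, and a contrastive loss is applied to the projected original visual features and the visual features reconstructed from the semantic latent vectors. All module and function names (VAE, ProjectionHead, info_nce), dimensions, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the reconstruction-contrast training step (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Encodes one modality into a shared latent space and decodes it back."""
    def __init__(self, in_dim, latent_dim, hidden=1024):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def encode(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return z, mu, logvar

class ProjectionHead(nn.Module):
    """Projects features into the space where the contrastive loss is computed."""
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, in_dim), nn.ReLU(),
                                 nn.Linear(in_dim, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def info_nce(q, k, tau=0.1):
    """InfoNCE loss: matched rows of q and k are treated as positive pairs."""
    logits = q @ k.t() / tau
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)

def training_step(x, a, vis_vae, sem_vae, proj, beta=1.0, lam=1.0):
    """x: visual features (e.g. CNN embeddings), a: class semantic/attribute vectors."""
    zx, mux, logvarx = vis_vae.encode(x)
    za, mua, logvara = sem_vae.encode(a)
    # within-modality and cross-modality reconstruction losses
    recon = (F.mse_loss(vis_vae.dec(zx), x) + F.mse_loss(sem_vae.dec(za), a)
             + F.mse_loss(vis_vae.dec(za), x) + F.mse_loss(sem_vae.dec(zx), a))
    # KL terms pulling both latent distributions toward the prior
    kld = (-0.5 * torch.mean(1 + logvarx - mux.pow(2) - logvarx.exp())
           - 0.5 * torch.mean(1 + logvara - mua.pow(2) - logvara.exp()))
    # contrast the projected original visual features against the visual
    # features reconstructed from the semantic latent vectors
    contrast = info_nce(proj(x), proj(vis_vae.dec(za)))
    return recon + beta * kld + lam * contrast
```

In this reading, the positive pair for each sample is its own original visual feature and its cross-modal reconstruction, which is one plausible way to keep the encoders' reconstruction quality while making the latent space more discriminative; the actual pairing strategy and loss weighting in the paper may differ.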