Abstract: A semantic auto-encoder structure augmented with a relation network is proposed for zero-shot recognition, to address the projection domain shift problem and to improve the robustness of the distance-based similarity measures used in traditional zero-shot recognition models. The proposed algorithm builds the mapping between image visual features and semantic vectors with a semantic auto-encoder. The reconstructed vector is then concatenated with the ground-truth semantic vector of each candidate class and fed to a neural network, and the output scalar determines the predicted category. Experimental results show that, compared with traditional distance-based measures, the proposed algorithm achieves higher recognition rates on the public datasets AWA, CUB and ImageNet-2, and that its semantic-to-visual projection outperforms the reverse projection on some datasets.
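The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, data, and relation-module weights are toy values (a trained model would learn the relation weights with a regression loss), and the semantic auto-encoder is solved in the standard way via a Sylvester equation, projecting visual features into the semantic space and then scoring each (projection, class prototype) pair with a small network whose scalar output ranks the classes.

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only): 16-dim visual features,
# 5-dim semantic (attribute) vectors, 4 classes, 20 samples.
d, a, k, n = 16, 5, 4, 20
X = rng.normal(size=(d, n))        # visual features, one column per sample
labels = rng.integers(0, k, size=n)
P = rng.normal(size=(a, k))        # per-class semantic prototypes
S = P[:, labels]                   # per-sample semantic vectors

# Semantic auto-encoder: minimise
#   ||X - W.T @ S||^2 + lam * ||W @ X - S||^2,
# whose optimum solves the Sylvester equation
#   (S S^T) W + W (lam X X^T) = (1 + lam) S X^T.
lam = 1.0
A = S @ S.T
B = lam * (X @ X.T)
C = (1 + lam) * (S @ X.T)
W = solve_sylvester(A, B, C)       # W: (a, d), maps visual -> semantic

def relation_score(s_hat, proto, params):
    """Tiny relation module: score one (projection, prototype) pair.

    Weights are random here purely to show the wiring; in the paper's
    setting they would be trained end to end.
    """
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, W1 @ np.concatenate([s_hat, proto]) + b1)  # ReLU
    return float(W2 @ h + b2)      # scalar relation score

hidden = 8
params = (rng.normal(size=(hidden, 2 * a)), np.zeros(hidden),
          rng.normal(size=hidden), 0.0)

# Classify one test sample: project it to semantic space, score it
# against every class prototype, and predict the argmax.
s_hat = W @ X[:, 0]
scores = np.array([relation_score(s_hat, P[:, c], params) for c in range(k)])
pred = int(np.argmax(scores))
print("relation scores:", np.round(scores, 3), "predicted class:", pred)
```

The key design point the abstract emphasizes is that the fixed distance metric (e.g. Euclidean or cosine) of traditional zero-shot pipelines is replaced by the learned `relation_score`, so the similarity measure itself adapts to the data.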