Abstract Semantic information can provide rich prior knowledge for few-shot learning. However, existing few-shot learning studies combine images and semantics only superficially, failing to fully exploit semantics to characterize class features, and model performance is therefore limited. To address this issue, a semantic-based prototype optimization method for few-shot learning (SBPO) is proposed. First, SBPO employs channel-wise semantic prompts to guide the model in extracting visual features while progressively optimizing class prototypes. Second, a multi-modal margin loss is designed to integrate inter-class correlations in both the visual and semantic dimensions into the loss function, thereby constraining the model to enhance the distinctiveness of class prototypes. Finally, through a two-stage fine-tuning process, the model can fully leverage semantic knowledge to optimize class prototypes and thus improve classification accuracy. Experiments on four benchmark datasets demonstrate that SBPO significantly outperforms baseline methods.
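The two core ideas of the abstract can be illustrated with a minimal sketch. All names, shapes, and formulas below are illustrative assumptions, not the paper's exact formulation: channel-wise semantic prompting is sketched as sigmoid gates projected from a semantic embedding that re-weight a mean visual prototype, and the multi-modal margin loss is sketched as a per-class margin added to negative-class logits before cross-entropy.

```python
import numpy as np

def semantic_prompted_prototype(visual_feats, semantic_emb, W):
    """Hypothetical channel-wise semantic prompting: project a semantic
    embedding to per-channel gates that re-weight the mean visual
    prototype (names and shapes are illustrative)."""
    proto = visual_feats.mean(axis=0)                   # (C,) mean of support features
    gates = 1.0 / (1.0 + np.exp(-(W @ semantic_emb)))   # sigmoid gates, (C,)
    return gates * proto                                # channel-wise modulated prototype

def multimodal_margin_loss(query, protos, label, margins):
    """Hypothetical multi-modal margin loss: a per-class margin (e.g.,
    derived from visual + semantic inter-class similarity) is added to
    negative-class logits before cross-entropy."""
    logits = -np.linalg.norm(protos - query, axis=1)    # negative Euclidean distances
    logits = logits + margins                           # push negatives further from the boundary
    logits[label] -= margins[label]                     # no margin on the true class
    logits -= logits.max()                              # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[label])
```

In this sketch, a larger margin for a semantically similar negative class forces the model to separate its prototype more aggressively, which is the intuition behind using inter-class correlations to set the margins.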
Fund: National Key Research and Development Program of China (No. 2021YFA1000102), National Natural Science Foundation of China (No. 62376285, 62272375, 61673396), Natural Science Foundation of Shandong Province (No. ZR2022MF260)
Corresponding Author:
SHAO Mingwen, Ph.D., professor. His research interests include computer vision.
About the authors: LIU Yuanyuan, Master's student. Her research interests include computer vision and few-shot learning methods. ZHANG Lixu, Master's student. His research interests include computer vision and cross-domain few-shot learning methods. SHAO Xun, Master's student. Her research interests include computer vision and domain adaptation methods.