Fine-Grained Visual Classification Network Based on Fusion Pooling and Attention Enhancement
XIAO Bin1, GUO Jingwei1, ZHANG Xingpeng1, WANG Min2
1. School of Computer Science, Southwest Petroleum University, Chengdu 610500; 2. School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu 610500
|
|
Abstract The core of fine-grained visual classification is to extract discriminative image features. In most existing methods, attention mechanisms are introduced to focus the network on important regions of the object. However, such approaches only locate the most salient feature and cannot cover all discriminative features, so different categories with similar features are easily confused. Therefore, a fine-grained visual classification network based on fusion pooling and attention enhancement is proposed to obtain comprehensive discriminative features. At the end of the network, a fusion pooling module with a three-branch structure is designed to obtain multi-scale discriminative features. The three branches are global average pooling, global top-k pooling and the fusion of the two. In addition, an attention enhancement module is proposed to generate two more discriminative images through an attention grid mixing module and an attention cropping module under the guidance of attention maps. Experiments on the fine-grained image datasets CUB-200-2011, Stanford Cars and FGVC-Aircraft verify the high accuracy and strong competitiveness of the proposed network.
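As a rough illustration of the two components described in the abstract, the PyTorch sketch below implements a three-branch fusion pooling head (global average pooling, global top-k pooling and their fusion) and an attention-guided cropping step. The names FusionPooling and attention_crop, the value of k, the equal fusion weights and the attention threshold are illustrative assumptions rather than the authors' implementation, and the attention grid mixing branch is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionPooling(nn.Module):
    """Three-branch pooling head: GAP, global top-k pooling and their fusion (sketch)."""
    def __init__(self, k: int = 4):
        super().__init__()
        self.k = k  # number of highest activations averaged per channel (assumed value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the backbone
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)
        gap = flat.mean(dim=2)                              # global average pooling branch
        topk = flat.topk(self.k, dim=2).values.mean(dim=2)  # global top-k pooling branch
        fused = 0.5 * (gap + topk)                          # fusion branch (assumed equal weights)
        return torch.cat([gap, topk, fused], dim=1)         # (B, 3C) multi-scale descriptor

def attention_crop(img: torch.Tensor, attn: torch.Tensor, thresh: float = 0.5) -> torch.Tensor:
    # img: (C, H, W) input image, attn: (h, w) attention map produced by the network.
    # Keep the region whose normalized attention exceeds thresh, then resize it back
    # to the original resolution to obtain a second, zoomed-in training image.
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    attn = F.interpolate(attn[None, None], size=img.shape[1:], mode="bilinear",
                         align_corners=False)[0, 0]
    ys, xs = torch.nonzero(attn >= thresh, as_tuple=True)
    if ys.numel() == 0:                                     # fall back to the full image
        return img
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    crop = img[:, y0:y1, x0:x1]
    return F.interpolate(crop[None], size=img.shape[1:], mode="bilinear",
                         align_corners=False)[0]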
|
Received: 23 May 2023
|
|
Fund: Sichuan Scientific Innovation Fund (No. 2022JDRC0009), Natural Science Starting Project of Southwest Petroleum University (No. 2022QHZ023)
Corresponding Author:
ZHANG Xingpeng, Ph.D., lecturer. His research interests include image recognition, object detection and medical image segmentation.
|
About authors: XIAO Bin, master, professor. His research interests include pattern recognition. GUO Jingwei, master student. His research interests include fine-grained visual classification. WANG Min, master, professor. Her research interests include artificial intelligence, signal analysis and processing.
|
|
|
|
|
|