Fine-Grained Visual Classification Based on Sparse Bilinear Convolutional Neural Network
MA Li1, WANG Yongxiong1,2
1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093; 2. Shanghai Engineering Research Center of Assistive Devices, Shanghai 200093
Abstract The bilinear convolutional neural network (B-CNN) tends to overfit in fine-grained visual recognition because of its large number of parameters and complex structure. To address this problem, a sparse B-CNN is proposed. First, a scaling factor is introduced for each feature channel of the B-CNN, and sparsity regularization is applied to the scaling factors during training. Then, feature channels that contribute little to the final classification are identified by their small scaling factors. Finally, a certain proportion of these channels is pruned to prevent overfitting and to increase the significance of key features. The sparse B-CNN is trained end-to-end with weak supervision. Experiments on the FGVC-Aircraft, Stanford Dogs and Stanford Cars fine-grained image datasets show that the sparse B-CNN achieves higher accuracy than the original B-CNN. Moreover, its performance is comparable to or better than that of other advanced fine-grained visual recognition algorithms.
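The three steps in the abstract (per-channel scaling factors, a sparsity penalty on them, and pruning of a fixed proportion of low-factor channels) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The L1 form of the penalty, the weight `lam`, the function names, the toy values, and the use of the same pruned features for both bilinear streams are all assumptions made for illustration.

```python
# Minimal sketch of channel scaling, sparsity penalty, and pruning.
# All names and numbers are hypothetical; the L1 penalty is an assumed
# choice of sparsity regularizer, not one stated in the abstract.

def l1_penalty(gammas, lam):
    """Sparsity term added to the training loss: lam * sum(|gamma_c|)."""
    return lam * sum(abs(g) for g in gammas)

def channels_to_keep(gammas, prune_ratio):
    """Indices of channels that survive pruning the smallest |gamma|."""
    n_prune = int(len(gammas) * prune_ratio)
    order = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]))
    pruned = set(order[:n_prune])
    return [i for i in range(len(gammas)) if i not in pruned]

def scale_and_prune(features, gammas, keep):
    """Multiply each kept channel by its scaling factor; drop the rest."""
    return [[gammas[c] * v for v in features[c]] for c in keep]

def bilinear_pool(feat_a, feat_b):
    """Sum over spatial locations of the outer product of two channel sets.

    Each input is a list of channels, and each channel is a list of
    activations over the same spatial locations. The result is flattened.
    """
    return [sum(a * b for a, b in zip(ca, cb)) for ca in feat_a for cb in feat_b]

# Toy example: 4 channels, 2 spatial locations.
gammas = [0.9, 0.01, 0.5, 0.02]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

penalty = l1_penalty(gammas, lam=0.1)
keep = channels_to_keep(gammas, prune_ratio=0.5)
pruned_features = scale_and_prune(features, gammas, keep)
descriptor = bilinear_pool(pruned_features, pruned_features)

print(keep)             # channels with the two largest scaling factors
print(len(descriptor))  # bilinear descriptor size after pruning
```

In this toy case, pruning half of the four channels shrinks the bilinear descriptor from 16 to 4 entries, which shows why removing channels reduces the parameter count of the classifier that follows the bilinear layer.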
Fund: Supported by the National Natural Science Foundation of China (No. 61673276, 61703277)
About the authors: MA Li, master's student. His research interests include computer vision and image processing. WANG Yongxiong (corresponding author), Ph.D., professor. His research interests include intelligent robots and vision.