Fine-Grained Visual Classification Based on Sparse Bilinear Convolutional Neural Network
MA Li1, WANG Yongxiong1,2
1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093
2. Shanghai Engineering Research Center of Assistive Devices, Shanghai 200093
Abstract: The bilinear convolutional neural network (B-CNN) tends to overfit in fine-grained visual recognition because of its large number of parameters and complex structure. In this paper, a sparse B-CNN is proposed to address this problem. Firstly, a scaling factor is introduced for each feature channel of B-CNN, and sparsity regularization is applied to the scaling factors during training. Then, the feature channels that contribute little to the final classification are identified by their small scaling factors. Finally, a fixed proportion of these channels is pruned to prevent overfitting and to increase the significance of the key features. The sparse B-CNN is trained end-to-end under weak supervision. Verification experiments on the FGVC-Aircraft, Stanford Dogs and Stanford Cars fine-grained image datasets show that the accuracy of sparse B-CNN is higher than that of the original B-CNN. Moreover, compared with other advanced algorithms for fine-grained visual recognition, the performance of sparse B-CNN is comparable or even better.
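The three steps described in the abstract (per-channel scaling factors, sparsity regularization, proportional pruning) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not state the form of the regularizer, so an L1 penalty is assumed here, and the names `lam`, `prune_ratio`, `l1_sparsity_penalty`, `channels_to_keep` and `prune_feature_map` are hypothetical.

```python
from typing import List, Sequence


def l1_sparsity_penalty(scales: Sequence[float], lam: float) -> float:
    """Assumed sparsity term added to the classification loss: lam * sum(|gamma|)."""
    return lam * sum(abs(g) for g in scales)


def channels_to_keep(scales: Sequence[float], prune_ratio: float) -> List[int]:
    """Return the indices of channels that survive pruning, in original order.

    Channels with the smallest |gamma| are dropped; prune_ratio is the
    fraction of channels to remove (0 <= prune_ratio < 1).
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n_prune = int(len(scales) * prune_ratio)
    # Rank channel indices by the magnitude of their scaling factor, smallest first.
    ranked = sorted(range(len(scales)), key=lambda i: abs(scales[i]))
    pruned = set(ranked[:n_prune])
    return [i for i in range(len(scales)) if i not in pruned]


def prune_feature_map(features: Sequence, keep: Sequence[int]) -> list:
    """Keep only the selected channels of a channel-major feature map."""
    return [features[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical scaling factors for six feature channels after training.
    gammas = [0.91, 0.02, -0.47, 0.003, 0.65, -0.08]
    keep = channels_to_keep(gammas, prune_ratio=0.5)
    print(keep)  # [0, 2, 4]
```

In this toy example, the three channels with the smallest scaling-factor magnitudes are removed and the remaining channels keep their original order, so a bilinear pooling step applied afterwards would operate on fewer feature channels.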