Contrastive Learning Based on Bilevel Optimization of Pseudo Siamese Networks
CHEN Qingyu1,2,3, JI Fanfan2,3, YUAN Xiaotong2,3,4
1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044; 2. Engineering Research Center of Digital Forensics Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044; 3. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044; 4. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044
Abstract: Existing contrastive learning algorithms based on pseudo siamese networks adopt various designs to obtain the best student network, while the performance of the teacher network in downstream tasks is ignored. Therefore, a contrastive learning algorithm based on bilevel optimization of pseudo siamese networks (CLBO) is proposed to obtain the best teacher network by promoting mutual learning between the student and teacher networks. The bilevel optimization strategy consists of a student network optimization strategy based on nearest neighbor optimization and a teacher network optimization strategy based on stochastic gradient descent. In the student network optimization strategy, the teacher network is treated as a constraint term to help the student network learn better from the teacher network. In the teacher network optimization strategy, the parameters computed by stochastic gradient descent are used to update the teacher network. Experiments on five datasets show that CLBO outperforms other algorithms on k-NN classification and linear classification tasks, and its advantage is particularly obvious with smaller batch sizes.
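The following is a minimal sketch of how the alternating bilevel update described above might be organized in a PyTorch-style training loop. It is an illustrative assumption rather than the authors' released implementation: the small MLP encoders, the negative-cosine similarity loss, the proximal term standing in for the nearest-neighbor-based constraint, and the prox_weight hyper-parameter are all placeholders.

```python
# Hypothetical sketch of CLBO's bilevel update (not the authors' code).
# Assumptions: toy MLP encoders, negative-cosine loss, illustrative prox_weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder(dim_in=32, dim_out=16):
    return nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_out))

student, teacher = make_encoder(), make_encoder()
student_opt = torch.optim.SGD(student.parameters(), lr=0.05)
teacher_opt = torch.optim.SGD(teacher.parameters(), lr=0.05)

def similarity_loss(p, z):
    # Negative cosine similarity between two views; the target branch is
    # detached so gradients only flow through the branch being optimized.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

for step in range(100):
    x = torch.randn(8, 32)  # a mini-batch of inputs
    v1 = x + 0.1 * torch.randn_like(x)  # two augmented views
    v2 = x + 0.1 * torch.randn_like(x)

    # Inner level: optimize the student. The teacher acts as a constraint
    # term via a proximal penalty pulling the student parameters toward it.
    prox_weight = 0.1
    student_opt.zero_grad()
    loss_s = similarity_loss(student(v1), teacher(v2))
    prox = sum(((p - q.detach()) ** 2).sum()
               for p, q in zip(student.parameters(), teacher.parameters()))
    (loss_s + prox_weight * prox).backward()
    student_opt.step()

    # Outer level: update the teacher by stochastic gradient descent on the
    # loss computed against the freshly updated student.
    teacher_opt.zero_grad()
    loss_t = similarity_loss(teacher(v1), student(v2))
    loss_t.backward()
    teacher_opt.step()
```

In this sketch the teacher is trained by its own gradient step rather than by the momentum (exponential moving average) update used in many pseudo siamese methods, reflecting the abstract's description of a stochastic-gradient-descent-based teacher optimization strategy.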