Imbalanced Node Classification Algorithm Based on Self-Supervised Learning
CUI Caixia1,2, WANG Jie3, PANG Tianjie2, LIANG Jiye1
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; 2. College of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619; 3. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
Abstract: In real-world node classification scenarios, only a few nodes are labeled and their class labels are imbalanced. Most existing methods do not address the scarcity of supervision and the imbalance of node classes simultaneously, so improvement in node classification performance cannot be guaranteed. Therefore, an imbalanced node classification algorithm based on self-supervised learning is proposed. Firstly, different views of the original graph are generated through graph data augmentation. Then, node representations are learned by maximizing their consistency across views with self-supervised learning, which expands the supervision signal and strengthens the expressive ability of the node representations. In addition, a semantic constraint loss is designed to preserve semantic consistency under graph data augmentation, and it is combined with the cross-entropy loss and the self-supervised contrastive loss. Experimental results on three real-world graph datasets show that the proposed algorithm achieves better performance in solving the imbalanced node classification problem.
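To make the training objective concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of the general scheme the abstract describes: two augmented views of the graph are produced by edge dropping and feature masking, an InfoNCE-style contrastive loss enforces consistency of node representations across the views, and a cross-entropy loss is applied to the few labeled nodes. The semantic constraint loss is not reproduced because its formulation is not given here; all helper names (SimpleGCN, drop_edges, mask_features) and hyperparameters (lambda_ssl, tau) are illustrative assumptions.

```python
# Minimal sketch of a two-view contrastive + cross-entropy objective.
# Not the authors' code; the semantic constraint loss is omitted.
import torch
import torch.nn.functional as F


class SimpleGCN(torch.nn.Module):
    """Two-layer GCN on a dense, symmetrically normalized adjacency."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = torch.nn.Linear(in_dim, hid_dim)
        self.w2 = torch.nn.Linear(hid_dim, out_dim)

    def forward(self, x, adj):
        # Add self-loops and normalize: D^{-1/2}(A + I)D^{-1/2}
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1).clamp(min=1.0).pow(-0.5)
        a = d.unsqueeze(1) * a * d.unsqueeze(0)
        h = F.relu(self.w1(a @ x))
        return self.w2(a @ h)


def drop_edges(adj, p=0.2):
    """Graph augmentation: randomly remove a fraction p of edges."""
    mask = (torch.rand_like(adj) > p).float()
    mask = torch.maximum(mask, mask.t())          # keep the graph symmetric
    return adj * mask


def mask_features(x, p=0.2):
    """Graph augmentation: randomly zero out a fraction p of feature dimensions."""
    return x * (torch.rand(1, x.size(1)) > p).float()


def contrastive_loss(z1, z2, tau=0.5):
    """InfoNCE-style loss: node i in view 1 should match node i in view 2."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)


# --- toy usage on a random graph (a real setup would use separate ---------
# --- projection and classification heads) ---------------------------------
n, d, c = 100, 16, 3
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.05).float()
adj = torch.maximum(adj, adj.t())
labels = torch.randint(0, c, (n,))
train_mask = torch.zeros(n, dtype=torch.bool)
train_mask[:10] = True                            # only a few labeled nodes

model = SimpleGCN(d, 32, c)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
lambda_ssl = 1.0                                  # assumed weight of the contrastive term

for _ in range(50):
    z1 = model(mask_features(x), drop_edges(adj))  # view 1
    z2 = model(mask_features(x), drop_edges(adj))  # view 2
    loss = (F.cross_entropy(z1[train_mask], labels[train_mask])
            + lambda_ssl * contrastive_loss(z1, z2))
    opt.zero_grad()
    loss.backward()
    opt.step()
```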