Deep Contrastive Multi-view Clustering with Transformer Fusion
LI Shunyong1,2, YUAN Zhiying1, ZHAO Xingwang3,4
1. School of Mathematics and Statistics, Shanxi University, Taiyuan 030006; 2. Key Laboratory of Complex Systems and Data Science of Ministry of Education, Shanxi University, Taiyuan 030006; 3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; 4. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
Abstract: Multi-view clustering, an important unsupervised learning task, fuses information from heterogeneous views to uncover a consistent clustering structure. Existing methods have several shortcomings: the low-level features extracted by autoencoders lack cross-view semantic consistency, simple fusion strategies do not dynamically assess view quality, and multi-level contrastive constraints and local-global label alignment mechanisms are absent. To address these issues, a deep contrastive multi-view clustering algorithm with Transformer fusion (DCMCTF) is proposed. First, low-level feature distributions are aligned across views through an alternating adversarial learning mechanism, and instance-level and cluster-level dual contrastive learning is then introduced to strengthen cross-view consistency and feature discriminability. Second, a Transformer-based adaptive fusion module dynamically learns the relationships among views; combined with quality-aware scoring, it generates robust consensus representations, and the global labels obtained from these representations are aligned with the local labels of each view. Experiments on nine datasets show that DCMCTF achieves excellent clustering performance.
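To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes an InfoNCE-style form for the instance-level contrastive loss and a plain softmax over per-view quality scores for the fusion weights; the function names, the cosine similarity, and the temperature value are all hypothetical choices for illustration, and the Transformer module, adversarial alignment, and cluster-level loss are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_contrastive_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss (assumed form): sample i in view A should be
    closest to sample i in view B among all samples of view B.

    view_a, view_b: lists of feature vectors aligned by sample index.
    temperature: illustrative default, not a value taken from the paper.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in view B.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the true cross-view pair (i, i).
        total += log_denominator - logits[i]
    return total / n


def softmax_weights(scores):
    """Map per-view quality scores to fusion weights that sum to 1
    (softmax is an assumption standing in for quality-aware scoring)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fuse(view_features, scores):
    """Weighted sum of one sample's per-view feature vectors."""
    weights = softmax_weights(scores)
    dim = len(view_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, view_features))
        for d in range(dim)
    ]


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print(instance_contrastive_loss(a, aligned))
    print(instance_contrastive_loss(a, shuffled))
    print(fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In this toy setting the loss is small when the two views agree sample by sample and large when the pairing is broken, which is the behaviour an instance-level contrastive term is meant to reward. A view with a higher quality score contributes more to the fused representation.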