Abstract: Existing incomplete multi-view clustering algorithms based on nonnegative matrix factorization (NMF) cannot extract local features accurately. To address this problem, a chunk-by-chunk incomplete multi-view clustering algorithm based on orthogonal constraints (CIMVCO) is proposed. A latent feature matrix for all views is obtained by NMF, and orthogonal constraints are added to obtain better local features. Samples missing from a view are assigned smaller weights to reduce the impact of the missing data. To handle large-scale data, the data are processed chunk by chunk, which reduces memory demand and processing time. Experimental results on the Reuters and Digit datasets demonstrate the effectiveness of CIMVCO.
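The abstract names three ingredients: an NMF latent matrix over all views, an orthogonality constraint, and small weights for missing samples, applied chunk by chunk. The sketch below illustrates how these could fit together; it is not the paper's algorithm. Everything specific in it is an assumption made for illustration: the function names `update_chunk` and `cluster_in_chunks`, the penalty form `lam * ||V.T V - I||_F^2`, the heuristic multiplicative updates, zero-filling of missing rows, the weight `0.01`, and reading cluster labels from the largest latent coordinate.

```python
import numpy as np

EPS = 1e-10  # keeps denominators strictly positive


def update_chunk(views, weights, bases, k, lam=0.1, n_iter=50, rng=None):
    """Factorize one chunk so that views[v] ~ U @ bases[v].T for every view v.

    views[v]   : (n, d_v) nonnegative data of view v for this chunk
    weights[v] : (n,) per-sample weights (small for missing samples)
    bases[v]   : (d_v, k) basis of view v, updated in place across chunks
    Returns the (n, k) latent matrix U shared by all views (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = views[0].shape[0]
    U = rng.random((n, k)) + EPS
    for _ in range(n_iter):
        # Shared latent features: weighted multiplicative update over all views.
        num = sum(w[:, None] * (X @ V) for X, w, V in zip(views, weights, bases))
        den = sum(w[:, None] * (U @ (V.T @ V)) for w, V in zip(weights, bases))
        U *= num / (den + EPS)
        # Per-view bases with the assumed penalty lam * ||V.T V - I||_F^2.
        for v, (X, w) in enumerate(zip(views, weights)):
            V = bases[v]
            WU = w[:, None] * U
            num = X.T @ WU + 2 * lam * V
            den = V @ (U.T @ WU) + 2 * lam * V @ (V.T @ V)
            bases[v] = V * num / (den + EPS)
    return U


def cluster_in_chunks(views, masks, k, chunk_size, missing_weight=0.01, seed=0):
    """Process samples chunk by chunk; masks[v][i] is True if sample i exists in view v."""
    rng = np.random.default_rng(seed)
    bases = [rng.random((X.shape[1], k)) + EPS for X in views]
    labels = []
    for start in range(0, views[0].shape[0], chunk_size):
        sl = slice(start, start + chunk_size)
        # Zero-fill missing rows and down-weight them (both are assumptions).
        chunk = [np.where(m[sl, None], X[sl], 0.0) for X, m in zip(views, masks)]
        w = [np.where(m[sl], 1.0, missing_weight) for m in masks]
        U = update_chunk(chunk, w, bases, k, rng=rng)
        # Simplification: label = index of the largest latent coordinate.
        labels.append(U.argmax(axis=1))
    return np.concatenate(labels), bases
```

Only one chunk and the per-view bases are held in memory at a time, which is the source of the memory saving. Because the bases keep changing, labels from early chunks are relative to earlier bases; a fuller implementation would likely revisit them or cluster the latent features with a separate step such as k-means.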