Cross-Entropy Semi-supervised Clustering Based on Pairwise Constraints
LI Chaoming1, XU Shengbing1,2, HAO Zhifeng1,3
1.School of Applied Mathematics, Guangdong University of Technology, Guangzhou 510520 2.School of Computers, Guangdong University of Technology, Guangzhou 510006 3.School of Mathematics and Big Data, Foshan University, Foshan 528000
Abstract:The objective function used in the classical maximum entropy clustering(MEC) lacks the information expression on pairwise constraints. Therefore, the effective supervision information is wasted when a small amount of pairwise constraints are known. In this paper, an algorithm of cross-entropy semi-supervised clustering(CE-sSC) based on pairwise constrains on the basis of MEC algorithm is proposed. The sample cross-entropy is utilized to describe the pairwise constraints information and introduced to the objective function of MEC as a penalty term. With Lagrange optimization procedure, the objective function can be resolved into the cluster center and the membership update equations. Experimental results indicate the proposed method effectively improves the clustering performance by using a small amount of pairwise constraints and works well on actual datasets.
[1] ZHANG Z H, ZHENG N N, SHI G. Maximum-Entropy Clustering Algorithm and Its Global Convergence Analysis. Science in China Series E(Technological Sciences), 2001, 44(1): 89-101. [2] ZHOU J, CHEN C L P, CHEN L. Maximum-Entropy-Based Multiple Kernel Fuzzy C-means Clustering Algorithm // Proc of the IEEE International Conference on Systems, Man and Cybernetics(SMC). Washington, USA: IEEE, 2014: 1198-1203. [3] WANG S T, CHUNG K F L, DENG Z H, et al. Robust Maximum Entropy Clustering Algorithm with Its Labeling for Outliers. Soft Computing, 2006, 10(7): 555-563. [4] LI L Q, JI H B, GAO X B. Maximum Entropy Fuzzy Clustering with Application to Real-Time Target Tracking. Signal Processing, 2006, 86(11): 3432-3447. [5] WANG X, MA J J, WANG S. Parallel Energy-Efficient Coverage Optimization with Maximum Entropy Clustering in Wireless Sensor Networks. Journal of Parallel and Distributed Computing, 2009, 69(10): 838-847. [6] ZHI X B, FAN J L, ZHAO F. Fuzzy Linear Discriminant Analysis-Guided Maximum Entropy Fuzzy Clustering Algorithm. Pattern Re-cognition, 2013, 46(6): 1604-1615. [7] SUN S W, JIANG Y Z, QIAN P J. Transfer Learning Based Maximum Entropy Clustering // Proc of the 4th IEEE International Conference on Information Science and Technology. Washington, USA: IEEE, 2014: 829-832. [8] QIAN P J, JIANG Y Z, DENG Z H, et al. Cluster Prototypes and Fuzzy Memberships Jointly Leveraged Cross-Domain Maximum Entropy Clustering. IEEE Transactions on Cybernetics, 2016, 46(1): 181-193. [9] YIN X S, SHU T, HUANG Q. Semi-supervised Fuzzy Clustering with Metric Learning and Entropy Regularization. Knowledge-Based Systems, 2012, 35: 304-311. [10] YAO Z Y. Semi-supervised MEC Clustering Algorithm on Maximized Central Distance. Journal of Computers, 2013, 8(10): 2570-2574. [11] GU L. The Novel Seeding-Based Semi-supervised Fuzzy Clustering Algorithm Inspired by Diffusion Processes // Proc of the 10th International Conference on Advances in Neural Networks. Berlin, Germany: Springer, 2013, I: 565-573. [12] FAU ER S, SCHWENKER F. Semi-supervised Clustering of Large Data Sets with Kernel Methods. Pattern Recognition Letters, 2014, 37: 78-84. [13] YU Z W, LUO P N, YOU J, et al. Incremental Semi-supervised Clustering Ensemble for High Dimensional Data Clustering. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(3): 701-714. [14] GRIRA N, CRUCIANU M, BOUJEMAA N. Semi-supervised Fuzzy Clustering with Pairwise-Constrained Competitive Agglomeration // Proc of the 14th IEEE International Conference on Fuzzy Systems. Washington, USA: IEEE, 2005: 867-872. [15] LU Z D, LEEN T K. Semi-supervised Learning with Penalized Probabilistic Clustering // Proc of the 17th International Conference on Neural Information Processing Systems. Cambridge, USA: The MIT Press, 2004: 849-856. [16] NELSON B, COHEN I. Revisiting Probabilistic Models for Clustering with Pair-Wise Constraints // Proc of the 24th International Conference on Machine Learning. New York, USA: ACM, 2007: 673-680. [17] ZENG H, CHEUNG Y M. Semi-supervised Maximum Margin Clustering with Pairwise Constraints. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(5): 926-939. [18] XIONG S C, AZIMI J, FERN X Z. Active Learning of Constraints for Semi-supervised Clustering. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(1): 43-54. [19] ANAND S, MITTAL S, TUZEL O, et al. Semi-supervised Kernel Mean Shift Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(6): 1201-1215. [20] LI L L, GARIBALDI J M, HE D J, et al. Semi-supervised Fuzzy Clustering with Feature Discrimination. PloS one, 2015, 10(9): e0131160. [21] FAN W Q, WANG C D, LAI J H. SDenPeak: Semi-supervised Nonlinear Clustering Based on Density and Distance // Proc of the 2nd IEEE International Conference on Big Data Computing Service and Applications. Washington, USA: IEEE, 2016: 269-275. [22] LEI Q, YU H P, WU M, et al. Modeling of Complex Industrial Process Based on Active Semi-supervised Clustering. Engineering Applications of Artificial Intelligence, 2016, 56: 131-141. [23] 王 玲,薄列峰,焦李成.密度敏感的半监督谱聚类.软件学报, 2007, 18(10): 2412-2422. (WANG L, BO L F, JIAO L C. Density-Sensitive Semi-supervised Spectral Clustering. Journal of Software, 2007, 18(10): 2412-2422. [24] DING S F, JIA H J, ZHANG L W, et al. Research of Semi-supervised Spectral Clustering Algorithm Based on Pairwise Constraints. Neural Computing and Applications, 2014, 24(1): 211-219. [25] JI X, XU W. Document Clustering with Prior Knowledge // Proc of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, USA: ACM, 2006: 405-412. [26] WANG X, QIAN B Y, DAVIDSON I. On Constrained Spectral Clustering and Its Applications. Data Mining and Knowledge Discovery, 2014, 28(1): 1-30. [27] KROESE D P, RUBINSTEIN R Y, TAIMRE T. Application of the Cross-Entropy Method to Clustering and Vector Quantization. Journal of Global Optimization, 2007, 37(1): 137-157. [28] TABOR J, SPUREK P. Cross-Entropy Clustering. Pattern Recognition, 2014, 47(9): 3046-3059. [29] S/MIEJA M, TABOR J. Spherical Wards Clustering and Genera-lized Voronoi Diagrams // Proc of the IEEE International Conference on Data Science and Advanced Analytics. Washington, USA: IEEE, 2015. DOI: 10.1109/DSAA.2015.7344976. [30] SPUREK P, TABOR J, BYRSKI K. Active Function Cross-Entropy Clustering. Expert Systems with Applications, 2017, 72: 49-66. [31] 贺杨成,王士同,江 南.成对约束的属性加权半监督模糊核聚类算法.计算机工程与应用, 2011, 47(24): 136-138. (HE Y C, WANG S T, JIANG N. Semi-supervised Mercer-Kernel Based Fuzzy Clustering Algorithm with Pairwise Constraints and Attribute Weighted. Computer Engineering and Applications, 2011, 47(24): 136-138.) [32] 尹学松,胡恩良,陈松灿.基于成对约束的判别型半监督聚类分析.软件学报, 2008, 19(11): 2791-2802. (YIN X S, HU E L, CHEN S C. Discriminative Semi-supervised Clustering Analysis with Pairwise Constraints. Journal of Software, 2008, 19(11): 2791-2802.) [33] 肖 宇,于 剑.基于近邻传播算法的半监督聚类.软件学报, 2008, 19(11): 2803-2813. (XIAO Y, YU J. Semi-supervised Clustering Based on Affinity Propagation Algorithm. Journal of Software, 2008, 19(11): 2803-2813.) [34] 蒋伟进,许宇晖,王 欣.基于谱图和成对约束的主动半监督聚类算法.控制与决策, 2013, 28(6): 904-908.
(JIANG W J, XU Y H, WANG X. Active Semi-supervised Clus-tering Algorithm Based on Pair-Wise Constraints. Control and Decision, 2013, 28(6): 904-908.) [35] 张良均,杨 坦,肖 刚,等.MATLAB数据分析与挖掘实战.北京:机械工业出版社, 2015. (ZHANG L J, YANG T, XIAO G, et al. MATLAB Data Analysis and Data Mining. Beijing, China: China Machine Press, 2015.)