LI Kun-Lun1, CAO Zheng1, CAO Li-Ping2, ZHANG Chao1, LIU Ming1
1. College of Electronic and Information Engineering, Hebei University, Baoding 071002
2. Department of Electrical and Mechanical Engineering, Baoding Vocational and Technical College, Baoding 071051
Abstract Semi-supervised clustering algorithms use a small amount of labeled data to improve clustering performance, and they are a research hotspot in pattern recognition and related fields. This paper reviews developments in semi-supervised clustering, covering constraint-based methods, distance-based methods, and combinations of the two. By applying a semi-supervised strategy to fuzzy C-means (FCM), a semi-supervised fuzzy C-means algorithm (constrained FCM) is proposed. Experimental results show that the proposed method achieves higher accuracy than FCM and semi-supervised K-means.
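The abstract does not give the update rules of the proposed constrained FCM, so the following is not the authors' method. It is a minimal sketch of one common way to add supervision to fuzzy C-means, under stated assumptions: each cluster has at least one labeled point, labeled points keep a fixed one-hot membership, and cluster centres are seeded from the labeled points. The function name `constrained_fcm` and the `labels` dictionary format are hypothetical, chosen only for illustration.

```python
def constrained_fcm(points, labels, n_clusters, m=2.0, n_iter=100, tol=1e-9):
    """Illustrative seeded fuzzy C-means (an assumption, not the paper's algorithm).

    points: list of equal-length tuples of floats
    labels: dict {point index: known cluster index} (hypothetical input format)
    m:      fuzzifier, must be > 1
    """
    n, dim = len(points), len(points[0])

    # Seed each centre with the mean of the labeled points of that cluster.
    seeds = [[i for i, k in labels.items() if k == j] for j in range(n_clusters)]
    if any(not s for s in seeds):
        raise ValueError("each cluster needs at least one labeled point")
    centers = [tuple(sum(points[i][d] for i in s) / len(s) for d in range(dim))
               for s in seeds]

    u = [[0.0] * n_clusters for _ in range(n)]
    for _ in range(n_iter):
        # Membership update.
        for i in range(n):
            if i in labels:
                # Labeled points are clamped to a one-hot membership.
                u[i] = [1.0 if j == labels[i] else 0.0 for j in range(n_clusters)]
                continue
            d2 = [sum((points[i][d] - c[d]) ** 2 for d in range(dim))
                  for c in centers]
            hits = [j for j, v in enumerate(d2) if v == 0.0]
            if hits:
                # The point coincides with one or more centres.
                u[i] = [1.0 / len(hits) if j in hits else 0.0
                        for j in range(n_clusters)]
            else:
                # Standard FCM rule: u_ij is proportional to d_ij^(-2/(m-1)).
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                total = sum(inv)
                u[i] = [v / total for v in inv]

        # Centre update: mean of all points weighted by u_ij^m.
        new_centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))

        # Stop once no centre coordinate moves by more than tol.
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Toy data: two well-separated groups, one labeled point in each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = {0: 0, 3: 1}
centers, u = constrained_fcm(points, labels, n_clusters=2)
```

In this sketch the labeled points act both as seeds and as hard constraints, so cluster indices stay tied to the known classes. Whether the proposed algorithm clamps memberships, penalises constraint violations, or adapts the distance metric is not stated in the abstract.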