LI Kun-Lun1, CAO Zheng1, CAO Li-Ping2, ZHANG Chao1, LIU Ming1
1. College of Electronic and Information Engineering, Hebei University, Baoding 071002
2. Department of Electrical and Mechanical Engineering, Baoding Vocational and Technical College, Baoding 071051
Abstract: Semi-supervised clustering algorithms use a small amount of labeled data to improve clustering performance, and they are a research hotspot in pattern recognition and related fields. This paper reviews recent developments in semi-supervised clustering, covering constraint-based methods, distance-based methods, and methods that combine the two. Applying the semi-supervised strategy to fuzzy C-means (FCM), it then proposes a semi-supervised fuzzy C-means algorithm (constrained FCM). Experimental results show that the proposed method achieves better accuracy than FCM and semi-supervised K-means.
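The abstract does not give the update rules of the proposed constrained FCM, so the following is only a minimal sketch of one common way to add supervision to fuzzy C-means: the few labeled points keep a fixed one-hot membership, while all other memberships follow the standard FCM updates. The function name `semi_supervised_fcm`, its parameters, and the clamping strategy are assumptions made for illustration and are not taken from the paper.

```python
import random


def semi_supervised_fcm(points, labels, n_clusters, m=2.0, n_iter=50, seed=0):
    """Fuzzy C-means in which labeled points keep a fixed one-hot membership.

    points: list of equal-length tuples of floats.
    labels: dict {point_index: cluster_index} for the labeled points
            (hypothetical input format, not from the paper).
    m: fuzzifier (> 1), as in standard FCM.
    Returns (centers, memberships).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    def clamp():
        # Supervision step (assumption): force labeled points into their cluster.
        for i, k in labels.items():
            u[i] = [1.0 if j == k else 0.0 for j in range(n_clusters)]

    clamp()
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))

        # Membership update from squared distances to the centers.
        for i in range(n):
            d2 = [sum((points[i][d] - c[d]) ** 2 for d in range(dim))
                  for c in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: assign it fully.
                hit = d2.index(0.0)
                u[i] = [1.0 if j == hit else 0.0 for j in range(n_clusters)]
            else:
                inv = [dist ** (-1.0 / (m - 1.0)) for dist in d2]
                total = sum(inv)
                u[i] = [v / total for v in inv]
        clamp()
    return centers, u


# Toy data: two well-separated groups, with one labeled point in each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = {0: 0, 3: 1}
centers, memberships = semi_supervised_fcm(points, labels, n_clusters=2)
```

Because the labeled points are re-clamped after every membership update, they also fix which cluster index corresponds to which group, something plain FCM leaves arbitrary.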
LI Kun-Lun, CAO Zheng, CAO Li-Ping, ZHANG Chao, LIU Ming. Some Developments on Semi-Supervised Clustering[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2009, 22(5): 735-742.