A Semi-Supervised Text Clustering Based on Strong Classification Features Affinity Propagation
WEN Han1,2, XIAO Nan-Feng1
1 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006
2 School of Science, Foshan University, Foshan 528000
Abstract: A semi-supervised text clustering algorithm based on strong classification features affinity propagation (SCFAP) is proposed to handle sparse, large-scale, high-dimensional document data. During clustering, strong classification features are extracted from a small number of labeled samples and used to construct a reasonable similarity measure. Moreover, to improve the execution efficiency of the algorithm, the unlabeled documents with maximum category certainty are transferred from the unlabeled collection to the labeled collection in each iteration. Experimental results show that these improvements raise both the performance and the accuracy of the classical affinity propagation algorithm. The SCFAP algorithm shows better applicability on Reuters-21578 and 20 Newsgroups. Judged jointly by the micro-averaged Fμ index and the clustering purity index, the SCFAP-based semi-supervised text clustering algorithm obtains better clustering results rapidly.
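The two semi-supervised ingredients named in the abstract, strong classification features and the per-iteration transfer of the most certain unlabeled document, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the purity-based selection rule, the `min_docs`, `min_purity`, and `boost` parameters, the weighted cosine similarity, and the margin-based certainty score. The affinity propagation message passing itself is omitted; the similarity function would only supply its input matrix.

```python
import math
from collections import Counter, defaultdict


def strong_features(labeled, min_docs=2, min_purity=0.8):
    """Pick terms concentrated in one class (hypothetical selection rule).

    labeled: list of (tokens, label) pairs.
    Returns {term: purity} for terms seen in at least min_docs labeled
    documents whose most frequent class covers at least min_purity of them.
    """
    term_class = defaultdict(Counter)
    for tokens, label in labeled:
        for term in set(tokens):
            term_class[term][label] += 1
    weights = {}
    for term, counts in term_class.items():
        total = sum(counts.values())
        purity = max(counts.values()) / total
        if total >= min_docs and purity >= min_purity:
            weights[term] = purity
    return weights


def similarity(a, b, weights, boost=2.0):
    """Cosine similarity with strong features up-weighted (assumed form)."""
    def vec(tokens):
        tf = Counter(tokens)
        return {t: c * (1.0 + boost * weights.get(t, 0.0)) for t, c in tf.items()}

    va, vb = vec(a), vec(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def promote_most_certain(labeled, unlabeled, weights):
    """Move the unlabeled document with the largest class margin into labeled.

    Certainty is taken here as the gap between the best and second-best
    class scores, where a class score is the highest similarity to any
    labeled document of that class (an assumption, not the paper's rule).
    Returns (tokens, assigned_label), or None if nothing can be promoted.
    """
    best = None
    for i, doc in enumerate(unlabeled):
        scores = defaultdict(float)
        for tokens, label in labeled:
            scores[label] = max(scores[label], similarity(doc, tokens, weights))
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        if not ranked:
            continue
        margin = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0.0)
        if best is None or margin > best[0]:
            best = (margin, i, ranked[0][0])
    if best is None:
        return None
    _, i, label = best
    doc = unlabeled.pop(i)
    labeled.append((doc, label))
    return doc, label
```

In a full pipeline, `similarity` would fill the pairwise matrix passed to a standard affinity propagation routine, and `promote_most_certain` would be called once per round so that the labeled collection grows as clustering proceeds.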
WEN Han, XIAO Nan-Feng. A Semi-Supervised Text Clustering Based on Strong Classification Features Affinity Propagation [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2014, 27(7): 646-654.