A Semi-Supervised Text Clustering Based on Strong Classification Features Affinity Propagation
WEN Han (1,2), XIAO Nan-Feng (1)
1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006
2. School of Science, Foshan University, Foshan 528000
Abstract A semi-supervised text clustering algorithm based on strong classification features affinity propagation (SCFAP) is proposed to handle sparse, large-scale, high-dimensional document data. During clustering, strong classification features are extracted from a small number of labeled samples and used to construct a reasonable similarity measure. Moreover, to improve the execution efficiency of the algorithm, the unlabeled documents with maximum category certainty are transferred from the unlabeled collection to the labeled collection in each iteration. Experimental results show that these improvements raise both the performance and the accuracy of the classical affinity propagation algorithm. The SCFAP algorithm shows better applicability on the Reuters-21578 and 20 Newsgroups datasets. Judged jointly by the micro-averaged Fμ index and the clustering purity index, the semi-supervised text clustering algorithm based on SCFAP obtains better clustering results rapidly.
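The two semi-supervised steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it omits the affinity propagation message passing entirely. The abstract does not define how strong features are chosen, how similarity is computed, or how category certainty is measured, so the class-purity threshold (`min_purity`), the Jaccard similarity, and the margin between the two best class scores are all assumptions made only for this example, as are the function names.

```python
from collections import defaultdict


def select_strong_features(labeled, min_purity=0.9):
    """Keep terms whose occurrences are concentrated in one class.

    `labeled` is a list of (set_of_terms, label) pairs. The purity
    criterion is an assumption of this sketch, not the paper's rule.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for terms, label in labeled:
        for term in terms:
            counts[term][label] += 1
    strong = set()
    for term, by_label in counts.items():
        total = sum(by_label.values())
        if max(by_label.values()) / total >= min_purity:
            strong.add(term)
    return strong


def similarity(a, b, strong):
    """Jaccard similarity restricted to strong features (assumed measure)."""
    a, b = a & strong, b & strong
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_most_certain(labeled, unlabeled, strong):
    """Move the unlabeled document with the largest class margin.

    Certainty is taken here as the gap between the best and second-best
    mean similarity to each labeled class (an assumed definition).
    Mutates both lists; returns the moved (terms, label) pair or None.
    """
    if not labeled or not unlabeled:
        return None
    best = None  # (margin, index in unlabeled, predicted label)
    for i, terms in enumerate(unlabeled):
        sums = defaultdict(float)
        sizes = defaultdict(int)
        for other, label in labeled:
            sums[label] += similarity(terms, other, strong)
            sizes[label] += 1
        means = sorted(((sums[l] / sizes[l], l) for l in sums), reverse=True)
        top_score, top_label = means[0]
        runner_up = means[1][0] if len(means) > 1 else 0.0
        margin = top_score - runner_up
        if best is None or margin > best[0]:
            best = (margin, i, top_label)
    _, index, label = best
    moved = (unlabeled.pop(index), label)
    labeled.append(moved)
    return moved
```

In a full pipeline, one would presumably re-run feature selection and clustering after each transfer; the sketch shows a single round so that the data flow between the labeled and unlabeled collections stays visible.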
Received: 11 March 2013