Two-Stage Text Clustering Based on Collaborative Clustering
WANG Ming-Wen (1, 2), FU Jian-Bo (2), LUO Yuan-Sheng (3), LU Xu (3)
1. School of Computer Information and Engineering, Jiangxi Normal University, Nanchang 330022
2. School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013
3. Modern Education Technology Center, Jiangxi University of Finance and Economics, Nanchang 330013
Abstract: To make full use of semantic relations in text clustering and feature selection, a two-stage text clustering method based on collaborative clustering is proposed. Documents and features are clustered separately to capture the semantic relations between features and topics, and these relations are then used to adjust the clustering interactively. Experimental results show that exploiting the relations between features and topics effectively improves clustering performance.
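The abstract describes the procedure only at a high level, so the following is a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes spherical k-means on raw term counts for the document clustering, treats "feature clustering" as assigning each term to the document cluster that holds most of its occurrences, and uses a hypothetical purity threshold `min_purity` to keep only topic-specific terms before a second round of document clustering. All function names, parameters, and the toy data are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def centroids(docs, labels, k, fallback):
    """Normalized mean vector of each cluster; empty clusters keep their fallback center."""
    out = []
    for j in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == j]
        out.append(normalize([sum(col) for col in zip(*members)]) if members else fallback[j])
    return out


def spherical_kmeans(docs, k, init=None, iters=20):
    """Cluster unit-normalized document vectors by cosine similarity."""
    if init is None:
        # Deterministic farthest-first seeding: start from document 0, then add the
        # document whose best similarity to the existing seeds is lowest.
        centers = [docs[0]]
        while len(centers) < k:
            nxt = min(range(len(docs)),
                      key=lambda i: max(cosine(docs[i], c) for c in centers))
            centers.append(docs[nxt])
    else:
        centers = init
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(d, centers[j])) for d in docs]
        new_centers = centroids(docs, labels, k, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def term_topic_profiles(counts, labels, k):
    """Hypothetical feature clustering: share of each term's occurrences per document cluster."""
    n_terms = len(counts[0])
    mass = [[0.0] * k for _ in range(n_terms)]
    for row, lab in zip(counts, labels):
        for t, c in enumerate(row):
            mass[t][lab] += c
    return [[m / sum(ms) for m in ms] if sum(ms) else [0.0] * k for ms in mass]


def two_stage_cluster(counts, k, min_purity=0.8):
    """Hypothetical two-stage scheme: cluster documents, cluster terms by topic,
    then re-cluster documents on topic-specific terms only."""
    docs = [normalize([float(c) for c in row]) for row in counts]
    # Stage 1: document clustering on the full vocabulary.
    labels1, _ = spherical_kmeans(docs, k)
    # Feature clustering: assign each term to the topic holding most of its mass.
    profiles = term_topic_profiles(counts, labels1, k)
    term_topic = [max(range(k), key=lambda j: p[j]) for p in profiles]
    # Keep only terms concentrated in one topic (purity >= min_purity).
    keep = [t for t, p in enumerate(profiles) if max(p) >= min_purity]
    if not keep:
        keep = list(range(len(profiles)))  # nothing discriminative: fall back to all terms
    reduced = [normalize([float(row[t]) for t in keep]) for row in counts]
    # Stage 2: re-cluster on the reduced feature set, seeded from the stage-1 partition.
    seeds = centroids(reduced, labels1, k, fallback=[[0.0] * len(keep)] * k)
    labels2, _ = spherical_kmeans(reduced, k, init=seeds)
    return labels2, term_topic, keep


# Toy data: columns 0-2 are "sports" terms, 3-5 "finance" terms, 6 is shared by all.
counts = [
    [3, 2, 1, 0, 0, 0, 1],
    [2, 3, 2, 0, 0, 0, 1],
    [1, 2, 3, 0, 0, 0, 1],
    [0, 0, 0, 3, 2, 1, 1],
    [0, 0, 0, 2, 3, 2, 1],
    [0, 0, 0, 1, 2, 3, 1],
]
labels, term_topic, keep = two_stage_cluster(counts, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(keep)    # [0, 1, 2, 3, 4, 5]
```

In this toy run the shared term in column 6 is spread evenly over both document clusters (purity 0.5), so it is dropped before the second stage. Seeding stage 2 from the stage-1 partition is one simple way to keep cluster indices aligned between the two stages.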
WANG Ming-Wen, FU Jian-Bo, LUO Yuan-Sheng, LU Xu. Two-Stage Text Clustering Based on Collaborative Clustering. Pattern Recognition and Artificial Intelligence, 2009, 22(6): 848-853.