基于k最近邻网络的数据聚类算法

摘要
图/表
参考文献
相关文章 (15)

全文: PDF (422 KB) HTML (1 KB)
输出: BibTeX | EndNote (RIS)

摘要聚类研究在数据挖掘研究领域中占有十分重要的地位。虽然目前已有很多数据聚类算法，但精度仍不够理想。文中提出一个基于结构化相似度的网络聚类算法(SSNCA)，试图从网络聚类角度进一步提高数据聚类精度。具体解决方案是，将待聚类的向量数据集转化为k最近邻网络，并用SSNCA对该网络进行聚类。将SSNCA与c-Means、仿射传播进行比较，实验表明文中算法得到的目标函数稍差，但聚类精度要明显高于这两个算法。

	服务

	把本文推荐给朋友
	加入我的书架
	加入引用管理器
	E-mail Alert
	RSS
	作者相关文章

关键词 ：数据挖掘, 数据聚类, k最近邻网络, 复杂网络聚类

Abstract：Data clustering is a hotspot in data mining area. Though there have been lots of data clustering algorithms now, the clustering accuracy of them is far from perfect. A structural similarity based network clustering algorithm (SSNCA) is proposed in this paper, which attempt to further improve the data clustering accuracy from the view of network clustering. The concrete solution scheme is that vector dataset for clustering is converted to a k-Nearest-Neigborhood network and SSNCA is used to cluster this network. Comparing SSNCA with the algorithms of c-Means and affinity propagation (AP), experimental result shows that the fitness value got by the proposed algorithm is a little worse than AP, but its clustering accuracy is obviously better than that of the other two algorithms.

Key words： Data Mining Data Clustering k-Nearest-Neighbor Network Complex Network Clustering

收稿日期: 2009-04-27

ZTFLH:

TP181

基金资助:国家自然科学基金项目(No.60873149,60973088)、国家863计划项目(No.2006AA10Z245)资助

作者简介: 金弟，男，1981年生，博士研究生，主要研究方向为复杂网络分析、数据挖掘.刘杰，女，1973年生，博士，副教授，主要研究方向为数据挖掘、离散数学.贾正雪，男，1982年生，硕士，主要研究方向为语义Web、数据挖掘.刘大有，男，1942年生，教授，博士生导师，主要研究方向为知识工程与专家系统、Agent系统、时空推理和数据挖掘等.E-mail:liudy@jlu.edu.cn.

引用本文:

金弟，刘杰，贾正雪，刘大有. 基于k最近邻网络的数据聚类算法[J]. 模式识别与人工智能, 2010, 23(4): 546-551. JIN Di,LIU Jie,JIA Zheng-Xue,LIU Da-You. k-Nearest-Neighbor Network Based Data Clustering Algorithm. , 2010, 23(4): 546-551.

链接本文:

http://manu46.magtech.com.cn/Jweb_prai/CN/ 或 http://manu46.magtech.com.cn/Jweb_prai/CN/Y2010/V23/I4/546

[1] Sun Jigui, Liu Jie, Zhao Lianyu. Clustering Algorithms Research. Journal of Software, 2009, 19(1): 48-61 (in Chinese)
(孙吉贵,刘杰,赵连宇.聚类算法研究.软件学报, 2009, 19(1): 48-61)
[2] Zhang Tian, Ramakrishnan R, Livny M. Birch: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery, 1997, 1(2): 141-182
[3] MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations // Proc of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA, 1967, Ⅰ: 281-297
[4] Ester M, Riegel H P, Sander J, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise // Proc of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland, USA, 1996: 291-316
[5] Ding C, He Xiaofei. k-Nearest-Neighbor Consistency in Data Clustering: Incorporating Local Information into Global Optimization // Proc of the ACM Symposium on Applied Computing. Nicosia, Cyprus, 2004: 584-589
[6] Frnti P, Virmajoki O, Hautamki V. Fast Agglomerative Clustering
Using a k-Nearest Neighbor Graph. IEEE Trans on Pattern Analysis and Machine Intelligence, 2006, 28(11): 1875-1881
[7] Zhu Qiaoming, Li Junhui, Zhou Guodong, et al. A Novel Hierarchical Document Clustering Algorithm Based on a kNN Connection Graph // Proc of the 21st International Conference on the Computer Processing of Oriental Languages. Singapore, Singapore, 2006: 120-130
[8] Teng S H, Yao F F. k-Nearest-Neighbor Clustering and Percolation Theory. Algorithmica, 2007, 49(3): 192-211
[9] Liu Dayou, Liu Jie, Jin Di. Study on k-Nearest-Neighbor Partition Based Clustering Algorithm // Proc of the National Conference on Artificial Intelligence. Harbin, China, 2007: 169-174 (in Chinese)
(刘大有,刘杰,金弟.基于k最近邻划分的聚类算法研究//中国人工智能进展2007. 哈尔滨, 2007: 169-174)
[10] Luxburg U V. A Tutorial on Spectral Clustering. Statistics and Computing, 2007, 17(4): 395-416
[11] Newman M E, Girvan M. Finding and Evaluating Community Structure in Networks [EB/OL]. [2004-02-26]. http://www.ncibi.org/gateway/pdf/newman%20community%20struct%20networks
%20phys%20rev.pdf
[12] Xu Xiaowei, Yuruk N, Feng Zhidan, et al. SCAN: A Structural Clustering Algorithm for Networks // Proc of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, USA, 2007: 824-833
[13] Girvan M, Newman M E. Community Structure in Social and Biological Networks. Proc of National Academy of Science of USA, 2002, 99(12): 7821-7826
[14] Newman M E. Fast Algorithm for Detecting Community Structure in Networks [EB/OL]. [2004-02-26]. http://www-stat.stanford.edu/~owen/courses/315c/readings/PhysRevE_69_066133.pdf
[15] Palla G, Derenyi I, Farkas I, et al. Uncovering the Overlapping Community Structures of Complex Networks in Nature and Society. Nature, 2005, 435(7043): 814-818
[16] Yang Bo, Cheung W K, Liu Jiming. Community Mining from Signed Social Networks. IEEE Trans on Knowledge and Data Engineering, 2007, 19(10): 1333-1348
[17] Frey B J, Dueck D. Clustering by Passing Messages between Data Points. Science, 2007, 315(5814): 972-976
[18] Vlasblom J, Wodak S J. Markov Clustering Versus Affinity Propagation for the Partitioning of Protein Interaction Graphs. BMC Bioinformatics, 2009, 10: 99