A Shrinking Clustering Method for High Dimensional Data Using Flexible Size Grid<sup>*</sup>


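Abstract (translated from the Chinese): To address the clustering of high-dimensional data in data mining, a shrinking clustering algorithm with a variable-size grid is proposed. The data bins are arranged according to their density span, and the data points are moved along the density gradient to produce condensed clusters. Connected dense cells are then detected using grids of variable size, and the optimal clustering result is obtained once the cluster boundaries no longer change. Simulation results show that the shrinking clustering method performs well on both low-dimensional and high-dimensional data.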

Authors: ZHANG JianYe, PAN Quan, LIANG JianHai

Key words (translated from the Chinese): shrinking clustering, density span, variable-size grid, data bin

Abstract：A shrinkingclustering method using flexible size grid is proposed to solve the clustering problem of high dimensional data in data mining. The data bins are arranged according to their density span, and the data points are moved along the direction of the density gradient. Thus the condensed and widelyseparated clusters are generated. Then the connected components of dense cells are detected using a sequence of grids with flexible size. Finally, the best clustering result is obtained when the borderline does not change again. The simulation result shows that the method could detect clusters effectively and efficiently in both low and high dimensional data.

Key words: Shrinking Clustering, Density Span, Flexible Size Grid, Data Bin

Received: 2006-08-14

CLC number: O23

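Funding: Supported by the National Natural Science Foundation of China (No. 60304004) and the Natural Science Basic Research Plan of Shaanxi Province (No. 2005F52)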

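About the authors: ZHANG JianYe, male, born in 1971, Ph.D. candidate. His main research interests are information fusion and time series analysis. Email: zhangjianye828@163.com. PAN Quan, male, born in 1961, professor and doctoral supervisor. His main research interests include information fusion and wavelet analysis. LIANG JianHai, male, born in 1974, Ph.D. candidate. His main research interests are the applications of pattern recognition and artificial intelligence techniques.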

Cite this article:

ZHANG JianYe, PAN Quan, LIANG JianHai. A Shrinking Clustering Method for High Dimensional Data Using Flexible Size Grid[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2007, 20(5): 716-721.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2007/V20/I5/716
