Clustering Algorithm with Kernel Density Estimation

doi:10.16451/j.cnki.issn1003-6059.201705006



Abstract A similarity measure is an important basis for cluster analysis, but defining an effective one for discrete symbols (categories) is difficult. This paper proposes a method that measures the similarity between categories in terms of their kernel probability density. Unlike the traditional simple-matching and frequency-estimation methods, the proposed measure is governed by the bandwidth of the kernel function and therefore no longer depends on the assumption that categories on the same attribute are statistically independent. A Bayesian clustering model is then established based on kernel density estimation of categories, and a clustering algorithm is derived to optimize the model using a likelihood-based object-to-cluster similarity measure. Finally, three data-driven approaches based on leave-one-out estimation and maximum likelihood estimation are proposed to dynamically determine the optimal bandwidths of the kernel function during clustering. Experiments on real-world datasets demonstrate that the proposed algorithm achieves higher clustering accuracy than existing algorithms that use a simple-matching distance measure or its attribute-weighting variants. The results also show that the estimated bandwidths are of practical use in applications such as identifying important features.
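The abstract does not state which kernel function or which bandwidth estimator the paper uses, so the following is only a minimal sketch of the general idea for a single categorical attribute. It assumes the Aitchison-Aitken kernel, a common choice for unordered categories, and selects the bandwidth by maximizing a leave-one-out log-likelihood over a grid. The function names (`kernel_density`, `loo_log_likelihood`, `select_bandwidth`) and the grid search are illustrative assumptions, not the authors' algorithm.

```python
import math
from collections import Counter


def kernel_density(values, categories, lam):
    """Kernel-smoothed probability of each category (assumed Aitchison-Aitken kernel).

    Each sample puts weight 1 - lam on its own category and lam / (c - 1) on
    every other category, where c is the number of categories (c >= 2).
    lam = 0 reproduces raw frequencies; lam = (c - 1) / c gives a uniform density.
    """
    n, c = len(values), len(categories)
    counts = Counter(values)
    return {
        o: ((1 - lam) * counts[o] + lam / (c - 1) * (n - counts[o])) / n
        for o in categories
    }


def loo_log_likelihood(values, categories, lam):
    """Leave-one-out log-likelihood of the sample for bandwidth lam (needs n >= 2)."""
    n, c = len(values), len(categories)
    counts = Counter(values)
    total = 0.0
    for x in values:
        same = counts[x] - 1          # other samples sharing x's category
        other = (n - 1) - same        # other samples in different categories
        p = ((1 - lam) * same + lam / (c - 1) * other) / (n - 1)
        if p <= 0.0:
            return float("-inf")      # a held-out sample got zero probability
        total += math.log(p)
    return total


def select_bandwidth(values, categories, steps=100):
    """Grid search (an illustrative choice) for the bandwidth in [0, (c - 1) / c]."""
    upper = (len(categories) - 1) / len(categories)
    grid = [upper * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: loo_log_likelihood(values, categories, lam))


if __name__ == "__main__":
    sample = ["a", "a", "a", "b"]
    cats = ["a", "b", "c"]
    lam = select_bandwidth(sample, cats)
    print(lam, kernel_density(sample, cats, lam))
```

In this sketch a bandwidth near zero means the observed frequencies of an attribute are trusted almost as-is, while a bandwidth near the upper limit smooths the attribute toward a uniform distribution. That is one way to read the abstract's remark that the estimated bandwidths can help identify important features.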

Key words: Categorical Data Clustering; Probability Model; Similarity Measure; Kernel Density Estimation (KDE); Bandwidth Estimation

Received: 30 September 2016

ZTFLH: TP 311

About authors:
ZHU Jie, born in 1971, senior engineer. His research interests include pattern recognition and target identification.
CHEN Lifei (corresponding author), born in 1972, Ph.D., professor. His research interests include statistical machine learning, data mining and pattern recognition.


Cite this article:

ZHU Jie, CHEN Lifei. Clustering Algorithm with Kernel Density Estimation[J]. , 2017, 30(5): 439-447.

URL:

http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201705006 OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2017/V30/I5/439