Pattern Recognition and Artificial Intelligence
2017, Vol. 30, Issue (5): 439-447. DOI: 10.16451/j.cnki.issn1003-6059.201705006
Researches and Applications
Clustering Algorithm with Kernel Density Estimation
ZHU Jie¹, CHEN Lifei²
1. Southwest China Institute of Electronic Technology, Chengdu 610036
2. College of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350117

Abstract: Similarity measures are an important basis for cluster analysis. However, defining an effective similarity measure for discrete symbols (categories) is difficult. In this paper, a method is proposed to measure the similarity between categories in terms of their kernel probability density. Unlike the traditional simple-matching or frequency-estimation methods, the proposed measure, governed by the bandwidth of the kernel function, no longer relies on the assumption that categories of the same attribute are statistically independent. A Bayesian clustering model is then established based on kernel density estimation of categories, and a clustering algorithm that optimizes this model using a likelihood-based object-to-cluster similarity measure is derived. Finally, three data-driven approaches based on leave-one-out estimation and maximum likelihood estimation are proposed to determine the optimal kernel bandwidths dynamically during clustering. Experiments on real-world datasets show that the proposed algorithm achieves higher clustering accuracy than existing algorithms based on the simple-matching distance measure and their attribute-weighting variants. The results also show that the bandwidths estimated by the proposed algorithm have practical significance in applications such as identifying important features.
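The kernel idea in the abstract can be illustrated with a small sketch. It assumes the Aitchison-Aitken kernel, a standard kernel for categorical data that may differ from the kernel defined in the paper: for an attribute with c categories and bandwidth λ in [0, (c - 1)/c], K(x, xᵢ; λ) = 1 - λ if x = xᵢ and λ/(c - 1) otherwise, so λ = 0 recovers the plain frequency estimate and λ = (c - 1)/c gives the uniform distribution. The bandwidth is chosen here by a single leave-one-out maximum likelihood criterion over a grid, standing in for the paper's three estimators, and the object-to-cluster score assumes attributes are conditionally independent given the cluster. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# Hypothetical helpers illustrating categorical kernel density estimation.
# The Aitchison-Aitken kernel is an assumed choice, not necessarily the paper's.


def aa_kernel(x: Hashable, xi: Hashable, lam: float, c: int) -> float:
    """Aitchison-Aitken kernel for one attribute with c >= 2 categories.

    Valid bandwidths lie in [0, (c - 1) / c]; lam = 0 reduces to exact
    matching and lam = (c - 1) / c spreads mass uniformly.
    """
    return 1.0 - lam if x == xi else lam / (c - 1)


def kde_prob(x: Hashable, sample: Sequence[Hashable], lam: float, c: int) -> float:
    """Kernel estimate of P(X = x) from one attribute's observed values."""
    return sum(aa_kernel(x, xi, lam, c) for xi in sample) / len(sample)


def loo_log_likelihood(sample: Sequence[Hashable], lam: float, c: int) -> float:
    """Leave-one-out log-likelihood of the sample (needs len(sample) >= 2)."""
    n = len(sample)
    total = 0.0
    for x, n_x in Counter(sample).items():
        # Each of the n_x points with value x sees, among the other n - 1
        # points, n_x - 1 exact matches and n - n_x mismatches.
        p = ((n_x - 1) * (1.0 - lam) + (n - n_x) * lam / (c - 1)) / (n - 1)
        if p <= 0.0:
            return float("-inf")
        total += n_x * math.log(p)
    return total


def select_bandwidth(sample: Sequence[Hashable], c: int, steps: int = 100) -> float:
    """Grid search for the bandwidth that maximizes the LOO log-likelihood."""
    lam_max = (c - 1) / c
    grid = [lam_max * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: loo_log_likelihood(sample, lam, c))


def object_cluster_log_likelihood(
    obj: Tuple[Hashable, ...],
    cluster: List[Tuple[Hashable, ...]],
    n_categories: Sequence[int],
    lambdas: Sequence[float],
) -> float:
    """Log-likelihood of an object under one cluster's kernel densities.

    Assumes attributes are conditionally independent given the cluster,
    so the per-attribute kernel probabilities multiply (logs add).
    An object would be assigned to the cluster with the highest score.
    """
    score = 0.0
    for d, x in enumerate(obj):
        column = [row[d] for row in cluster]
        p = kde_prob(x, column, lambdas[d], n_categories[d])
        if p <= 0.0:
            return float("-inf")
        score += math.log(p)
    return score
```

For the sample ["a"] * 6 + ["b"] * 2 + ["c"] * 2 with c = 3, the leave-one-out objective is 6 log(5 - 3λ) + 4 log(1 + 3λ) up to a constant, which peaks at λ = 7/15 ≈ 0.467, so `select_bandwidth` settles between the frequency estimate (λ = 0) and the uniform distribution (λ = 2/3). A category observed only once drives this objective to -inf at λ = 0, one reason a smoothed estimate is preferable to raw frequencies.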
Key words: Categorical Data Clustering; Probability Model; Similarity Measure; Kernel Density Estimation (KDE); Bandwidth Estimation
Received: 30 September 2016     
Chinese Library Classification (CLC) number: TP 311
About the authors: ZHU Jie, born in 1971, senior engineer. His research interests include pattern recognition and target identification.
CHEN Lifei (corresponding author), born in 1972, Ph.D., professor. His research interests include statistical machine learning, data mining and pattern recognition.
Cite this article:   
ZHU Jie, CHEN Lifei. Clustering Algorithm with Kernel Density Estimation[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(5): 439-447.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201705006 or http://manu46.magtech.com.cn/Jweb_prai/EN/Y2017/V30/I5/439
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No. 350 Shushanhu Road, Hefei, Anhui Province, P.R. China. Tel: 0551-65591176 Fax: 0551-65591176 Email: bjb@iim.ac.cn