Pattern Recognition and Artificial Intelligence
Pattern Recognition and Artificial Intelligence, 2017, Vol. 30, Issue 5: 439-447    DOI: 10.16451/j.cnki.issn1003-6059.201705006
Research and Applications
Clustering Algorithm with Kernel Density Estimation*
ZHU Jie1, CHEN Lifei2
1. Southwest China Institute of Electronic Technology, Chengdu 610036
2. College of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350117

Abstract: Similarity measures are fundamental to clustering analysis, yet defining an effective similarity measure for discrete symbols (categories) is difficult. This paper measures the similarity between categories by their kernel probability density. Unlike traditional simple-matching or frequency-estimation methods, the proposed measure, through the bandwidth of the kernel function, no longer relies on the assumption that categories of the same attribute are statistically independent. A Bayesian clustering model for categorical data is then established on the kernel density estimates, a likelihood-based object-to-cluster similarity measure is defined, and a model-based clustering algorithm is derived. Using leave-one-out estimation and maximum likelihood estimation, three data-driven methods are proposed to determine the optimal kernel bandwidths dynamically during clustering. Experiments on real-world datasets show that the proposed algorithm achieves higher clustering accuracy than existing algorithms based on simple-matching distance or attribute weighting, and that the estimated bandwidths are of practical use in applications such as identifying important features.
Key words: Categorical Data Clustering; Probability Model; Similarity Measure; Kernel Density Estimation (KDE); Bandwidth Estimation
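The abstract does not state the kernel or the bandwidth estimators used, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the Aitchison-Aitken kernel, a common choice for unordered categories, assumes attributes are treated independently when scoring an object against a cluster, and replaces the paper's three bandwidth estimators with a plain grid search over the leave-one-out log-likelihood. All function names (`kernel_density`, `log_likelihood`, `loo_bandwidth`) are hypothetical.

```python
from collections import Counter
import math


def kernel_density(values, categories, bandwidth):
    """Kernel estimate of P(category) on one attribute.

    Assumed kernel (Aitchison-Aitken): weight 1 - bandwidth for a match,
    bandwidth / (m - 1) for a mismatch, with m >= 2 categories and
    0 <= bandwidth <= (m - 1) / m.
    """
    m, n = len(categories), len(values)
    counts = Counter(values)
    probs = {}
    for c in categories:
        f = counts.get(c, 0) / n  # relative frequency of category c
        probs[c] = (1.0 - bandwidth) * f + bandwidth / (m - 1) * (1.0 - f)
    return probs


def log_likelihood(obj, cluster, domains, bandwidths):
    """Likelihood-based object-to-cluster similarity.

    Sums, over attributes, the log kernel density of the object's category
    within the cluster (attribute independence is an assumption here).
    """
    total = 0.0
    for d, (cats, lam) in enumerate(zip(domains, bandwidths)):
        column = [row[d] for row in cluster]
        total += math.log(kernel_density(column, cats, lam)[obj[d]])
    return total


def loo_bandwidth(values, categories, grid_size=50):
    """Grid search for the bandwidth maximizing leave-one-out log-likelihood.

    Requires at least two observations and two categories.
    """
    m, n = len(categories), len(values)
    counts = Counter(values)
    upper = (m - 1) / m
    best_lam, best_score = None, -math.inf
    for k in range(1, grid_size + 1):  # skip 0 so every density stays positive
        lam = upper * k / grid_size
        score = 0.0
        for cnt in counts.values():
            # Density of a category estimated without one of its own samples.
            p = ((1.0 - lam) * (cnt - 1) + lam / (m - 1) * (n - cnt)) / (n - 1)
            score += cnt * math.log(p)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam
```

Under this kernel, a bandwidth of 0 reproduces plain frequency estimation and the maximum bandwidth (m - 1) / m gives a uniform distribution, so the bandwidth controls how much probability mass is shared among the categories of an attribute. In a clustering loop, each object would be assigned to the cluster with the largest `log_likelihood`.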
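Received: 2016-09-30
Chinese Library Classification (ZTFLH): TP 311
Funding: Supported by the National Natural Science Foundation of China (No. 61672157) and the Natural Science Foundation of Fujian Province (No. 2015J01238)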
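About the authors: ZHU Jie, male, born in 1971, senior engineer. His research interests include pattern recognition and target recognition. E-mail: 13348922176@163.com.
CHEN Lifei (corresponding author), male, born in 1972, Ph.D., professor. His research interests include statistical machine learning, data mining, and pattern recognition. E-mail: clfei@fjnu.edu.cn.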
Cite this article:
ZHU Jie, CHEN Lifei. Clustering Algorithm with Kernel Density Estimation[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(5): 439-447.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201705006      or     http://manu46.magtech.com.cn/Jweb_prai/CN/Y2017/V30/I5/439