Space Structure Based Affinity Propagation Algorithm for Categorical Data
WANG Qi, QIAN Yuhua, LI Feijiang
School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
Abstract Categorical data lack a clear space structure, which makes it difficult to construct a reasonable similarity measure. As a result, numerical clustering algorithms can hardly be extended to categorical data clustering. In this paper, a representation method for transforming categorical data into numerical data is introduced. The similarity between samples is reconstructed, and the structural features of the original categorical data are maintained in the reconstruction process. Based on this representation method, the affinity propagation (AP) clustering algorithm is migrated to categorical data clustering, and a space structure based AP algorithm for categorical data (SBAP) is proposed. Experimental results on several categorical datasets from the UCI repository show that the proposed method enables the AP algorithm to handle the categorical data clustering problem effectively, with a significant improvement in performance.
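The idea of turning categorical samples into numerical vectors before running AP can be illustrated with a minimal sketch. The abstract does not specify the representation, so the choice below is an assumption made purely for illustration: each sample is represented by its simple-matching similarity to every sample in the data set, and the AP input similarity is the negative squared Euclidean distance between these vectors. The function names `matching_similarity`, `to_numeric`, and `ap_similarity` and the toy data are hypothetical and are not taken from the paper.

```python
def matching_similarity(a, b):
    """Fraction of attributes on which two categorical samples agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def to_numeric(data):
    """Assumed representation: one coordinate per sample in the data set,
    holding the simple-matching similarity to that sample."""
    return [[matching_similarity(row, other) for other in data] for row in data]


def ap_similarity(vectors):
    """Negative squared Euclidean distance between numerical vectors,
    a common choice of input similarity for affinity propagation."""
    return [
        [-sum((p - q) ** 2 for p, q in zip(u, v)) for v in vectors]
        for u in vectors
    ]


# Hypothetical toy data: three samples with three categorical attributes.
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
]

vectors = to_numeric(data)   # numerical representation of each sample
S = ap_similarity(vectors)   # pairwise similarity matrix for AP
```

A matrix such as `S` could then be passed to any AP implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation` with `affinity="precomputed"`.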
Fund: Supported by the National Natural Science Foundation of China (No. 61432011, U1435212, 61322211), the Program for New Century Excellent Talents in University of the Ministry of Education of China (No. NCET-12-1031), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20121401110013), and the Program for the Top Young Academic Leaders of Higher Learning Institutions of Shanxi Province (No. 20120301)
About author: (WANG Qi, born in 1979, Ph.D., lecturer. His research interests include data mining and knowledge discovery.) (QIAN Yuhua (Corresponding author), born in 1976, Ph.D., professor. His research interests include artificial intelligence, data mining and machine learning.) (LI Feijiang, born in 1990, Ph.D. candidate. His research interests include data mining and knowledge discovery.)
