Incomplete Data Imputation Clustering Based on Difference of Convex Functions Programming
HE Dan, CHEN Songcan
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016
Abstract To improve clustering performance on incomplete data, an imputation clustering algorithm based on difference of convex functions programming (DCP) is proposed. DCP is applied to optimize the kernel-based fuzzy C-means objective function, and an alternating optimization procedure that interleaves DCP clustering with missing-value imputation is given. The convergence of the alternating optimization is proved theoretically. Experiments show that the proposed algorithm is superior in both missing-value imputation and clustering performance.
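The alternating structure described in the abstract (cluster, then re-impute, then repeat) can be illustrated with a minimal sketch. This is not the proposed DCP algorithm: it swaps in plain fuzzy C-means with the optimal completion strategy for missing values, a simpler and well-known scheme, purely to show how the clustering and imputation steps interleave. The function name `fcm_ocs` and all of its parameters are hypothetical and chosen for illustration.

```python
import numpy as np

def fcm_ocs(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means on data containing NaNs (illustrative only).

    Assumption: standard FCM updates with Euclidean distance, not the
    kernel-based objective or the DCP solver of the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)

    # Initialise missing entries with the mean of the observed column values.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])

    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    V = None

    for _ in range(n_iter):
        # Step 1: update prototypes from the current completed data.
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]

        # Step 2: update memberships from squared distances to prototypes.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        # Step 3: re-impute missing entries as membership-weighted
        # averages of the prototypes (optimal completion strategy).
        W_new = U_new ** m
        X_hat = (W_new @ V) / W_new.sum(axis=1, keepdims=True)
        X[missing] = X_hat[missing]

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return X, U, V

# Usage: two loose groups with one missing value in each.
data = np.array([[0.0, 0.1], [0.2, np.nan], [0.1, -0.1],
                 [10.0, 10.1], [np.nan, 9.9], [9.8, 10.0]])
completed, memberships, prototypes = fcm_ocs(data, n_clusters=2)
```

Observed entries are never modified; only the positions that were NaN in the input are overwritten at each pass, which is what makes the procedure an alternation between clustering and imputation rather than a one-off preprocessing step.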
Received: 10 September 2016
About the authors: HE Dan, born in 1992, master's student. Her research interests include pattern recognition. CHEN Songcan (corresponding author), born in 1962, Ph.D., professor. His research interests include pattern recognition, machine learning and intelligent computing.