Low-Rank Approximation and Decomposition for Kernel Matrices Based on Column Correlation
LIU Song-Hua1, ZHANG Jun-Ying1, DING Cai-Ying2,3
1. School of Computer Science and Engineering, Xidian University, Xi'an 710071
2. Center of Interdisciplinary Studies, Lanzhou University, Lanzhou 730000
3. Laboratory of Condensed Matter Theory and Materials Computation, Institute of Physics, Chinese Academy of Sciences, Beijing 100170
Abstract: An effective method for low-rank approximation and decomposition of the kernel matrix is proposed. First, in contrast to the common assumption that the columns of the kernel matrix are sampled independently of their class labels, the correlation between columns and class labels is studied and a column selection strategy is designed accordingly. Second, the kernel matrix is decomposed in two stages: low-rank matrix decomposition followed by extension. An expected error bound for the low-rank approximation is then derived. The proposed algorithm extracts a discriminative sub-matrix without the independence assumption, thereby avoiding decomposition of the entire kernel matrix and effectively reducing the computational complexity. Finally, experimental results show that the proposed method is effective and reasonable.
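To make the two-stage scheme concrete, the following is a minimal Python sketch written under assumptions the abstract does not fix: the column-label correlation is scored here by mutual information between each kernel column and the class labels (an illustrative stand-in for the paper's unspecified correlation criterion), and the extension stage follows the standard Nystrom formula K ≈ C W⁺ Cᵀ. The helper names select_columns and nystrom_extend are hypothetical.

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.feature_selection import mutual_info_classif

    def select_columns(K, y, m):
        # Score each kernel column by its mutual information with the
        # class labels (illustrative stand-in for the paper's
        # column-correlation criterion) and keep the m best columns.
        scores = mutual_info_classif(K, y, random_state=0)
        return np.argsort(scores)[::-1][:m]

    def nystrom_extend(K, idx):
        # Stage 1: decompose only the small m x m block W built from the
        # selected columns. Stage 2: extend to the full matrix via
        # K_hat = C W^+ C^T, so the n x n kernel matrix itself is never
        # factorized directly.
        C = K[:, idx]                       # n x m selected columns
        W = K[np.ix_(idx, idx)]             # m x m intersection block
        return C @ np.linalg.pinv(W) @ C.T  # rank-m approximation of K

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.standard_normal((200, 10))
        y = (X[:, 0] > 0).astype(int)       # toy binary class labels
        K = rbf_kernel(X, gamma=0.1)
        K_hat = nystrom_extend(K, select_columns(K, y, m=20))
        print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))

The pseudo-inverse is used in place of a plain inverse so the sketch remains stable when the intersection block W is rank-deficient; this is a design choice of the sketch, not a claim about the paper's implementation.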