Abstract: Traditional clustering methods are ineffective on gene expression data because of its high dimensionality, redundancy, small sample size, and noise. Subspace segmentation is an effective approach to clustering high-dimensional data, but applying it directly to gene expression data degrades clustering performance. To cluster gene expression data more effectively, a low-rank projection least squares regression subspace segmentation method (LPLSR) is proposed. An improved low-rank method first projects the gene expression data into a latent subspace, removing possible corruptions in the data and yielding a relatively clean data dictionary. Least squares regression is then used to obtain a low-dimensional representation of the data vectors, from which an affinity matrix is constructed to cluster the gene data. Experimental results on six public gene expression datasets demonstrate the validity of the proposed method.
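The pipeline summarized in the abstract (low-rank cleaning, least squares regression, affinity construction, clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' LPLSR implementation. The abstract does not specify the improved low-rank projection, so a plain truncated SVD is used here as a stand-in. The function names, the regularization weight `lam`, the `rank` parameter, and the small k-means routine are all assumptions made for illustration. Data are assumed to be arranged as a features-by-samples matrix.

```python
import numpy as np


def low_rank_denoise(X, rank):
    """Stand-in for the low-rank projection step (assumption: truncated SVD).

    X is features x samples; returns its best rank-`rank` approximation.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def lsr_representation(D, lam=0.1):
    """Closed-form minimizer of ||D - D Z||_F^2 + lam * ||Z||_F^2.

    Setting the gradient to zero gives (D^T D + lam I) Z = D^T D.
    """
    G = D.T @ D
    n = G.shape[0]
    return np.linalg.solve(G + lam * np.eye(n), G)


def affinity_from(Z):
    """Symmetric, non-negative affinity built from the representation matrix."""
    return (np.abs(Z) + np.abs(Z.T)) / 2.0


def spectral_labels(W, k, n_iter=50, seed=0):
    """Normalized spectral embedding followed by a small k-means (illustrative)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    E = vecs[:, :k]              # eigenvectors of the k smallest eigenvalues
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)

    rng = np.random.default_rng(seed)
    centers = E[rng.choice(len(E), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels
```

Under these assumptions, `spectral_labels(affinity_from(lsr_representation(low_rank_denoise(X, rank=5))), k=3)` would return one cluster label per sample (column) of `X`. The closed-form solve is what makes the least squares regression step inexpensive compared with iterative sparse or low-rank solvers.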