Improved Biclustering Algorithm Based on Weighted Mean Square Residual
LIU Wenhua, LIANG Yongquan, FENG Zheng
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590; Provincial Key Laboratory for Information Technology of Wisdom Mining of Shandong Province, Shandong University of Science and Technology, Qingdao 266590
Abstract: Existing biclustering algorithms can hardly discover biclusters with overlapping structures, so the correct bicluster structures hidden in gene expression data cannot be found effectively. Moreover, when conditions are added or deleted, the influence of each condition's importance on the biclustering result is not taken into account. An improved biclustering algorithm based on weighted mean square residual (IBWMSR) is proposed to overcome these defects. First, the gene set is partitioned into initial biclusters by a fuzzy partition controlled by the overlapping ratio and the membership of the genes. Then, the weights of the conditions in each bicluster are iteratively updated while the objective function is minimized. Finally, the bicluster set is obtained by adding the genes that satisfy the constraints and removing the genes that produce inconsistent fluctuation. Experiments show that the proposed algorithm generates biclusters of different sizes with similar expression levels and restricts the overlapping ratio to a reasonable range.
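The abstract does not give the formula for the weighted mean square residual, so the following is only an illustrative sketch. It assumes the classical mean squared residue of Cheng and Church, with each condition (column) contributing in proportion to a weight. The function name `weighted_msr`, the normalization of the weights to sum to 1, and the use of unweighted row and column means are assumptions made for illustration, not the definition used by IBWMSR.

```python
def weighted_msr(rows, weights=None):
    """Weighted mean square residue of a bicluster (hypothetical sketch).

    rows    : list of equal-length lists, genes x conditions
    weights : one non-negative weight per condition; defaults to uniform
    """
    n_rows, n_cols = len(rows), len(rows[0])
    if weights is None:
        weights = [1.0] * n_cols
    # Assumption: weights are normalized so that they sum to 1.
    total = sum(weights)
    weights = [w / total for w in weights]

    # Unweighted row, column and overall means, as in the classical residue.
    row_means = [sum(r) / n_cols for r in rows]
    col_means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    score = 0.0
    for j in range(n_cols):
        # Mean squared residue of condition j over all genes.
        col_sq = sum(
            (rows[i][j] - row_means[i] - col_means[j] + overall) ** 2
            for i in range(n_rows)
        )
        score += weights[j] * col_sq / n_rows
    return score


# A purely additive pattern (row effect + column effect) has residue 0.
additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
print(weighted_msr(additive))

# Shifting weight toward or away from a noisy condition changes the score.
noisy = [[1, 0, 0], [0, 1, 0]]
print(weighted_msr(noisy), weighted_msr(noisy, [0, 0, 1]))
```

With uniform weights this reduces to the ordinary mean squared residue. Under this assumed form, an iterative scheme such as the one the abstract describes could lower the weight of conditions with large residues, but the actual update rule is not stated here.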