Protein Secondary Structure Prediction Based on Convolutional Long Short-Term Memory Neural Networks
GUO Yanbu¹, LI Weihua¹, WANG Bingyi², JIN Chen¹
1. School of Information Science and Engineering, Yunnan University, Kunming 650500
2. The Research Institute of Resource Insects, Chinese Academy of Forestry, Kunming 650224
Because interactions between different types of amino acids influence the prediction of protein structure, this work integrates convolutional neural networks and long short-term memory neural networks into a convolutional long short-term memory neural network that predicts 8-class protein secondary structure. First, each protein sequence is represented by combining an amino acid sequence class feature with an amino acid structure profile feature. Next, convolutional operations extract the local correlation features between neighboring amino acid residues. A bidirectional long short-term memory network then captures the long-range interactions between residues along the protein sequence. Finally, the model uses both the local correlation features and the long-range interactions to predict the secondary structure of each residue. Experimental results show that the proposed model achieves higher accuracy than the baseline methods and that the framework scales well.
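The pipeline described in the abstract can be illustrated with a minimal, untrained forward-pass sketch in pure Python. This is not the authors' implementation. The sketch runs four stages: residue encoding, convolution for local features, a bidirectional LSTM for long-range context, and a per-residue 8-class softmax over both feature sets. Several details are assumptions not stated in the abstract:

- the sequence class feature is a one-hot vector over the 20 standard amino acids plus `X` for unknown residues;
- the structure profile is a 21-column, PSSM-style row per residue;
- there is a single convolution layer of width 3;
- the hidden sizes are small and the weights are random.

The names `encode` and `ConvBiLSTM` are hypothetical, and training is omitted.

```python
import math
import random

# Assumed alphabets: 20 amino acids + 'X' (unknown), and the 8 DSSP states (L = coil).
AMINO = "ACDEFGHIKLMNPQRSTVWYX"
SS8 = "HGIEBTSL"


def encode(seq, profile):
    """Concatenate a one-hot residue class vector with that residue's profile row."""
    feats = []
    for aa, prof in zip(seq, profile):
        onehot = [0.0] * len(AMINO)
        onehot[AMINO.index(aa) if aa in AMINO else AMINO.index("X")] = 1.0
        feats.append(onehot + list(prof))
    return feats


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def conv1d_relu(x, filters, k):
    """Zero-padded ('same') 1-D convolution over residues, then ReLU (no bias, for brevity)."""
    n, d, half = len(x), len(x[0]), k // 2
    out = []
    for i in range(n):
        window = []
        for j in range(i - half, i - half + k):
            window.extend(x[j] if 0 <= j < n else [0.0] * d)
        out.append([max(0.0, v) for v in matvec(filters, window)])
    return out


def lstm(x, W, hidden):
    """One-direction LSTM; W maps [x_t, h_{t-1}, 1] to the 4 gate pre-activations."""
    h, c, outs = [0.0] * hidden, [0.0] * hidden, []
    for xt in x:
        z = matvec(W, xt + h + [1.0])  # trailing 1.0 folds the bias into W
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outs.append(h)
    return outs


class ConvBiLSTM:
    """Hypothetical conv + BiLSTM tagger with random (untrained) weights."""

    def __init__(self, in_dim=42, n_filters=16, k=3, hidden=8, seed=0):
        rng = random.Random(seed)
        self.k, self.hidden = k, hidden
        self.filters = rand_matrix(rng, n_filters, k * in_dim)
        self.W_fwd = rand_matrix(rng, 4 * hidden, n_filters + hidden + 1)
        self.W_bwd = rand_matrix(rng, 4 * hidden, n_filters + hidden + 1)
        # The classifier sees local (conv) and long-range (BiLSTM) features together.
        self.W_out = rand_matrix(rng, len(SS8), n_filters + 2 * hidden)

    def predict_proba(self, seq, profile):
        local = conv1d_relu(encode(seq, profile), self.filters, self.k)
        fwd = lstm(local, self.W_fwd, self.hidden)
        bwd = lstm(local[::-1], self.W_bwd, self.hidden)[::-1]  # right-to-left pass
        return [softmax(matvec(self.W_out, l + f + b)) for l, f, b in zip(local, fwd, bwd)]

    def predict(self, seq, profile):
        probs = self.predict_proba(seq, profile)
        return "".join(SS8[max(range(len(SS8)), key=p.__getitem__)] for p in probs)


seq = "MKTAYIAKQR"
profile = [[0.0] * 21 for _ in seq]  # placeholder; a real profile might come from PSI-BLAST
print(ConvBiLSTM().predict(seq, profile))  # one (untrained) DSSP label per residue
```

Feeding the concatenation of the convolutional and BiLSTM outputs to the classifier mirrors the abstract's final step, in which both the local correlation features and the long-range interactions are used for prediction. Because of the backward pass, a change at the end of the sequence can alter the prediction at its start.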
GUO Yanbu, LI Weihua, WANG Bingyi, JIN Chen. Protein Secondary Structure Prediction Based on Convolutional Long Short-Term Memory Neural Networks. Pattern Recognition and Artificial Intelligence, 2018, 31(6): 562-568.