Approach and Techniques for Precise Prediction of N-Linked Glycosylation from Human Protein using Artificial Intelligence
© 2022 by IJETT Journal
Year of Publication: 2022
Authors: Mubina Malik, Jaimin N Undavia
DOI: 10.14445/22315381/IJETT-V70I12P213
How to Cite?
Mubina Malik, Jaimin N Undavia, "Approach and Techniques for Precise Prediction of N-Linked Glycosylation from Human Protein using Artificial Intelligence," International Journal of Engineering Trends and Technology, vol. 70, no. 12, pp. 118-126, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I12P213
Glycosylation is the most common post-translational modification of proteins across all domains of life and plays a significant role in biological processes. Among glycosylation types, N-linked glycosylation is the most crucial; it is closely associated with diseases such as cancer, diabetes, HIV infection, Alzheimer's disease, atherosclerosis, and liver cirrhosis. This article presents recent advances in the relevant biological knowledge for a computer science audience. Machine learning and deep learning techniques are central to predicting various protein modifications. Several existing prediction models are reviewed; they report high accuracy but still produce false positives because of limited biological knowledge, a lack of updated datasets, and the techniques used. To target precise prediction, this paper highlights the drawbacks of existing models and discusses the parameters and techniques needed to model a solution. In this study, three experimentally verified and up-to-date databases, UniProtKB, dbPTM, and N-GlycositeAtlas, were combined, and sequences were processed with a window size of 21. This window size is best suited to N-linked glycosylation. After the datasets were combined and redundancy was removed with the CD-HIT algorithm at a threshold of 0.9, 11254 unique proteins and 33859 glycosites were obtained for further study. Nearby positions with similar sequence patterns were identified around the asparagine residue for N-linked glycosylation. A protein sequence is composed of 20 amino acids, which must be converted into numerical form through encoding methods. Various encoding methods for N-linked glycosylation are discussed. Together with biological features, amino acid encoding methods such as the substitution-matrix-based Position-Specific Scoring Matrix (PSSM) and the physicochemical-property encoding VHSE8 are vital for improving the accuracy of N-linked glycosylation prediction.
Artificial intelligence, Deep learning, Human protein, Machine learning, N-linked glycosylation.
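The windowing and encoding steps described in the abstract can be illustrated with a minimal Python sketch. It assumes that a window of size 21 means 10 residues on each side of a central asparagine (N), and that positions beyond the sequence termini are padded with a placeholder symbol. The function names, the padding symbol, and the toy sequence are hypothetical. One-hot encoding is used here only as the simplest stand-in for a numerical encoding; it is not the PSSM or VHSE8 encoding that the abstract recommends.

```python
from typing import List, Tuple

WINDOW = 21          # window size stated in the abstract
HALF = WINDOW // 2   # assumed: 10 residues on each side of the central asparagine
PAD = "-"            # assumed padding symbol for positions beyond the termini
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def asn_windows(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, 21-residue window) for every asparagine."""
    padded = PAD * HALF + sequence + PAD * HALF
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "N":
            # Residue i sits at index i + HALF in the padded string, so the
            # window centred on it is padded[i : i + WINDOW].
            windows.append((i + 1, padded[i:i + WINDOW]))
    return windows


def one_hot(window: str) -> List[List[int]]:
    """Encode a window as a WINDOW x 20 binary matrix; padding becomes all zeros."""
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS] for residue in window]


if __name__ == "__main__":
    toy = "MKNGTLLNASAPQ"  # made-up sequence, not taken from any database
    for position, window in asn_windows(toy):
        print(position, window)
```

Each window becomes a fixed-size 21 x 20 matrix regardless of where the asparagine lies in the protein, which is what allows windows from different proteins to be fed to the same classifier.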