Comparative Analysis of Gene Prediction Tools: RAST, Genmark hmm and AMIgene

International Journal of Engineering Trends and Technology (IJETT)
© 2017 by IJETT Journal
Volume-43 Number-4
Year of Publication : 2017
Authors : Chander Jyoti, Sandeep Saini, Varinder Kumar, Kajal Abrol, Kanchan Pandey, Ankit Sharma
DOI :  10.14445/22315381/IJETT-V43P238


Chander Jyoti, Sandeep Saini, Varinder Kumar, Kajal Abrol, Kanchan Pandey, Ankit Sharma, "Comparative Analysis of Gene Prediction Tools: RAST, Genmark hmm and AMIgene", International Journal of Engineering Trends and Technology (IJETT), V43(4), 234-237, January 2017. ISSN: 2231-5381. Published by Seventh Sense Research Group.

High-throughput genome sequencing has made a large amount of genome data available to the research community. Accurate gene structure prediction and annotation is a fundamental step towards understanding genome function. A large number of gene prediction tools and pipelines have been developed over the past years. To determine whether these tools and pipelines give the same or different results for the same genome, we manually compared the gene predictions of RAST (Rapid Annotations using Subsystems Technology), AMIgene (Annotation of MIcrobial Genes) and GeneMark.hmm for the organism Mycoplasma genitalium against the GenBank CDS (coding sequence) or gene annotations. The comparison revealed both similarities and variations in the predictions of each tool. Variations were seen in the total number of CDS predicted, the gene coordinates and the gene lengths. We have tried to identify the reasons behind these variations and to relate our analysis to present-day high-throughput data analysis. Analyses of this type are useful when annotating a newly sequenced genome.
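The comparison described above was carried out manually. As a rough illustration only, the same kind of comparison (CDS count, coordinates and length against a reference annotation) could be sketched in Python as below. The matching rule used here, pairing a predicted CDS with a reference CDS when they share strand and stop position, is an assumption made for the example and is not taken from the paper. The `CDS` type, the `compare` function and all coordinates are hypothetical.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    start: int   # 1-based leftmost coordinate
    end: int     # 1-based rightmost coordinate, inclusive
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    @property
    def stop_key(self):
        # Assumption: the stop codon lies at the right end on the "+"
        # strand and at the left end on the "-" strand.
        stop = self.end if self.strand == "+" else self.start
        return (self.strand, stop)


def compare(reference, predicted):
    """Classify predicted CDS against a reference annotation."""
    ref_by_stop = {c.stop_key: c for c in reference}
    exact, start_differs, extra = [], [], []
    for p in predicted:
        r = ref_by_stop.get(p.stop_key)
        if r is None:
            extra.append(p)               # no reference CDS ends here
        elif r == p:
            exact.append(p)               # identical coordinates
        else:
            start_differs.append((r, p))  # same stop, different start
    predicted_stops = {p.stop_key for p in predicted}
    missed = [r for r in reference if r.stop_key not in predicted_stops]
    return {
        "exact": exact,
        "start_differs": start_differs,
        "extra": extra,
        "missed": missed,
    }


# Made-up coordinates, for illustration only (not real M. genitalium data).
reference = [CDS(100, 399, "+"), CDS(500, 1099, "-"), CDS(1500, 1799, "+")]
predicted = [CDS(100, 399, "+"), CDS(500, 1129, "-"), CDS(2000, 2299, "+")]

result = compare(reference, predicted)
print("reference CDS:", len(reference), "predicted CDS:", len(predicted))
for category, items in result.items():
    print(category, len(items))
for r, p in result["start_differs"]:
    print("length difference (bp):", p.length - r.length)
```

Keying on the stop position is one common way to separate predictions that differ only in the chosen start codon from predictions of entirely different genes; other matching criteria, such as a minimum overlap fraction, would give different counts.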



Gene Prediction, CDS, Annotation, Mycoplasma genitalium.