ProdoDB: a sequence-matched protein domain database with REST API service

Byung Ryul Jeon

doi:https://doi.org/10.14445/22315381/IJETT-V68I5P203S

Volume 68 | Issue 5 | Year 2020 | Article Id. IJETT-V68I5P203S

Citation

Byung Ryul Jeon, "ProdoDB: a sequence-matched protein domain database with REST API service," International Journal of Engineering Trends and Technology (IJETT), vol. 68, no. 5, pp. 13-15, 2020. Crossref, https://doi.org/10.14445/22315381/IJETT-V68I5P203S

Abstract

With ever-increasing amounts of genomic data being generated, most analysis of next-generation sequencing data relies heavily on bioinformatics tools. Proper analysis of large experimental datasets depends on linking each record to the appropriate resources through unique identifiers (IDs). Although numerous genetic and protein databases provide associated unique IDs, the protein sequences behind those IDs can differ among databases because of the polymorphisms and isoforms of the proteins used in research. Because functional domain information is a key element in interpreting genetic sequence variants, an easily accessible, integrated protein domain database is needed. Here we present ProdoDB, a protein domain database that provides sequence-matched unique ID mapping between Swiss-Prot and National Center for Biotechnology Information (NCBI) protein reference sequences, together with the corresponding sequence, gene, and domain information, as a REST API service following OpenAPI standards.
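
The abstract does not spell out how sequence matching is performed. As a minimal sketch, assuming that "sequence-matched" means the two records carry an identical amino-acid sequence, the ID mapping could be illustrated as follows. The function names, the identifiers (`SP_A`, `RS_1`, and so on), and the toy sequences are hypothetical placeholders and are not taken from ProdoDB, Swiss-Prot, or RefSeq.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence: str) -> str:
    """Return a digest of a sequence after removing whitespace and upper-casing."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def match_ids_by_sequence(
    swissprot: dict[str, str], refseq: dict[str, str]
) -> dict[str, list[str]]:
    """Map each Swiss-Prot-style ID to the RefSeq-style IDs with an identical sequence.

    Assumption: matching is exact sequence identity; isoforms or variants that
    differ by even one residue are treated as unmatched.
    """
    refseq_by_key = defaultdict(list)
    for refseq_id, seq in refseq.items():
        refseq_by_key[sequence_key(seq)].append(refseq_id)
    return {
        accession: sorted(refseq_by_key.get(sequence_key(seq), []))
        for accession, seq in swissprot.items()
    }


# Hypothetical toy records, not real database entries.
swissprot = {"SP_A": "MKTAYIAKQR", "SP_B": "MKTAYIAKQL"}
refseq = {
    "RS_1": "mktayiakqr",    # same sequence as SP_A, different letter case
    "RS_2": "MKTAYIAKQRQ",   # one extra residue, so no match
    "RS_3": "MKTAY IAKQR",   # same sequence as SP_A, with stray whitespace
}

mapping = match_ids_by_sequence(swissprot, refseq)
print(mapping)
```

Hashing the normalized sequence keeps the lookup table small when sequences are long; comparing the raw strings directly would give the same mapping.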

Keywords

ProdoDB, Protein Sequence Mapping, Protein Database, Protein Domain and Site.
