Deep Neural Network: An Efficient and Optimized Machine Learning Paradigm for Reducing Genome Sequencing Error

International Journal of Engineering Trends and Technology (IJETT)
  
© 2020 by IJETT Journal
Volume-68 Issue-9
Year of Publication : 2020
Authors : Dr. Ferdinand Kartriku, Robert Sowah, Charles Saah
DOI :  10.14445/22315381/IJETT-V68I9P205

Citation 

MLA Style: Dr. Ferdinand Kartriku, Robert Sowah, Charles Saah. "Deep Neural Network: An Efficient and Optimized Machine Learning Paradigm for Reducing Genome Sequencing Error." International Journal of Engineering Trends and Technology 68.9 (2020): 27-30.

APA Style: Dr. Ferdinand Kartriku, Robert Sowah, Charles Saah (2020). Deep Neural Network: An Efficient and Optimized Machine Learning Paradigm for Reducing Genome Sequencing Error. International Journal of Engineering Trends and Technology, 68(9), 27-30.

Abstract
Genomic data is used in many fields, but it has become known that most of the platforms used in genome sequencing produce significant errors. As a result, the analyses and inferences drawn from these data may contain errors that need to be corrected. Of the two main types of genome sequencing errors (substitutions and indels), our work focused on correcting errors arising from indels. A deep learning approach was used to correct the sequencing errors in the chosen dataset.

Keywords
genome sequencing; error correction; deep learning; indels