Parallel and Scalable Deep Learning Algorithms for High Performance Computing Architectures

International Journal of Engineering Trends and Technology (IJETT)
© 2021 by IJETT Journal
Volume-69, Issue-4
Year of Publication: 2021
Authors: Sunil Pandey, Naresh Kumar Nagwani, Shrish Verma
DOI: 10.14445/22315381/IJETT-V69I4P232

Citation 

MLA Style: Sunil Pandey, Naresh Kumar Nagwani, and Shrish Verma. "Parallel and Scalable Deep Learning Algorithms for High Performance Computing Architectures." International Journal of Engineering Trends and Technology 69.4 (2021): 236-246.

APA Style: Pandey, S., Nagwani, N. K., & Verma, S. (2021). Parallel and scalable deep learning algorithms for high performance computing architectures. International Journal of Engineering Trends and Technology, 69(4), 236-246.

Abstract
This paper elucidates the state-of-the-art design of parallel and scalable deep learning algorithms for high-performance computing (HPC) architectures. It begins with an application-focused introduction to deep learning. The HPC architectures discussed next include multicore processors and compute clusters, which are representative of the shared-memory and distributed-memory parallel programming paradigms, respectively. This is followed by a discussion of the computational challenges inherent in deep learning. A review of research at the intersection of deep learning and HPC is then carried out and summarized in tabular form, and open research directions in the field are highlighted. Key steps in the deep learning algorithm development process for HPC are discussed next, along with their possible outcomes. One section each is dedicated to convolutional neural networks and the high-performance computing environment. The materials and methods used in a computational experiment in parallel deep learning are then described. The experiment involves the design and development of a parallel algorithm and program for a compute-intensive deep learning primitive, together with its performance testing. The results and performance of the parallel deep learning program are discussed, and the paper closes with concluding remarks.
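To make the idea of parallelizing a compute-intensive deep learning primitive concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm or primitive): a small 2D convolution whose independent output rows are distributed across worker processes, illustrating the shared-memory data-parallel decomposition discussed above. The kernel, image size, and worker count are illustrative assumptions.

```python
# Illustrative sketch: data-parallel evaluation of a convolution-style
# primitive by splitting independent output rows across worker processes
# (shared-memory paradigm). The 3x3 kernel and 8x8 image are toy stand-ins,
# not the primitive actually benchmarked in the paper.
from multiprocessing import Pool

KERNEL = [[0,  1, 0],
          [1, -4, 1],
          [0,  1, 0]]  # simple Laplacian stencil

# Toy input: an 8x8 image whose pixel value rises linearly across the grid.
IMAGE = [[float(r * 8 + c) for c in range(8)] for r in range(8)]

def conv_row(r):
    """Compute one 'valid' convolution output row. Output rows are
    mutually independent, so they can be assigned to different cores."""
    out = []
    for c in range(len(IMAGE[0]) - 2):
        acc = 0.0
        for i in range(3):
            for j in range(3):
                acc += KERNEL[i][j] * IMAGE[r + i][c + j]
        out.append(acc)
    return out

def parallel_conv(workers=4):
    """Map output rows onto a pool of worker processes."""
    rows = range(len(IMAGE) - 2)
    with Pool(workers) as pool:
        return pool.map(conv_row, rows)

if __name__ == "__main__":
    result = parallel_conv()
    print(len(result), len(result[0]))  # 6 6: valid 3x3 conv of an 8x8 image
```

Row-wise decomposition is one of several possible partitionings; a cluster (distributed-memory) variant would instead exchange halo rows between nodes via message passing rather than sharing the image in one address space.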


Keywords
Parallel, Scalable, High Performance Computing, Multicore, Compute Cluster, Shared Parallel, Distributed Parallel, Deep Learning Algorithms.