Analysis and Design of High Performance Deep Learning Algorithm: Convolutional Neural Networks

© 2021 by IJETT Journal
Volume-69 Issue-6
Year of Publication : 2021
Authors : Sunil Pandey, Naresh Kumar Nagwani, Shrish Verma
DOI : 10.14445/22315381/IJETT-V69I6P231

How to Cite?

Sunil Pandey, Naresh Kumar Nagwani, Shrish Verma, "Analysis and Design of High Performance Deep Learning Algorithm: Convolutional Neural Networks," International Journal of Engineering Trends and Technology, vol. 69, no. 6, pp. 216-224, 2021.

Deep learning algorithms such as convolutional neural networks (CNNs) have a multi-layered computational design. A CNN comprises stacks of different layers that perform feature engineering and training or classification computations on their inputs, which are generally 3-D tensors. Training a CNN is very demanding in terms of computational resources and time; training times of several weeks or even months are not unheard of. This is one of the important factors limiting the widespread adoption of CNNs in new applications, and performance enhancement of CNNs is therefore an active R&D area. In view of this, the design of CNN algorithms for high performance distributed and parallel computing architectures assumes significance. Because its layers process data in sequence, a CNN can be conceptualized as a pipeline system, which makes it amenable to pipeline parallelism. In the present work, a pipeline computation design and model of the CNN is proposed, and its performance is analyzed using representative data generated through different computational experiments. The analysis shows that a net performance gain of 18X can be achieved on a CNN feature engineering pipeline by combining pipeline parallelism with task parallelism.
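The abstract does not spell out its performance model, so as a point of reference the following minimal Python sketch uses the standard idealized timing model of a linear pipeline: the first input needs the full latency of all stages, after which one result emerges per bottleneck-stage interval. Task parallelism is modeled here, as an assumption, by replicating a stage over r workers, which divides its effective interval by r. The function names (`sequential_time`, `pipeline_time`, `speedup`) and the stage times are hypothetical; they are not the authors' model or measurements, and communication and scheduling overheads are ignored.

```python
def sequential_time(stage_times, n_items):
    """Time to push n_items through all stages one item at a time."""
    return n_items * sum(stage_times)


def pipeline_time(stage_times, n_items, replicas=None):
    """Idealized pipeline makespan (overheads ignored).

    The first item needs the full latency sum(stage_times); after that, one
    item leaves the pipeline every `bottleneck` time units, where the
    bottleneck is the slowest stage once each stage time is divided by its
    number of replicated workers (assumed model of task parallelism).
    """
    if replicas is None:
        replicas = [1] * len(stage_times)
    if len(replicas) != len(stage_times) or min(replicas) < 1:
        raise ValueError("need one positive replica count per stage")
    if n_items < 1:
        return 0
    latency = sum(stage_times)
    bottleneck = max(t / r for t, r in zip(stage_times, replicas))
    return latency + (n_items - 1) * bottleneck


def speedup(stage_times, n_items, replicas=None):
    """Sequential time divided by idealized pipeline time."""
    return sequential_time(stage_times, n_items) / pipeline_time(
        stage_times, n_items, replicas)


if __name__ == "__main__":
    stages = [4.0, 1.0, 2.0]  # hypothetical conv, ReLU, pooling stage times
    print(speedup(stages, 100))             # pipeline parallelism only
    print(speedup(stages, 100, [4, 1, 2]))  # plus replicated stages
```

With these hypothetical stage times, pipelining alone is limited by the slowest stage (700/403, about 1.74, for 100 inputs). Replicating the stages 4, 1 and 2 times balances them, so the speedup approaches sum(t_i)/max(t_i/r_i) = 7 as the number of inputs grows (700/106, about 6.6, for 100 inputs). This illustrates why combining the two forms of parallelism can outperform either one alone.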

Deep Learning, Convolutional Neural Networks, Pipeline Computing, Pipeline Parallelism, Task Parallelism, High Performance Computing.
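To make the pipeline and task parallelism named above concrete, here is a minimal structural sketch in Python; it is not the authors' implementation. Each feature engineering stage (a toy 1-D convolution, ReLU and max pooling stand in for real 3-D tensor layers) runs in its own worker threads connected by `queue.Queue` objects, so different inputs occupy different stages at the same time (pipeline parallelism). A stage can also be given several workers that process different inputs concurrently, which is one common reading of task parallelism in this setting and is assumed here. The helpers `conv1d`, `relu`, `maxpool` and `run_pipeline` are hypothetical names. Because of CPython's global interpreter lock, pure-Python threads will not speed up this CPU-bound toy code; a real deployment would map stages onto processes, GPUs or cluster nodes.

```python
import queue
import threading


# Toy stand-ins for CNN feature engineering layers (1-D for brevity).
def conv1d(x, kernel=(1.0, 0.0, -1.0)):
    """'Valid' sliding-window correlation, as computed by CNN conv layers."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def relu(x):
    return [v if v > 0 else 0.0 for v in x]


def maxpool(x, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def _stage_worker(fn, inbox, outbox, stop):
    # Pull (index, data) pairs, apply this stage, pass results downstream.
    while True:
        item = inbox.get()
        if item is stop:
            return
        idx, data = item
        outbox.put((idx, fn(data)))


def run_pipeline(inputs, stages):
    """Run inputs through stages given as (function, n_workers >= 1) pairs."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    stop = object()  # end-of-stream marker, one per worker
    groups = []
    for i, (fn, n_workers) in enumerate(stages):
        group = [threading.Thread(target=_stage_worker,
                                  args=(fn, queues[i], queues[i + 1], stop))
                 for _ in range(n_workers)]
        for t in group:
            t.start()
        groups.append(group)

    # Feed the first stage, then one stop marker per first-stage worker.
    for idx, x in enumerate(inputs):
        queues[0].put((idx, x))
    for _ in range(stages[0][1]):
        queues[0].put(stop)

    # Once a stage's workers have all exited, its output queue holds every
    # result, so stop markers queued behind them are seen only after the
    # next stage has drained all real data (the queues are FIFO).
    for i, group in enumerate(groups):
        for t in group:
            t.join()
        if i + 1 < len(stages):
            for _ in range(stages[i + 1][1]):
                queues[i + 1].put(stop)

    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    results.sort(key=lambda pair: pair[0])  # replicas may finish out of order
    return [data for _, data in results]


if __name__ == "__main__":
    images = [[float((i * 7 + j * 3) % 11) for j in range(12)] for i in range(5)]
    # Give the (assumed) most expensive stage, convolution, three workers.
    print(run_pipeline(images, [(conv1d, 3), (relu, 1), (maxpool, 2)]))
```

Items are tagged with their input index because replicated workers may finish out of order; `run_pipeline` sorts the results so that its output matches applying the stages one after another to each input.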
