An Attention Mechanism and GRU Based Deep Learning Model for Automatic Image Captioning

© 2022 by IJETT Journal
Volume-70 Issue-3
Year of Publication : 2022
Authors : Gaurav, Pratistha Mathur
https://doi.org/10.14445/22315381/IJETT-V70I3P234

How to Cite?

Gaurav, Pratistha Mathur, "An Attention Mechanism and GRU Based Deep Learning Model for Automatic Image Captioning," International Journal of Engineering Trends and Technology, vol. 70, no. 3, pp. 302-309, 2022. Crossref, https://doi.org/10.14445/22315381/IJETT-V70I3P234

Abstract
Image captioning is the task of automatically generating textual descriptions of images. In recent years, it has become an active research area and a growing challenge at the intersection of computer vision and natural language processing. Early template-based and retrieval-based methods had limitations such as missing important objects and other attributes. Later, encoder-decoder methods were introduced as the dominant research methodology for image captioning: a Convolutional Neural Network is used as the encoder to extract image features, and a Recurrent Neural Network is used as the decoder that consumes those features and generates a caption for the image. Long Short-Term Memory is the recurrent neural network most commonly used as the decoder. In this paper, a new framework is proposed in which a Gated Recurrent Unit is used as the decoder instead. In addition, the proposed model applies visual attention to obtain better image features. The framework has been implemented on the Flickr8k dataset, and its Bilingual Evaluation Understudy (BLEU) scores have been compared with other state-of-the-art frameworks; the comparison clearly shows that the framework is highly effective and produces state-of-the-art image captions.
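The decoding loop described above — a visual-attention module that re-weights per-region image features at each step, feeding a GRU that updates its hidden state — can be sketched in plain Python. This is a toy illustration of the general technique, not the authors' implementation: all weights are random, the dimensions are arbitrary, and the vocabulary/output projection is omitted.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

H, F = 4, 3  # hidden size and per-region feature size (toy dimensions)

def attend(features, h, W_a, U_a, v_a):
    """Additive (Bahdanau-style) attention over per-region image features."""
    scores = []
    for a in features:
        proj = [math.tanh(p + q) for p, q in zip(matvec(W_a, a), matvec(U_a, h))]
        scores.append(sum(vi * pi for vi, pi in zip(v_a, proj)))
    alphas = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    ctx = [sum(al * a[j] for al, a in zip(alphas, features)) for j in range(F)]
    return ctx, alphas

def gru_step(x, h, Wz, Wr, Wh):
    """One GRU update on input x (here, the attention context) and state h."""
    xh = x + h
    z = [sigmoid(u) for u in matvec(Wz, xh)]  # update gate
    r = [sigmoid(u) for u in matvec(Wr, xh)]  # reset gate
    h_tilde = [math.tanh(u)
               for u in matvec(Wh, x + [ri * hi for ri, hi in zip(r, h)])]
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

# Toy run: 5 image regions, 3 decoding steps, randomly initialised weights.
features = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(5)]
W_a, U_a = rand_mat(H, F), rand_mat(H, H)
v_a = [random.uniform(-0.1, 0.1) for _ in range(H)]
Wz, Wr, Wh = (rand_mat(H, F + H) for _ in range(3))

h = [0.0] * H
for _ in range(3):
    ctx, alphas = attend(features, h, W_a, U_a, v_a)
    h = gru_step(ctx, h, Wz, Wr, Wh)

print(len(h), round(sum(alphas), 6))  # prints: 4 1.0
```

In a full model, each new hidden state would additionally be projected onto the vocabulary to predict the next caption word, and the predicted word's embedding would be concatenated with the context vector as the next GRU input.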


Reference
[1] M. D. Zakir Hossain, F. Sohel, M. F. Shiratuddin, and H. Laga, A Comprehensive Survey of Deep Learning for Image Captioning, ACM Comput. Surv. 51(6) (2019). doi: 10.1145/3295748.
[2] N. Yu, X. Hu, B. Song, J. Yang, and J. Zhang, Topic-Oriented Image Captioning Based on Order-Embedding, IEEE Trans. Image Process. 28(6) (2019) 2743–2754. doi: 10.1109/TIP.2018.2889922.
[3] T. Yao, Y. Pan, Y. Li, Z. Qiu, and T. Mei, Boosting Image Captioning with Attributes, IEEE Int. Conf. Comput. Vis. (ICCV). (2017).
[4] K. Loganathan, R. Sarath Kumar, V. Nagaraj, and T. J. John, CNN & LSTM Using Python for Automatic Image Captioning, Mater. Today Proc. (2020). doi: 10.1016/j.matpr.2020.10.624.
[5] W. Cai and Q. Liu, Image Captioning with Semantic-Enhanced Features and Extremely Hard Negative Examples, Neurocomputing. 413 (2020) 31–40. doi: 10.1016/j.neucom.2020.06.112.
[6] X. Lu, B. Wang, and X. Zheng, Sound Active Attention Framework for Remote Sensing Image Captioning, IEEE Trans. Geosci. Remote Sens. 58(3) (2020) 1985–2000. doi: 10.1109/TGRS.2019.2951636.
[7] Y. Jing, X. Zhiwei, and G. Guanglai, Context-Driven Image Caption with Global Semantic Relations of the Named Entities, IEEE Access. 8 (2020) 143584–143594. doi: 10.1109/ACCESS.2020.3013321.
[8] C. Wang, H. Yang, and C. Meinel, Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning, ACM Trans. Multimed. Comput. Commun. Appl. 14(2s) (2018). doi: 10.1145/3115432.
[9] X. Xiao, L. Wang, K. Ding, S. Xiang, and C. Pan, Deep Hierarchical Encoder-Decoder Network for Image Captioning, IEEE Trans. Multimed. 21(11) (2019) 2942–2956. doi: 10.1109/TMM.2019.2915033.
[10] Z. Deng, Z. Jiang, R. Lan, W. Huang, and X. Luo, Image Captioning Using Densenet Network and Adaptive Attention, Signal Process. Image Commun. 85 (2020) 115836. doi: 10.1016/j.image.2020.115836.
[11] S. H. Han and H. J. Choi, Domain-Specific Image Caption Generator with Semantic Ontology, Proc. - 2020 IEEE Int. Conf. Big Data Smart Comput. Bigcomp. (2020) 526–530. doi: 10.1109/BigComp48618.2020.00-12.
[12] H. Wei, Z. Li, C. Zhang, and H. Ma, The Synergy of Double Attention: Combine Sentence-Level and Word-Level Attention for Image Captioning, Comput. Vis. Image Underst. 201 (2020) 103068. doi: 10.1016/j.cviu.2020.103068.
[13] Z. Yang and Q. Liu, ATT-BM-SOM: A Framework of Effectively Choosing Image Information and Optimizing Syntax for Image Captioning, IEEE Access. 8 (2020) 50565–50573. doi: 10.1109/ACCESS.2020.2980578.
[14] Y. Huang, J. Chen, W. Ouyang, W. Wan, and Y. Xue, Image Captioning with End-to-End Attribute Detection and Subsequent Attributes Prediction, IEEE Trans. Image Process. 29 (2020) 4013–4026. doi: 10.1109/TIP.2020.2969330.
[15] H. Wang, H. Wang, and K. Xu, Evolutionary Recurrent Neural Network for Image Captioning, Neurocomputing. 401 (2020) 249–256. doi: 10.1016/j.neucom.2020.03.087.
[16] X. Lu, B. Wang, X. Zheng, and X. Li, Exploring Models and Data for Remote Sensing Image Caption Generation, IEEE Trans. Geosci. Remote Sens. 56(4) (2017) 1–13.
[17] C. Wu, S. Yuan, H. Cao, Y. Wei, and L. Wang, Hierarchical Attention-Based Fusion for Image Caption with Multi-Grained Rewards, IEEE Access. 8 (2020) 57943–57951. doi: 10.1109/ACCESS.2020.2981513.
[18] L. Gao, X. Li, J. Song, and H. T. Shen, Hierarchical LSTMs with Adaptive Attention for Visual Captioning, IEEE Trans. Pattern Anal. Mach. Intell. 42(5) (2020) 1112–1131. doi: 10.1109/TPAMI.2019.2894139.
[19] J. Liu et al., Interactive Dual Generative Adversarial Networks for Image Captioning, AAAI 2020 - 34th AAAI Conf. Artif. Intell. (2020) 11588–11595. doi: 10.1609/aaai.v34i07.6826.
[20] X. Shen, B. Liu, Y. Zhou, J. Zhao, and M. Liu, Remote Sensing Image Captioning Via Variational Autoencoder and Reinforcement Learning, Knowledge-Based Syst. 203 (2020) 105920. doi: 10.1016/j.knosys.2020.105920.
[21] J. Jansi Rani and B. Kirubagari, An Intelligent Image Captioning Generator using Multi-Head Attention Transformer, Int. J. Eng. Trends Technol. 69(12) (2021) 267–279. doi: 10.14445/22315381/IJETT-V69I12P232.
[22] V. Teju and D. Bhavana, An Efficient Object Tracking in Thermal Imaging Using Optimal Kalman Filter, Int. J. Eng. Trends Technol. 69(12) (2021) 197–202. doi: 10.14445/22315381/IJETT-V69I12P223.
[23] B. Wang, X. Zheng, B. Qu, and X. Lu, Retrieval Topic Recurrent Memory Network for Remote Sensing Image Captioning, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 13 (2020) 256–270.
[24] B. C. Mateus, M. Mendes, J. T. Farinha, R. Assis, and A. M. Cardoso, Comparing LSTM and GRU Models to Predict the Condition of a Pulp Paper Press, Energies. 14(21) (2021) 1–21. doi: 10.3390/En14216958.
[25] S. Kalra and A. Leekha, Survey of Convolutional Neural Networks for Image Captioning, J. Inf. Optim. Sci. 41(1) (2020) 239–260. doi: 10.1080/02522667.2020.1715602.
[26] M. Liu, L. Li, H. Hu, W. Guan, and J. Tian, Image Caption Generation with Dual Attention Mechanism, Inf. Process. Manag. 57(2) (2020) 102178. doi: 10.1016/j.ipm.2019.102178.

Keywords
Encoders, Decoders, GRU, Image Captioning.