An Attention Mechanism and GRU Based Deep Learning Model for Automatic Image Captioning

© 2022 by IJETT Journal
Volume-70 Issue-3
Year of Publication : 2022
Authors : Gaurav, Pratistha Mathur

How to Cite?

Gaurav, Pratistha Mathur, "An Attention Mechanism and GRU Based Deep Learning Model for Automatic Image Captioning," International Journal of Engineering Trends and Technology, vol. 70, no. 3, pp. 302-309, 2022.

Image captioning is the task of automatically generating a textual description of an image. In recent years, it has become an active research area and a growing challenge at the intersection of computer vision and natural language processing. Template-based and retrieval-based approaches to image captioning had limitations, such as missing important objects and other attributes. Encoder-decoder methods were later introduced for image captioning. In the encoder-decoder technique, Convolutional Neural Networks are utilised as encoders to extract image information, and Recurrent Neural Networks are used as decoders that take the extracted features and generate a caption for the image. Long short-term memory (LSTM) is the recurrent neural network most commonly used as a decoder. In this paper, a new framework is proposed in which a Gated Recurrent Unit (GRU) is used as the decoder. In addition, the proposed model uses visual attention to obtain better image features. The proposed framework has been implemented using the Flickr8K dataset. Its Bilingual Evaluation Understudy (BLEU) score has been compared with those of other state-of-the-art frameworks, and the comparison shows that the framework is highly effective and produces state-of-the-art image captions.
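To make the decoder side of the abstract concrete, the following is a minimal sketch of a single decoding step that combines soft visual attention with a GRU cell. It is an illustration only and not the authors' implementation: the additive form of the attention scoring, the parameter names (`W_f`, `W_h`, `v`, `W_z`, and so on), the dimensions, the random weights, and the omission of bias terms are all assumptions. The image features are taken to be a matrix with one row per image region, as a CNN encoder would typically provide.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(feat_dim, embed_dim, hidden_dim, attn_dim, seed=0):
    """Random illustrative weights (hypothetical; a real model would learn them)."""
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(0.0, 0.1, size=shape)
    in_dim = embed_dim + feat_dim  # GRU input = word embedding + attended context
    return {
        # attention parameters
        "W_f": w(attn_dim, feat_dim), "W_h": w(attn_dim, hidden_dim), "v": w(attn_dim),
        # GRU parameters: update gate (z), reset gate (r), candidate state (n)
        "W_z": w(hidden_dim, in_dim), "U_z": w(hidden_dim, hidden_dim),
        "W_r": w(hidden_dim, in_dim), "U_r": w(hidden_dim, hidden_dim),
        "W_n": w(hidden_dim, in_dim), "U_n": w(hidden_dim, hidden_dim),
    }

def attend(features, h, p):
    """Soft attention: weight each image region by its relevance to the state h."""
    # features: (num_regions, feat_dim), h: (hidden_dim,)
    scores = np.tanh(features @ p["W_f"].T + p["W_h"] @ h) @ p["v"]
    alpha = softmax(scores)     # attention weights over regions, sum to 1
    context = alpha @ features  # weighted average of region features
    return context, alpha

def gru_step(x, h, p):
    """One GRU update (biases omitted for brevity)."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h)        # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h)        # reset gate
    n = np.tanh(p["W_n"] @ x + p["U_n"] @ (r * h))  # candidate state
    return (1.0 - z) * h + z * n

def decode_step(features, word_embedding, h, p):
    """Attend over the image, then feed [word embedding; context] to the GRU."""
    context, alpha = attend(features, h, p)
    x = np.concatenate([word_embedding, context])
    return gru_step(x, h, p), alpha

# Toy usage: 9 image regions with 8-dimensional features.
p = init_params(feat_dim=8, embed_dim=4, hidden_dim=6, attn_dim=5)
features = np.random.default_rng(1).normal(size=(9, 8))
h, alpha = decode_step(features, np.zeros(4), np.zeros(6), p)
```

In a full captioning model, the new hidden state `h` would be projected onto the vocabulary to predict the next word, and the step would be repeated until an end-of-sentence token is produced. Note that some libraries swap the roles of `z` and `1 - z` in the final interpolation; the two conventions are equivalent up to relabelling of the gate.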

Encoders, Decoders, GRU, Image Captioning.
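
Since the abstract evaluates captions with the BLEU score, a short sketch of how such a score can be computed may help. The version below is a sentence-level BLEU with uniform n-gram weights and no smoothing; the function names `ngrams` and `bleu` are hypothetical, and this is not the evaluation script used in the paper, which may differ in details such as corpus-level aggregation or smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # no overlap at this n-gram order
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty, using the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    # Geometric mean of the modified n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)

# Toy usage with one reference caption.
reference = "a dog runs on the grass".split()
score_same = bleu(reference, [reference])
score_clipped = bleu("the the the".split(), ["the cat".split()], max_n=1)
score_none = bleu("a b c d".split(), ["w x y z".split()])
```

The clipping step is what stops a candidate from being rewarded for repeating a common word, and the brevity penalty discourages captions that are much shorter than the reference.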