Automatic Image Captioning Using Neural Networks

Authors

  • Subash Pandey, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
  • Rabin Kumar Dhamala, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
  • Bikram Karki, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
  • Saroj Dahal, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal
  • Rama Bastola, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

DOI:

https://doi.org/10.3126/jiee.v3i1.34335

Keywords:

CNN, Image Captioning, Image Description, LSTM, RNN

Abstract

Automatically generating a natural language description of an image is a major challenge in the field of artificial intelligence. The task brings together two fields: Natural Language Processing and Computer Vision. There are two types of approaches: top-down and bottom-up. In this paper, we take the top-down approach, which starts from the image and converts it into words. The image is passed to a Convolutional Neural Network (CNN) encoder, and its output is fed to a Recurrent Neural Network (RNN) decoder that generates meaningful captions. We generated image descriptions for real-time images captured with a smartphone camera and also tested the model on test images from the dataset. To evaluate model performance, we used the BLEU (Bilingual Evaluation Understudy) score and matched predicted words against the original caption.
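
As an illustration of the evaluation step, the following is a minimal sketch of a sentence-level BLEU score against a single reference caption. The abstract does not state which BLEU variant, n-gram order, or tokenization was used, so the function name bleu, the max_n parameter, the lowercase whitespace tokenization, and the lack of smoothing are all assumptions made for this example rather than the authors' implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with one reference and no smoothing.

    Assumption: lowercase whitespace tokenization and uniform n-gram weights.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    predicted = "a dog runs on the grass"
    original = "a dog runs on the green grass"
    print(round(bleu(predicted, original, max_n=2), 3))
```

A predicted caption identical to the original scores 1.0, and a caption sharing no words with it scores 0.0. Library implementations typically add smoothing and support for multiple reference captions, which this sketch omits.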

Author Biographies

Subash Pandey, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

Department of Electronics and Computer Engineering

Rabin Kumar Dhamala, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

Department of Electronics and Computer Engineering

Saroj Dahal, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

Department of Electronics and Computer Engineering

Rama Bastola, Thapathali Campus, Institute of Engineering, Tribhuvan University, Kathmandu, Nepal

Department of Electronics and Computer Engineering

Published

2020-03-31

How to Cite

Pandey, S., Dhamala, R. K., Karki, B., Dahal, S., & Bastola, R. (2020). Automatic Image Captioning Using Neural Networks. Journal of Innovations in Engineering Education, 3(1), 138–146. https://doi.org/10.3126/jiee.v3i1.34335

Section

Articles