Lightweight Attention-Guided CNN–LSTM for Image Captioning
DOI: https://doi.org/10.3126/jkbc.v7i1.88398
Index Terms—Image Captioning, Visual Attention, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Deep Learning

Abstract
Automatically generating meaningful captions for images is a fundamental problem at the intersection of computer vision and natural language processing. Existing models often struggle with complex scenes, object relationships, and computational efficiency. In this paper, we introduce a lightweight image captioning method that combines a VGG-16 convolutional network for robust spatial feature extraction with a soft attention mechanism and an LSTM decoder, allowing the model to selectively attend to salient regions of an image while generating each word of its caption. The model is trained and evaluated on the Flickr8k dataset, which contains 8,000 images with five reference captions per image. Experimental results show competitive performance, with BLEU-1 through BLEU-4 scores ranging from 0.53 down to 0.10, indicating that the model can identify objects and generate coherent, context-aware image descriptions. The proposed method offers an efficient and explainable solution that bridges visual content and natural language, contributing to more accessible and intelligent multimedia technology.
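To illustrate the architecture described in the abstract, the following is a minimal PyTorch sketch of an attention-guided CNN–LSTM captioner: VGG-16 convolutional features, a soft (additive) attention module, and an LSTM decoder that attends to the feature map at every decoding step. This is not the authors' released implementation; the module names, embedding and hidden sizes, and vocabulary size are assumptions chosen for illustration.

import torch
import torch.nn as nn
from torchvision.models import vgg16

class SoftAttention(nn.Module):
    """Additive (Bahdanau-style) soft attention over spatial feature vectors."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, L, feat_dim) spatial locations; hidden: (B, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(feats) +
                                  self.hidden_proj(hidden).unsqueeze(1)))  # (B, L, 1)
        alpha = torch.softmax(e, dim=1)             # attention weights over locations
        context = (alpha * feats).sum(dim=1)        # (B, feat_dim) weighted context
        return context, alpha.squeeze(-1)

class CaptionDecoder(nn.Module):
    """LSTM decoder that attends to VGG-16 feature maps at each time step."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 feat_dim=512, attn_dim=256):
        super().__init__()
        self.cnn = vgg16(weights=None).features      # 512 x 7 x 7 feature maps for 224x224 input
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attend = SoftAttention(feat_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, 224, 224); captions: (B, T) token ids (teacher forcing)
        feats = self.cnn(images)                      # (B, 512, 7, 7)
        feats = feats.flatten(2).transpose(1, 2)      # (B, 49, 512) spatial locations
        B, T = captions.shape
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(T):
            context, _ = self.attend(feats, h)        # focus on salient image regions
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)             # (B, T, vocab_size)

# Toy forward pass with an assumed vocabulary of 5,000 words.
model = CaptionDecoder(vocab_size=5000)
scores = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 12)))
print(scores.shape)  # torch.Size([2, 12, 5000])

In practice the VGG-16 backbone would be initialized with pretrained ImageNet weights and typically frozen, and the per-step attention weights can be visualized over the 7x7 grid, which is what makes this style of model explainable.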