prakhar175/Image-captioning

🖼️ Image Captioning with ResNet50 + LSTM (Flickr8k)

This project implements an image captioning model with an encoder-decoder architecture: a pretrained ResNet-50 CNN extracts image features, and a stacked LSTM network generates the textual description. The model is trained and evaluated on the Flickr8k dataset. BLEU-1 through BLEU-4 scores are listed under Results, where BLEU-1 and BLEU-2 surpass the original benchmark.

📌 Features

  • 🧠 Encoder-decoder architecture using ResNet-50 + LSTM
  • 📊 BLEU score evaluation
  • 🔤 Tokenization and padding of captions
  • 📁 Data pipeline with preprocessing and feature extraction
  • 🧪 Training visualization and performance tracking
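The tokenization and padding step can be illustrated with a minimal, dependency-free sketch. It is not the repository's code: the helper names (`build_vocab`, `encode`, `pad`, `training_pairs`), the `startseq`/`endseq` markers, and the choice of post-padding with id 0 are all assumptions made for illustration. A Keras-based pipeline would more likely rely on the library's own tokenizer and padding utilities.

```python
def build_vocab(captions):
    """Map each distinct word to an integer id; id 0 is reserved for padding."""
    words = sorted({w for c in captions for w in c.lower().split()})
    return {w: i + 1 for i, w in enumerate(words)}


def encode(caption, vocab):
    """Turn a caption string into a list of word ids, skipping unknown words."""
    return [vocab[w] for w in caption.lower().split() if w in vocab]


def pad(seq, max_length):
    """Truncate to max_length, then right-pad with zeros (assumed convention)."""
    seq = seq[:max_length]
    return seq + [0] * (max_length - len(seq))


def training_pairs(seq, max_length):
    """Split one encoded caption into (padded prefix, next word id) pairs."""
    return [(pad(seq[:i], max_length), seq[i]) for i in range(1, len(seq))]


# Hypothetical captions, wrapped in assumed start/end markers.
captions = ["startseq a dog runs endseq", "startseq a cat endseq"]
vocab = build_vocab(captions)
max_length = max(len(encode(c, vocab)) for c in captions)
pairs = training_pairs(encode(captions[1], vocab), max_length)
```

Each pair feeds the decoder a partial caption and asks it to predict the following word, which is how a next-word captioning model is usually trained.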

📁 Dataset


🛠️ Technologies Used

  • Python
  • TensorFlow & Keras
  • ResNet-50 (pretrained on ImageNet)
  • LSTM for sequence generation
  • NumPy, Matplotlib, Pickle, tqdm

🧮 Model Architecture

Encoder:

  • Pretrained ResNet-50 with final classification layer removed
  • Produces a 2048-dimensional feature vector per image
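As a rough sketch of the encoder described above, the Keras ResNet-50 application can be built without its classification head and with global average pooling, which yields one 2048-dimensional vector per image. The function names `build_encoder` and `extract_features` are hypothetical, and the 224×224 input size is an assumption (it is the usual ResNet-50 input resolution); the repository may build the encoder differently.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input


def build_encoder(weights="imagenet"):
    # include_top=False drops the 1000-way ImageNet classifier;
    # pooling="avg" keeps the globally average-pooled 2048-dim vector.
    return ResNet50(weights=weights, include_top=False, pooling="avg")


def extract_features(encoder, images):
    """images: array of shape (batch, 224, 224, 3), RGB values in [0, 255]."""
    # np.array makes a float copy, so the caller's data is not modified
    # by the in-place preprocessing.
    batch = preprocess_input(np.array(images, dtype="float32"))
    return encoder.predict(batch, verbose=0)
```

Because the encoder is frozen, features are typically computed once for every image and cached (for example with Pickle) before the decoder is trained.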

Decoder:

  • Embedding layer for word vectors
  • Stacked LSTM layers
  • Dense layers to predict the next word in sequence
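The decoder components listed above could be wired together roughly as follows. This is a sketch under several assumptions, not the repository's model: the README does not say how image features are combined with the caption sequence, so the example projects the image vector and adds it to the LSTM output (a common "merge" design). The function name `build_decoder`, the layer width of 256, and the use of two LSTM layers are likewise illustrative.

```python
from tensorflow.keras import Model, layers


def build_decoder(vocab_size, max_length, feature_dim=2048, units=256):
    # Image branch: project the encoder's feature vector to the LSTM width.
    image_input = layers.Input(shape=(feature_dim,), name="image_features")
    image_vec = layers.Dense(units, activation="relu")(image_input)

    # Text branch: embed the partial caption and run it through stacked LSTMs.
    caption_input = layers.Input(shape=(max_length,), name="caption_tokens")
    seq = layers.Embedding(vocab_size, units, mask_zero=True)(caption_input)
    seq = layers.LSTM(units, return_sequences=True)(seq)
    seq = layers.LSTM(units)(seq)

    # Merge both branches (assumed design) and predict the next word.
    merged = layers.add([image_vec, seq])
    merged = layers.Dense(units, activation="relu")(merged)
    next_word = layers.Dense(vocab_size, activation="softmax")(merged)

    model = Model(inputs=[image_input, caption_input], outputs=next_word)
    # Sparse loss assumes integer word ids as targets rather than one-hot rows.
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model
```

At inference time such a model is called repeatedly: the caption starts from a start marker, and the predicted word is appended until an end marker or the maximum length is reached.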

📊 Results

| Metric | Score |
|--------|-------|
| BLEU-1 | 65%   |
| BLEU-2 | 42%   |
| BLEU-3 | 27%   |
| BLEU-4 | 18%   |

✨ These scores surpass those of the original paper, which reported a BLEU-1 of 61% and a BLEU-2 of 41%.
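For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of corpus-level BLEU-n: clipped n-gram precision for orders 1 to n, combined by a geometric mean and multiplied by a brevity penalty. It assumes that "BLEU-n" in the table means cumulative BLEU with uniform weights over orders 1 to n, which is the common convention but is not stated in the README. The function names are illustrative, and the project itself may use a library implementation instead.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n):
    """references: per image, a list of reference token lists.
    hypotheses: per image, one generated token list."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Use the reference length closest to the hypothesis length.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)      # element-wise maximum
            clipped = hyp_counts & max_ref   # element-wise minimum
            matches[n - 1] += sum(clipped.values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

With this definition, BLEU-1 corresponds to `corpus_bleu(references, hypotheses, 1)` and BLEU-4 to `corpus_bleu(references, hypotheses, 4)`. Scores fall as n grows because longer n-grams are harder to match, which is consistent with the pattern in the table.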


About

Give me Image, I will give you Captions
