Image Caption Generator With Beam Search

This project generates image captions with an encoder-decoder architecture trained on the Flickr8k dataset, using beam search for inference. The key components of the project:

  • The model uses an encoder-decoder architecture: MobileNetV3-Small as the encoder and a TransformerDecoder as the decoder (see the sketch after this list).
  • Caption preprocessing and tokenizer construction.
  • Model training and evaluation.
  • Greedy search and beam search algorithms for inference.
  • Saving model weights and visualizing the results.
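
The repository's exact class names and hyperparameters are not reproduced here; the following is only a minimal PyTorch sketch of how a MobileNetV3-Small feature extractor can feed a TransformerDecoder, with illustrative values for `embed_dim`, `num_heads`, `num_layers`, and `max_len`.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small

class CaptionModel(nn.Module):
    """Illustrative encoder-decoder: MobileNetV3-Small features + TransformerDecoder."""
    def __init__(self, vocab_size, embed_dim=256, num_heads=8, num_layers=3, max_len=40):
        super().__init__()
        # Encoder: pretrained MobileNetV3-Small with the classifier head removed
        backbone = mobilenet_v3_small(weights="DEFAULT")
        self.encoder = backbone.features                    # (B, 576, H', W')
        self.enc_proj = nn.Linear(576, embed_dim)           # project channels to embed_dim

        # Decoder: token + position embeddings feeding a TransformerDecoder
        self.token_emb = nn.Embedding(vocab_size, embed_dim)
        self.pos_emb = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerDecoderLayer(d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.fc_out = nn.Linear(embed_dim, vocab_size)

    def forward(self, images, captions):
        # Image features become a sequence of "memory" tokens for cross-attention
        feats = self.encoder(images)                        # (B, 576, H', W')
        feats = feats.flatten(2).permute(0, 2, 1)           # (B, H'*W', 576)
        memory = self.enc_proj(feats)                       # (B, H'*W', D)

        # Embed target tokens and add learned positions
        positions = torch.arange(captions.size(1), device=captions.device)
        tgt = self.token_emb(captions) + self.pos_emb(positions)

        # Causal mask so each position only attends to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(captions.size(1)).to(captions.device)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.fc_out(out)                             # (B, T, vocab_size)
```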

About the dataset

This repository uses the Flickr8k dataset and the PyTorch framework. The dataset is organized as follows (a loading sketch follows the directory layout):

  • flickr8k
    • images
      • image files
    • captions.txt
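
The common Flickr8k distribution stores captions.txt as a CSV with `image` and `caption` columns; the repository may preprocess it differently. Assuming that layout, a dataset class could look roughly like this (`FlickrDataset` and the `tokenizer.encode` call are illustrative, not the repository's API):

```python
import os

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class FlickrDataset(Dataset):
    """Illustrative loader assuming captions.txt is a CSV with 'image' and 'caption' columns."""
    def __init__(self, root, tokenizer, transform=None):
        self.img_dir = os.path.join(root, "flickr8k", "images")
        df = pd.read_csv(os.path.join(root, "flickr8k", "captions.txt"))
        # One sample per (image, caption) pair; Flickr8k has ~5 captions per image
        self.samples = list(zip(df["image"], df["caption"]))
        self.tokenizer = tokenizer
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        fname, caption = self.samples[idx]
        image = Image.open(os.path.join(self.img_dir, fname)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        tokens = self.tokenizer.encode(caption)   # hypothetical tokenizer API
        return image, tokens
```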

Inference

You can download the pre-trained weights best_model.pt and the encoded image features feature_extractor.pkl.

You have to change the config.root path to your workspace path.
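
As a rough illustration only, the setup before inference might look like the following; it assumes `config` is a Python module exposing a `root` attribute and that best_model.pt stores a full serialized model (it may instead hold a state_dict to load into the model class).

```python
import pickle
import torch

import config                       # the repository's config module (assumed to expose `root`)

# Point config.root at your local workspace before loading anything else
config.root = "/path/to/your/workspace"

# best_model.pt: may be a full serialized model or a state_dict for the model class
model = torch.load(f"{config.root}/best_model.pt", map_location="cpu")
model.eval()

# feature_extractor.pkl: pre-computed image features, so the encoder need not rerun at inference time
with open(f"{config.root}/feature_extractor.pkl", "rb") as f:
    image_features = pickle.load(f)
```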

Beam search algorithm

Beam search generates better captions by keeping multiple candidate sequences at each decoding step, rather than greedily selecting the single word with the highest score. The example below demonstrates how a beam width (k) of 3 results in better captions.
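
A compact sketch of the idea (not the repository's exact implementation): it assumes `model(image, tokens)` returns logits of shape (1, T, vocab_size) and that `start_id`/`end_id` are the tokenizer's <sos>/<eos> ids.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def beam_search(model, image, start_id, end_id, beam_width=3, max_len=40):
    """Keep the beam_width highest-scoring partial captions at every step."""
    # Each beam is (token_ids, cumulative log-probability)
    beams = [([start_id], 0.0)]
    completed = []

    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == end_id:            # finished captions are set aside
                completed.append((tokens, score))
                continue
            inp = torch.tensor(tokens).unsqueeze(0)          # (1, T)
            logits = model(image, inp)[0, -1]                # next-token logits
            log_probs = F.log_softmax(logits, dim=-1)
            top_lp, top_ids = log_probs.topk(beam_width)
            for lp, idx in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((tokens + [idx], score + lp))
        if not candidates:                      # every beam has already ended
            beams = []
            break
        # Prune back down to the beam_width best partial captions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]

    completed.extend(beams)
    # Length-normalize so longer captions are not unfairly penalized
    best = max(completed, key=lambda c: c[1] / len(c[0]))
    return best[0]
```

Greedy search is the special case beam_width = 1; widening the beam trades extra compute for a broader search over candidate captions.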

Accuracy
