paudelsamir/Image-Captioning-Transformer

Image Captioning

A simple app that generates captions for images using a Transformer decoder and ResNet-18 features. Upload your own image or try a sample to see what the model describes!

Project Notebook

Live Demo


Model Info

  • Feature Extractor: ResNet-18 (pretrained)
  • Decoder: Transformer (3 layers, 8 attention heads, embedding size 512, feed-forward size 2048, dropout 0.2)
  • Vocabulary: 7,234 words
  • Metric: BLEU-4 score of 0.18
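
For readers unfamiliar with the metric, here is a minimal sketch of how a BLEU-4 score can be computed for a single caption. It is an illustration only: it is sentence-level and unsmoothed, whereas the 0.18 above was presumably computed over a whole test set, and this README does not say which implementation or smoothing was used. The function names `ngrams` and `sentence_bleu4` are made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(candidate, references):
    """Unsmoothed BLEU-4 for one tokenized candidate against tokenized references."""
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the four n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    generated = "a dog runs on the green grass".split()
    refs = ["a dog runs on the grass".split()]
    print(round(sentence_bleu4(generated, refs), 3))
```

A caption identical to a reference scores 1.0, and a caption sharing no 4-gram with any reference scores 0.0 under this unsmoothed variant, which is why library implementations usually offer smoothing for short sentences.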

Model Download

The app downloads the model files automatically when you run it, so you only need to download them manually if you prefer to.
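
As a rough idea of what such an auto-download step can look like, here is a minimal sketch using only the Python standard library. It is not taken from `app.py`: the helper name `ensure_file`, the file name, and the URL in the usage note are all hypothetical.

```python
import urllib.request
from pathlib import Path


def ensure_file(path, url, fetch=urllib.request.urlretrieve):
    """Download url to path unless the file already exists; return the Path."""
    path = Path(path)
    if path.exists():
        return path  # already downloaded, skip the network call
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.parent / (path.name + ".part")
    fetch(url, tmp)    # download to a temporary name first
    tmp.replace(path)  # rename so an interrupted download never looks complete
    return path
```

A call such as `ensure_file("models/model.pth", "https://example.com/model.pth")` (placeholder path and URL) would fetch the file once and reuse the local copy on later runs. The `fetch` parameter exists only so the download can be swapped out, for example in tests.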


How to Run Locally

  1. Clone this repo:
    git clone https://github.com/paudelsamir/Image-Captioning-Transformer.git
    cd Image-Captioning-Transformer
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the app:
    streamlit run app.py
     Or run the demo app instead (no requirements needed):
    streamlit run demo_app.py

Windows Users

# Run the setup script
setup.bat

Linux/Mac Users

# Make setup script executable and run
chmod +x setup.sh
./setup.sh

Author


This is a fun project for learning and demo purposes. For details, see the notebook above.

About

Image captioning on Flickr8k using ResNet features and a Transformer decoder.
