This project implements an Image Caption Generator, a deep learning model that automatically generates descriptive captions for images. It combines Convolutional Neural Networks (CNNs) for image feature extraction and Recurrent Neural Networks (LSTMs) for language modeling, trained on image–caption datasets.
You can upload an image via a FastAPI web interface, and the app returns a meaningful caption generated by the trained model.
Watch the full demo here: YouTube Video
---
- Dataset Used: Flickr8k Dataset (8,000 images with 5 captions each).
- Preprocessing Steps:
  - Cleaned captions (removed punctuation, converted to lowercase, tokenized).
  - Added special "start" and "end" tokens to each caption.
  - Used InceptionV3 (pretrained on ImageNet) for image feature extraction.
  - Extracted a 2048-dimensional feature vector for each image.
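The feature-extraction step above can be sketched as follows (assuming TensorFlow/Keras is installed; the function names here are illustrative, and `weights="imagenet"` downloads the pretrained weights on first use):

```python
import numpy as np
import tensorflow as tf

def build_feature_extractor(weights="imagenet"):
    """InceptionV3 without its classifier head; global average pooling
    reduces the final convolutional maps to a single 2048-dim vector."""
    return tf.keras.applications.InceptionV3(
        include_top=False, pooling="avg", weights=weights
    )

def extract_features(extractor, image_path):
    """Load an image, resize it to InceptionV3's 299x299 input, and encode it."""
    img = tf.keras.utils.load_img(image_path, target_size=(299, 299))
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.inception_v3.preprocess_input(x)
    return extractor.predict(x, verbose=0)[0]  # shape: (2048,)
```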
- Used the Keras Tokenizer to build a vocabulary from all captions.
- Converted captions to integer sequences.
- Applied padding so all sequences have equal length.
- Defined max_length based on the longest caption.
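The cleaning, vocabulary, and padding steps above can be sketched without any framework dependency (the project itself uses the Keras Tokenizer; the helper names and post-padding choice below are illustrative):

```python
import string

def clean_caption(caption: str) -> str:
    """Lowercase, strip punctuation, and wrap with the start/end tokens."""
    table = str.maketrans("", "", string.punctuation)
    words = caption.lower().translate(table).split()
    return "start " + " ".join(words) + " end"

def build_vocab(captions):
    """Map each word to a 1-based integer id (0 is reserved for padding)."""
    words = sorted({w for c in captions for w in c.split()})
    return {w: i + 1 for i, w in enumerate(words)}

def to_padded_sequence(caption, vocab, max_length):
    """Convert a caption to integer ids and zero-pad it to max_length."""
    seq = [vocab[w] for w in caption.split()]
    return seq + [0] * (max_length - len(seq))

caps = [clean_caption("Two dogs play, outside!"), clean_caption("A dog runs.")]
vocab = build_vocab(caps)
max_length = max(len(c.split()) for c in caps)
padded = [to_padded_sequence(c, vocab, max_length) for c in caps]
```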
The model follows a CNN + LSTM Encoder–Decoder approach.
🧩 Encoder (Image Feature Extractor)
Input: Extracted feature vector (2048-dim).
Layers:
Dropout(0.5)
Dense(256, activation='relu')
Output: 256-dim projected feature.
🧩 Decoder (Sequence Processor)
Input: Sequence of tokens (padded to max_length).
Layers:
Embedding(vocab_size, 256, mask_zero=True)
LSTM(256)
The encoder and decoder outputs are merged via add(), followed by Dense(256, activation='relu') and Dense(vocab_size, activation='softmax') layers to predict the next word in the sequence.
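A minimal Keras sketch of this merge architecture, using the layer sizes listed above (the `build_model` name and the 256-unit Dense layer after the merge are assumptions based on the standard CNN + LSTM merge design):

```python
from tensorflow.keras.layers import Input, Dropout, Dense, Embedding, LSTM, add
from tensorflow.keras.models import Model

def build_model(vocab_size, max_length):
    # Encoder: project the 2048-dim InceptionV3 feature down to 256 dims
    img_in = Input(shape=(2048,))
    x = Dropout(0.5)(img_in)
    x = Dense(256, activation="relu")(x)

    # Decoder: embed the token sequence and summarize it with an LSTM
    seq_in = Input(shape=(max_length,))
    y = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
    y = LSTM(256)(y)

    # Merge both 256-dim branches and predict the next word
    z = add([x, y])
    z = Dense(256, activation="relu")(z)
    out = Dense(vocab_size, activation="softmax")(z)

    model = Model(inputs=[img_in, seq_in], outputs=out)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
```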
Epochs: 17
Batch Size: 32
Optimizer: Adam
Loss: Categorical Cross-Entropy
Used a data generator to feed (image features, input sequence, output word) tuples in memory-efficient batches.
Validation captions were generated at intervals to monitor quality.
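The (image features, input sequence, output word) generator described above might look like this sketch (function and argument names are illustrative; the zero post-padding mirrors an assumed padding scheme):

```python
import numpy as np

def data_generator(captions, features, vocab, max_length, vocab_size, batch_size=32):
    """Yield ([image_features, input_sequence], next_word) batches lazily,
    so the full expanded training set never sits in memory at once."""
    X1, X2, y = [], [], []
    while True:
        for img_id, caption in captions:
            seq = [vocab[w] for w in caption.split() if w in vocab]
            # each caption yields one training sample per next-word position
            for i in range(1, len(seq)):
                in_seq = seq[:i] + [0] * (max_length - i)  # zero-padded input
                out_word = np.zeros(vocab_size)
                out_word[seq[i]] = 1.0                     # one-hot next word
                X1.append(features[img_id])
                X2.append(in_seq)
                y.append(out_word)
                if len(X1) == batch_size:
                    yield [np.array(X1), np.array(X2)], np.array(y)
                    X1, X2, y = [], [], []
```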
Generated captions for random test images.
Actual:
- two dogs are playing with each other on the pavement
- black dog and tri-colored dog playing with each other on the road
Predicted:
- two dogs are playing on the road
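Captions like the prediction above are typically produced by greedy decoding: start from the "start" token and repeatedly feed the growing sequence back into the model. A sketch, assuming a trained model and the vocabulary mappings (names here are illustrative):

```python
import numpy as np

def generate_caption(model, photo_feature, vocab, inv_vocab, max_length):
    """Greedy decoding: pick the most probable next word at each step
    until the 'end' token (or max_length) is reached."""
    words = ["start"]
    for _ in range(max_length):
        seq = [vocab[w] for w in words]
        seq = seq + [0] * (max_length - len(seq))  # zero-pad to input length
        probs = model.predict([np.array([photo_feature]), np.array([seq])],
                              verbose=0)
        next_word = inv_vocab[int(np.argmax(probs))]
        if next_word == "end":
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the leading 'start' token
```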
- Backend (main.py)
Built a REST API using FastAPI.
The /predict/ endpoint accepts an uploaded image and returns the generated caption.
Uses the pre-trained model (model.h5), the tokenizer (tokenizer.pkl), and the InceptionV3 feature extractor.
- Frontend
Simple and elegant HTML + CSS form.
Upload an image → get caption → view output instantly.
Deployed locally via: uvicorn main:app --reload

Ali Ahmad
Data Scientist & AI/ML Engineer