This project is an Image Caption Generator that uses a pretrained ResNet-50 model to extract features from images and an LSTM (Long Short-Term Memory) decoder to generate captions from those features. The model is trained on the COCO 2017 dataset, which contains a diverse collection of images and corresponding captions. The entire pipeline is deployed as a Streamlit app, allowing users to upload an image and receive a generated caption.
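The model follows the standard CNN-encoder / LSTM-decoder pattern. Below is a minimal PyTorch sketch of that idea; the class names, layer sizes, and backbone-freezing choice are illustrative assumptions, not the exact code in `imgcaptioning/model.py`:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Pretrained ResNet-50 with its classification head replaced by a linear embedding."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        for param in resnet.parameters():          # freeze the convolutional backbone
            param.requires_grad = False
        resnet.fc = nn.Linear(resnet.fc.in_features, embed_size)  # trainable projection
        self.resnet = resnet

    def forward(self, images):
        return self.resnet(images)                 # (batch, embed_size)

class DecoderRNN(nn.Module):
    """LSTM that consumes the image embedding followed by the caption tokens."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "token" of the sequence.
        embeddings = torch.cat((features.unsqueeze(1), self.embed(captions[:, :-1])), dim=1)
        hiddens, _ = self.lstm(embeddings)
        return self.fc(hiddens)                    # (batch, seq_len, vocab_size)
```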
Create a virtual environment in Python using the `venv` module:

- Open a terminal or command prompt.
- Navigate to the directory where you want to create the virtual environment. You can use the `cd` command to change your directory. For example: `cd path/to/your/desired/directory`
- Once you are in the desired directory, run the following command to create a virtual environment:

  On macOS and Linux:

  ```
  python3 -m venv venv_name
  ```

  On Windows (using Command Prompt):

  ```
  python -m venv venv_name
  ```

  Replace `venv_name` with the name you want to give to your virtual environment. For example: `python3 -m venv myenv`

- Activate the virtual environment:

  On macOS and Linux:

  ```
  source venv_name/bin/activate
  ```

  On Windows (using Command Prompt):

  ```
  venv_name\Scripts\activate
  ```

After activation, your command prompt or terminal will show the virtual environment name, indicating that you are now working within the virtual environment.
Clone this repository to your local machine:

```
git clone https://github.com/KBVijayVarma/image-captioning.git
cd image-captioning
```

Install the required packages using pip inside the virtual environment:

```
pip install -r requirements.txt
```
Before training the model, you need to prepare the COCO 2017 dataset:

- Download the 2017 train, validation, and test images, together with the 2017 annotations, from the COCO website.
- Unzip the downloaded files into a folder named `coco_dataset`, laid out as shown in the Project Structure below.
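After unzipping, the caption annotations can be sanity-checked with the `pycocotools` COCO API. This is an optional check, and `pycocotools` is assumed to be available (install it with `pip install pycocotools` if it is not already pulled in by `requirements.txt`):

```python
from pycocotools.coco import COCO

# Load the training caption annotations and print the captions for one image.
coco = COCO("coco_dataset/annotations/captions_train2017.json")

img_id = coco.getImgIds()[0]             # first image id in the training split
ann_ids = coco.getAnnIds(imgIds=img_id)  # caption annotation ids for that image
for ann in coco.loadAnns(ann_ids):
    print(ann["caption"])
```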
- Create a folder `models` in the working directory.
- Run `training.ipynb` to train the Image Captioning Model.
- Rename the final pickle (`.pkl`) files in the `models` folder to `encoder.pkl` and `decoder.pkl`.
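These two files are what the app reads at inference time. As a rough illustration only (not the actual loading code in the repository), they can be read back like this:

```python
import pickle

# Load the trained encoder and decoder from the models/ folder.
with open("models/encoder.pkl", "rb") as f:
    encoder = pickle.load(f)
with open("models/decoder.pkl", "rb") as f:
    decoder = pickle.load(f)

# Switch both networks to inference mode before generating captions.
encoder.eval()
decoder.eval()
```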
To use the Image Caption Generator, launch the Streamlit app:

```
streamlit run app.py
```

In the Streamlit app, provide the input image in any of the following ways (a sketch of how such inputs can be wired up follows the list):

- URL of the image
- File uploader
- Camera
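The snippet below is an illustrative sketch of how these three input modes can be handled with Streamlit's built-in widgets; it is not the actual contents of `app.py`, and the `generate_caption` helper named in the comment is hypothetical:

```python
from io import BytesIO

import requests
import streamlit as st
from PIL import Image

st.title("Image Caption Generator")

image = None

# Option 1: URL of the image
url = st.text_input("Image URL")
if url:
    image = Image.open(BytesIO(requests.get(url, timeout=10).content))

# Option 2: file uploader
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded)

# Option 3: camera
snapshot = st.camera_input("Take a photo")
if snapshot is not None:
    image = Image.open(snapshot)

if image is not None:
    st.image(image, caption="Input image")
    # caption = generate_caption(image)  # hypothetical helper wrapping the encoder/decoder
    # st.write(caption)
```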
Project Structure:

```
image-captioning/
│
├── imgcaptioning/
│   ├── coco_dataset.py
│   ├── data_loader.py
│   ├── model.py
│   ├── inference_pipeline.py
│   ├── tokenizer.py
│   ├── utils.py
│   └── vocabulary.py
│
├── models/
│   ├── encoder.pkl
│   └── decoder.pkl
│
├── coco_dataset/
│   ├── annotations/
│   │   ├── captions_train2017.json
│   │   ├── captions_val2017.json
│   │   ├── image_info_test-dev2017.json
│   │   ├── image_info_test2017.json
│   │   ├── instances_train2017.json
│   │   ├── instances_val2017.json
│   │   ├── person_keypoints_train2017.json
│   │   └── person_keypoints_val2017.json
│   │
│   ├── train2017/
│   ├── test2017/
│   └── val2017/
│
├── .gitignore
├── app.py
├── requirements.txt
├── training.ipynb
└── vocab.json
```