Skip to content

An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app provides enhanced captions by integrating detected objects into the generated text.

Notifications You must be signed in to change notification settings

PrachiPatel15/AI-Image-Captioning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI-Image-Captioning

An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app enhances captions by integrating detected objects into the generated text.

🔥 Features

  • AI-powered image captioning using ViT-GPT2.
  • Object detection with YOLOv8 to enhance captions.
  • Dark-themed UI with Streamlit.
  • Interactive settings for enabling/disabling object detection.
  • Optimized inference with GPU acceleration (CUDA support).

🚀 Demo

1️⃣ Upload an Image

Upload Screenshot

2️⃣ Enable Object Detection and Generate Captions

Detection Screenshot

3️⃣ View Enhanced Caption and Detected Objects

Results Screenshot

📂 Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/yourusername/AI-Image-Captioning.git
cd AI-Image-Captioning

2️⃣ Create a Virtual Environment (Optional but Recommended)

python -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate     # On Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Run the Application

streamlit run app.py

🧠 Models Used

1️⃣ ViT-GPT2 (Image Captioning)

  • Pretrained Model: nlpconnect/vit-gpt2-image-captioning
  • Task: Generates textual descriptions for input images.

2️⃣ YOLOv8 (Object Detection)

  • Pretrained Model: yolov8n.pt
  • Task: Detects objects in the image to enhance captions.

⚙️ Project Structure

AI-Image-Captioning/
│── app.py                  # Main Streamlit application
│── requirements.txt        # Required dependencies
│── README.md               # Documentation
│── assest/                 # Store images/screenshots

🛠️ Usage Instructions

  1. Upload an image in the app.
  2. Choose whether to enable object detection.
  3. Click 'Analyze Image' to generate a caption.
  4. View enhanced captions and object detection results.

💡 Future Improvements

  • Add multilingual captioning support.
  • Optimize object detection performance.
  • Implement additional caption refinement techniques.

🤝 Contributing

Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.

About

An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app provides enhanced captions by integrating detected objects into the generated text.

Topics

Resources

Stars

Watchers

Forks

Languages