An AI-powered image captioning app built with Streamlit, using ViT-GPT2 for caption generation and YOLOv8 for object detection. The app enhances captions by integrating detected objects into the generated text.
- AI-powered image captioning using ViT-GPT2.
- Object detection with YOLOv8 to enhance captions.
- Dark-themed UI with Streamlit.
- Interactive settings for enabling/disabling object detection.
- Optimized inference with GPU acceleration (CUDA support).
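
The README does not show how detected objects are merged into the caption, so the following is only a minimal sketch of one plausible approach. The function name `enhance_caption` and the "Detected objects:" suffix are assumptions made for illustration, not the app's confirmed behavior.

```python
def enhance_caption(caption: str, labels: list[str]) -> str:
    """Append detected object labels that the caption does not already mention.

    Hypothetical helper: the real app may merge labels differently.
    """
    base = caption.strip().rstrip(".")
    mentioned = base.lower()
    extras = []
    for label in labels:
        name = label.strip().lower()
        # Skip empty labels, labels already in the caption, and duplicates.
        if name and name not in mentioned and name not in extras:
            extras.append(name)
    if not extras:
        return base + "."
    return f"{base}. Detected objects: {', '.join(extras)}."


print(enhance_caption("a dog sitting on a couch", ["dog", "couch", "remote", "remote"]))
# a dog sitting on a couch. Detected objects: remote.
```

The check is a plain substring match, so a label such as "car" would be treated as already mentioned in a caption containing "carpet". A word-level comparison would be stricter.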
git clone https://github.com/yourusername/AI-Image-Captioning.git
cd AI-Image-Captioning
python -m venv venv
source venv/bin/activate # On macOS/Linux
venv\Scripts\activate # On Windows
pip install -r requirements.txt
streamlit run app.py

- Pretrained Model: nlpconnect/vit-gpt2-image-captioning
- Task: Generates textual descriptions for input images.
- Pretrained Model: yolov8n.pt
- Task: Detects objects in the image to enhance captions.
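
As a rough sketch of how the two checkpoints above might be loaded and run, assuming the `transformers`, `ultralytics`, and `torch` packages are installed: the function names (`pick_device`, `load_models`, `caption_image`, `detect_labels`) and the generation settings (`max_length=16`, `num_beams=4`) are illustrative assumptions, not taken from `app.py`.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_models(device: str):
    """Load both pretrained checkpoints (weights are downloaded on first use)."""
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel
    from ultralytics import YOLO

    name = "nlpconnect/vit-gpt2-image-captioning"
    captioner = VisionEncoderDecoderModel.from_pretrained(name).to(device)
    processor = ViTImageProcessor.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    detector = YOLO("yolov8n.pt")
    return captioner, processor, tokenizer, detector


def caption_image(image, captioner, processor, tokenizer, device: str) -> str:
    """Generate one caption for a PIL image in RGB mode."""
    pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)
    # Assumed generation settings; tune them for your own use.
    output_ids = captioner.generate(pixel_values, max_length=16, num_beams=4)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


def detect_labels(image, detector, device: str) -> list[str]:
    """Return the class name of every object YOLOv8 detects in the image."""
    result = detector(image, device=device, verbose=False)[0]
    return [result.names[int(c)] for c in result.boxes.cls]
```

A typical call sequence would be `device = pick_device()`, then `load_models(device)`, then `caption_image(...)` and `detect_labels(...)` on an image opened with `PIL.Image.open(path).convert("RGB")`.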
AI-Image-Captioning/
│── app.py # Main Streamlit application
│── requirements.txt # Required dependencies
│── README.md # Documentation
│── assest/ # Store images/screenshots

- Upload an image in the app.
- Choose whether to enable object detection.
- Click 'Analyze Image' to generate a caption.
- View enhanced captions and object detection results.
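
The steps above could be wired together roughly as follows. This is a hedged sketch, not the contents of `app.py`: `analyze` and `main` are hypothetical names, the widget labels other than 'Analyze Image' are invented, and `captioner` and `detector` stand in for whatever callables the app uses to produce a caption string and a list of object labels.

```python
from typing import Callable, Optional


def analyze(image, use_detection: bool,
            captioner: Callable, detector: Optional[Callable] = None) -> dict:
    """Run captioning, and object detection only when the toggle is on."""
    caption = captioner(image)
    objects = detector(image) if use_detection and detector is not None else []
    return {"caption": caption, "objects": objects}


def main(captioner: Callable, detector: Callable) -> None:
    """Streamlit wiring for the upload, toggle, and button (assumed layout)."""
    import streamlit as st
    from PIL import Image

    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    use_detection = st.sidebar.checkbox("Enable object detection", value=True)
    if uploaded is not None and st.button("Analyze Image"):
        image = Image.open(uploaded).convert("RGB")
        st.image(image)
        result = analyze(image, use_detection, captioner, detector)
        st.write(result["caption"])
        if result["objects"]:
            st.write("Detected objects: " + ", ".join(result["objects"]))
```

Keeping `analyze` free of Streamlit calls makes the toggle logic easy to check with stub callables, independent of the UI.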
- Add multilingual captioning support.
- Optimize object detection performance.
- Implement additional caption refinement techniques.
Contributions are welcome! Feel free to fork this repository and create a pull request with your improvements.


