A tool that automatically generates compelling Instagram captions with relevant hashtags for your images.
This project provides two different approaches to generate Instagram captions from images:
- Version 1 (Two-Stage Pipeline): First generates an image description locally using the BLIP model, then uses an LLM to create a caption based on that description.
- Version 2 (Direct Vision-Language Approach): Sends the image directly to a multimodal LLM (Gemini) to generate captions in a single step.
- Upload any image and get a professional Instagram caption
- Add relevant hashtags automatically
- Simple user interface built with Gradio
- Two different processing approaches to choose from based on your needs
- Python 3.12
- Gradio: For creating the web interface
- Transformers: Using Salesforce's BLIP model for image captioning
- LiteLLM: For unified access to various LLM APIs
- Gemini 1.5 Flash: Google's multimodal LLM for image understanding
- OpenCV & PIL: For image processing
- PyTorch: Backend for the BLIP transformer model
- Python-dotenv: For environment variable management
-
Clone this repository
-
Install dependencies:
uv syncOr use the providedpyproject.tomlwith a tool like Poetry or PDM. -
Create a
.envfile based on the providedexample.envfile.
This version:
- Generates image descriptions locally
- Sends text descriptions to LLM
- More cost-effective but potentially less accurate
- Shows both the image description and the final caption
python app.pyThis version:
- Sends images directly to a multimodal LLM
- More accurate but potentially higher API costs
- Only shows the final caption
python main.py- Uses the Salesforce BLIP model to generate a text description of the image
- Sends this description to Gemini with a prompt to create an Instagram caption
- Returns both the original description and the generated caption
- Encodes the image to base64
- Sends the image directly to Gemini with a prompt
- Returns the generated caption
src/caption.py: Handles image captioning with the BLIP modelsrc/llm.py: Contains functions for interacting with LLMssrc/ui.py: UI implementation (development version)app.py: Main application entry point for Version 1__main__.py: Entry point for Version 2
See pyproject.toml for the complete list of dependencies.
[Specify license here]
Contributions are welcome! Please feel free to submit a Pull Request.

