A multimodal AI chatbot built with Streamlit that lets you interact with PDFs, images, and audio files using large language models via the Groq API.
- 📄 PDF Chat — Upload a PDF, get an AI-generated summary, translate it into 28+ languages, and ask follow-up questions about the content.
- 🖼️ Image Understanding — Upload an image and extract a natural language description using the BLIP image captioning model (Salesforce/blip-image-captioning-large).
- 🎙️ Audio Transcription — Upload an audio file and transcribe it using Groq's Whisper large-v3-turbo model, then chat with the transcript.
- 💬 Conversational Q&A — Ask questions about any uploaded content and get context-aware answers powered by Groq-hosted LLMs.
- 📊 ROUGE Scoring — Automatically evaluates summarization and Q&A quality using ROUGE-1, ROUGE-2, and ROUGE-L metrics.
- 🕒 Response Timing — Displays response and summarization time for each query.
- 🗂️ Conversation History — View, switch between, and manage past chat sessions via the sidebar.
- 🔊 Text-to-Speech — Summaries are converted to audio using gTTS and played back in the app.
| Name | Model ID |
|---|---|
| openai/gpt-oss-120b | openai/gpt-oss-120b |
| Llama 3.1 8b Instant | llama-3.1-8b-instant |
| llama-3.3-70b-versatile | llama-3.3-70b-versatile |
- Streamlit — UI framework
- Groq API — LLM inference (chat + Whisper transcription)
- Hugging Face Transformers — BLIP image captioning
- PyPDF2 — PDF text extraction
- gTTS — Text-to-speech
- rouge-score — Evaluation metrics
- Python 3.9+
- A Groq API key
- A Hugging Face token (for BLIP model access)
git clone https://github.com/KX-ai/Botify.git
cd Botify
pip install -r requirements.txtCreate a .streamlit/secrets.toml file in the project root with the following:
[groq_api]
api_key = "your_groq_api_key_here"
[whisper]
WHISPER_API_KEY = "your_groq_api_key_here"Note: Both keys point to the same Groq API key. The Hugging Face token is currently hardcoded in
app.py— it is recommended to move it tosecrets.tomlas well before deploying.
streamlit run app.py- Select an input method from the dropdown: Upload PDF, Upload Audio, or Upload Image.
- Upload your file using the file uploader.
- For PDFs:
- Choose a language model and output language.
- Click Summarize Text to generate a summary with audio playback and translation.
- Use the chat input at the bottom to ask questions about any uploaded content.
- View past conversations in the sidebar, switch between sessions, or start a new chat.
flac, mp3, mp4, mpeg, mpga, m4a, ogg, opus, wav, webm
28 languages including English, Malay, Chinese, Spanish, French, Arabic, Japanese, Korean, Hindi, and more.
Botify/
├── app.py # Main Streamlit application
└── requirements.txt # Python dependencies
requests==2.32.3
streamlit==1.41.1
PyPDF2
Pillow
gTTS
transformers
torch
pytz
rouge-score
openai
This project is open source. Feel free to fork and build on it.