Skip to content

KX-ai/Botify

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

219 Commits
 
 
 
 
 
 
 
 

Repository files navigation

🤖 Botify

A multimodal AI chatbot built with Streamlit that lets you interact with PDFs, images, and audio files using large language models via the Groq API.


Features

  • 📄 PDF Chat — Upload a PDF, get an AI-generated summary, translate it into 28+ languages, and ask follow-up questions about the content.
  • 🖼️ Image Understanding — Upload an image and extract a natural language description using the BLIP image captioning model (Salesforce/blip-image-captioning-large).
  • 🎙️ Audio Transcription — Upload an audio file and transcribe it using Groq's Whisper large-v3-turbo model, then chat with the transcript.
  • 💬 Conversational Q&A — Ask questions about any uploaded content and get context-aware answers powered by Groq-hosted LLMs.
  • 📊 ROUGE Scoring — Automatically evaluates summarization and Q&A quality using ROUGE-1, ROUGE-2, and ROUGE-L metrics.
  • 🕒 Response Timing — Displays response and summarization time for each query.
  • 🗂️ Conversation History — View, switch between, and manage past chat sessions via the sidebar.
  • 🔊 Text-to-Speech — Summaries are converted to audio using gTTS and played back in the app.

Supported Models

Name Model ID
openai/gpt-oss-120b openai/gpt-oss-120b
Llama 3.1 8b Instant llama-3.1-8b-instant
llama-3.3-70b-versatile llama-3.3-70b-versatile

Tech Stack


Getting Started

Prerequisites

Installation

git clone https://github.com/KX-ai/Botify.git
cd Botify
pip install -r requirements.txt

Configuration

Create a .streamlit/secrets.toml file in the project root with the following:

[groq_api]
api_key = "your_groq_api_key_here"

[whisper]
WHISPER_API_KEY = "your_groq_api_key_here"

Note: Both keys point to the same Groq API key. The Hugging Face token is currently hardcoded in app.py — it is recommended to move it to secrets.toml as well before deploying.

Running the App

streamlit run app.py

Usage

  1. Select an input method from the dropdown: Upload PDF, Upload Audio, or Upload Image.
  2. Upload your file using the file uploader.
  3. For PDFs:
    • Choose a language model and output language.
    • Click Summarize Text to generate a summary with audio playback and translation.
  4. Use the chat input at the bottom to ask questions about any uploaded content.
  5. View past conversations in the sidebar, switch between sessions, or start a new chat.

Supported Audio Formats

flac, mp3, mp4, mpeg, mpga, m4a, ogg, opus, wav, webm

Supported Translation Languages

28 languages including English, Malay, Chinese, Spanish, French, Arabic, Japanese, Korean, Hindi, and more.


Project Structure

Botify/
├── app.py              # Main Streamlit application
└── requirements.txt    # Python dependencies

Dependencies

requests==2.32.3
streamlit==1.41.1
PyPDF2
Pillow
gTTS
transformers
torch
pytz
rouge-score
openai

License

This project is open source. Feel free to fork and build on it.

About

Multimodal AI chatbot supporting PDF, audio & image input with summarization, translation & Q&A — built with Streamlit and Groq API

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages