Your AI Data Science Tutor
ML TutorBot is a multilingual, AI-powered Data Science & Machine Learning tutor. It helps users understand ML/DS concepts, libraries, and techniques in a conversational way, using Retrieval-Augmented Generation (RAG) to provide accurate, contextual answers from curated knowledge sources.
---

Next steps:
- Add the initial structure ✅
- Add the agent workflow ✅
- Add the API logic and user interaction ✅
- Make a simple frontend to improve the usability ✅
- Add the language detector and the translator agent 🔁
- Deploy it on Hostinger or a similar platform
Features:
- Answers ML & Data Science questions in multiple languages
- Uses RAG to retrieve context from official documentation, tutorials, and open-access books
- Provides clear explanations, code snippets, and examples
- Modular agent architecture:
  - Language Detector → detects the language of the user query (see the sketch after this list)
  - Retriever Agent → fetches relevant chunks from the knowledge base
  - Answering Agent → generates concise answers
  - Translator Agent → ensures responses match the user's language
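For illustration, the Language Detector step can be as simple as a library call. The sketch below uses the `langdetect` package and a hypothetical `detect_language` helper; this is an assumption, not necessarily how the project implements the agent.

```python
# Illustrative Language Detector sketch (assumption: the project may use a
# different library or an LLM call for this step).
from langdetect import detect  # pip install langdetect

def detect_language(query: str) -> str:
    """Return an ISO 639-1 code such as 'en', 'es', or 'pt' for the user query."""
    try:
        return detect(query)
    except Exception:
        # langdetect raises on empty or ambiguous input; fall back to English.
        return "en"

print(detect_language("¿Qué es el descenso de gradiente?"))  # likely 'es'
```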
Architecture:

```mermaid
flowchart TD
    A[User in Gradio Frontend] -->|Sends question/request| B[API - api.py using FastAPI]
    B --> C[Agent Workflow]
    C -->|Analyzes input intent| D{Select Tool}
    D -->|General ML or DS question| E[RAG Pipeline]
    D -->|Requires code execution| F[Code Interpreter]
    E --> Z[Scrape data from notable sources]
    Z --> X[Store the scraped data in the vector store]
    X --> G[Retriever Agent fetches context from Chroma DB]
    E --> G
    G --> J[Provide all retrieved context to the final Answering Agent]
    F --> K[Execute Python logic safely - sandboxed]
    K --> L[Return computed result or code output]
    L --> J
    J --> M[Translator Agent matches the user language]
    M --> N[API sends response to Frontend]
    N -->|Displays explanation or result| O[User sees response in Frontend]
```
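As a rough, self-contained sketch of the workflow above, the four agents can be composed into a simple pipeline. Every helper here is a hypothetical placeholder standing in for the corresponding agent in src/app/agent_workflow/, not the project's actual code.

```python
# Minimal sketch of the agent workflow in the diagram above.
# All helpers are placeholders for the real agents.

def detect_language(query: str) -> str:
    return "en"  # placeholder: Language Detector agent

def retrieve_context(query: str) -> list[str]:
    return []  # placeholder: Retriever Agent querying the Chroma vector store

def generate_answer(query: str, context: list[str]) -> str:
    return "stub answer"  # placeholder: Answering Agent (LLM call)

def translate(text: str, target_lang: str) -> str:
    return text  # placeholder: Translator Agent

def answer_question(query: str) -> str:
    """Orchestrate the agents: detect -> retrieve -> answer -> translate."""
    user_lang = detect_language(query)
    context = retrieve_context(query)
    draft = generate_answer(query, context)
    return translate(draft, target_lang=user_lang)

print(answer_question("What is gradient descent?"))
```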
Project structure:

```
ML-TutorBot/
├── src/
│   ├── app/
│   │   ├── agent_workflow/   # Handles agent orchestration (Language, Retriever, Answering, Translator)
│   │   ├── api/              # FastAPI routes and API logic
│   │   ├── core/             # Core utilities, configs, and constants
│   │   ├── frontend/         # Gradio UI components and design
│   │   ├── rag_pipelines/    # RAG (Retrieval-Augmented Generation) logic and document retrieval flow
│   │   └── __init__.py
│   │
│   ├── data/                 # Preprocessed documents and text datasets for embeddings
│   ├── chroma/               # Vector database storage (Chroma persistence)
│   ├── tests/                # Unit and integration tests
│   ├── __init__.py
│   └── main.py               # Entry point for backend execution
│
├── docker-compose.yml        # Docker multi-service setup (backend, vector DB, etc.)
├── Dockerfile                # Container definition for ML TutorBot
├── requirements.txt          # Python dependencies
├── README.md                 # Project documentation
└── LICENSE                   # License file
```
Knowledge Sources:
- Official Documentation: scikit-learn, Pandas, NumPy, PyTorch, TensorFlow
- Open-Access Books: Dive into Deep Learning, fast.ai courses
- Blogs & Tutorials: Kaggle Learn, Towards Data Science, Analytics Vidhya
- Wikipedia (ML/DS articles)

All documents are chunked and embedded into a vector database for RAG.
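A minimal sketch of that ingestion step is shown below; the collection name, chunk size, and naive fixed-size splitter are assumptions, not the project's actual pipeline.

```python
# Illustrative ingestion sketch: chunk a document, embed the chunks, store in Chroma.
# Collection name, chunk size, and the naive splitter are assumptions.
import chromadb
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
client = chromadb.PersistentClient(path="src/chroma")              # persisted vector store
collection = client.get_or_create_collection("ml_tutorbot_docs")   # hypothetical name

document = "Gradient descent is an optimization algorithm..."      # e.g., a scraped page
chunks = chunk_text(document)
embeddings = model.encode(chunks).tolist()                          # multilingual embeddings
collection.add(
    ids=[f"doc-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings,
)
```

At query time, the Retriever Agent would embed the user question with the same model and call `collection.query` to fetch the nearest chunks.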
Tech Stack:
- Language Model: Gemini 2.5 Flash Lite
- Embeddings: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- Vector Database: Chroma
- Backend: FastAPI
- Interface: Gradio
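To show how these pieces fit together, here is a hypothetical sketch of a FastAPI route in the spirit of src/app/api/; the `/ask` path, the request/response schema, and the stubbed answer are assumptions, not the project's actual api.py.

```python
# Hypothetical backend route sketch (path, schema, and stub logic are assumptions).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="ML TutorBot API")

class AskRequest(BaseModel):
    question: str

class AskResponse(BaseModel):
    answer: str

@app.post("/ask", response_model=AskResponse)
def ask(request: AskRequest) -> AskResponse:
    # The real route would invoke the agent workflow (RAG retrieval, answering,
    # translation) instead of returning a stub.
    return AskResponse(answer=f"(stub) You asked: {request.question}")
```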
Getting Started:
- Clone the repo:
```bash
git clone https://github.com/Fugant1/ML-TutorBot.git
cd ML-TutorBot
```
- Run:
```bash
docker compose build
docker compose up
python3 -m app.frontend.ui
```
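A minimal Gradio client in the spirit of `app.frontend.ui` could look like the sketch below; the backend URL, port, and `/ask` payload shape are assumptions carried over from the hypothetical route above.

```python
# Hypothetical minimal Gradio frontend; URL, port, and payload shape are assumptions.
import gradio as gr
import requests

API_URL = "http://localhost:8000/ask"  # assumed backend address

def ask_tutorbot(question: str) -> str:
    response = requests.post(API_URL, json={"question": question}, timeout=60)
    response.raise_for_status()
    return response.json()["answer"]

demo = gr.Interface(
    fn=ask_tutorbot,
    inputs=gr.Textbox(label="Ask an ML / Data Science question"),
    outputs=gr.Markdown(label="Answer"),
    title="ML TutorBot",
)

if __name__ == "__main__":
    demo.launch()
```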
