A comprehensive Retrieval-Augmented Generation (RAG) system designed to help students master machine learning concepts through intelligent document retrieval, voice interaction, and fine-tuned embeddings.
ML Study Buddy is a full-stack application that combines a Next.js frontend with a FastAPI backend to create an intelligent study companion. The system uses advanced RAG techniques with fine-tuned embeddings to provide accurate, contextual answers to machine learning questions.
- **Intelligent Document Processing**: Supports PDFs, images, and web content with OCR capabilities
- **Voice Interaction**: Speech-to-text and text-to-speech for hands-free learning
- **Fine-Tuned Embeddings**: Custom embedding models optimized for the ML domain
- **Performance Evaluation**: Comprehensive metrics (Recall@K, MRR, NDCG)
- **Modern Web Interface**: Responsive React UI with real-time chat
- **Production Ready**: Frontend deployed on Vercel
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Frontend     │     │     Backend     │     │   AI Services   │
│    (Next.js)    │◄───►│    (FastAPI)    │◄───►│    (Groq/HF)    │
│                 │     │                 │     │                 │
│ • React UI      │     │ • RAG Pipeline  │     │ • Llama 3.3 70B │
│ • Voice Input   │     │ • Vector Store  │     │ • Whisper STT   │
│ • Chat Interface│     │ • OCR Processing│     │ • SpeechT5 TTS  │
│ • File Upload   │     │ • Voice Handler │     │ • TrOCR         │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
```
Raw Documents → Text Extraction → Chunking      → Embedding       → Vector Store
      ↓                ↓               ↓                ↓                 ↓
 • PDFs           • PyMuPDF       • 1000 chars    • HuggingFace     • FAISS
 • Images         • TrOCR         • 200 overlap   • all-MiniLM-     • Index
 • Web Pages      • BeautifulSoup • Recursive       L6-v2           • Persist
```
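The chunking step in this pipeline can be sketched in plain Python. The project uses LangChain's `RecursiveCharacterTextSplitter`, which also prefers paragraph and sentence boundaries; the fixed-size sliding window below is only an illustration of the 1000-character / 200-overlap idea:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks.

    Simplified stand-in for LangChain's RecursiveCharacterTextSplitter;
    the real splitter additionally respects natural text boundaries.
    """
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk shares its last 200 characters with the next one, so a sentence cut at a boundary still appears whole in at least one chunk.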
```
User Query → Embedding   → Similarity Search → Context Retrieval → LLM Generation
     ↓            ↓               ↓                    ↓                  ↓
 • Text/Voice • Same Model  • Top-K Results     • Relevant Docs    • Groq Llama
 • Image OCR    as Docs     • Cosine Sim        • Metadata           3.3 70B
```
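The retrieval step reduces to a cosine-similarity top-K search. FAISS performs this over a persisted index; the math it computes is a few lines of NumPy (an illustrative sketch, not the project's actual `vector_store.py`):

```python
import numpy as np

def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    """Return indices and scores of the k documents most similar to the query."""
    # Normalise so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]   # highest similarity first
    return order, scores[order]
```

The retrieved chunks (with their metadata) are then formatted into the prompt context for the LLM.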
```
Documents → Synthetic Queries → Training Pairs    → Contrastive Learning → Fine-tuned Model
    ↓               ↓                  ↓                     ↓                     ↓
 • Chunks      • Groq LLM      • (Q, Doc+, Doc-)   • MultipleNegatives-   • Better Retrieval
 • Metadata    • 3 per doc     • Hard Negatives      RankingLoss          • +15-25% metrics
```
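Hard-negative mining in the pipeline above means: for each (query, positive) pair, find a document that the current embedding model scores highly even though it is not the positive. A minimal NumPy sketch (the notebook's actual implementation may differ):

```python
import numpy as np

def mine_hard_negative(query_vec: np.ndarray, pos_idx: int,
                       doc_vecs: np.ndarray) -> int:
    """Return the index of the most query-similar doc that is NOT the positive."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    scores[pos_idx] = -np.inf   # exclude the true positive
    return int(np.argmax(scores))
```

Hard negatives force the model to separate near-misses, which is exactly where generic embeddings confuse related ML terms.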
**Frontend**
- Framework: Next.js 15 with App Router
- UI: React 19, Tailwind CSS, Radix UI
- Animations: Framer Motion
- State: React Hooks, Context API
- Voice: Web Speech API, MediaRecorder
- Deployment: Vercel
**Backend**
- Framework: FastAPI with Uvicorn
- RAG: LangChain, FAISS, HuggingFace
- LLM: Groq (Llama 3.3 70B Versatile)
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- OCR: TrOCR (microsoft/trocr-base-printed)
- Voice: Whisper (openai/whisper-small), SpeechT5
- Deployment: Local or cloud hosting
| Component | Model | Purpose | Why This Model |
|---|---|---|---|
| LLM | Llama 3.3 70B (Groq) | Answer generation | Fast inference, high quality, free tier |
| Embeddings | all-MiniLM-L6-v2 | Document/query encoding | Balanced speed/quality, 384 dimensions |
| STT | Whisper Small | Speech transcription | Robust multilingual, good accuracy |
| TTS | SpeechT5 | Speech synthesis | Open source, customizable voice |
| OCR | TrOCR Base | Image text extraction | Transformer-based, handles printed text |
```
ml-study-buddy/
├── frontend/
│   ├── src/
│   │   ├── app/                     # Next.js App Router
│   │   │   ├── layout.tsx           # Root layout
│   │   │   ├── page.tsx             # Landing page
│   │   │   └── chat/                # Chat interface
│   │   ├── components/              # React components
│   │   │   ├── ChatInterface.tsx    # Main chat UI
│   │   │   ├── ChatInput.tsx        # Input with voice/file
│   │   │   ├── MessageBubble.tsx    # Message display
│   │   │   ├── VoiceResponseOrb.tsx # Voice visualization
│   │   │   └── ui/                  # Reusable UI components
│   │   ├── lib/
│   │   │   ├── api.ts               # Backend API calls
│   │   │   └── utils.ts             # Utilities
│   │   └── hooks/                   # Custom React hooks
│   ├── package.json                 # Dependencies
│   └── tailwind.config.js           # Styling config
│
├── backend/
│   ├── main.py                      # FastAPI app entry
│   ├── config.py                    # Configuration
│   ├── rag/
│   │   ├── chain.py                 # RAG chain logic
│   │   └── vector_store.py          # FAISS management
│   ├── voice/
│   │   ├── handler.py               # Voice processing
│   │   ├── stt.py                   # Speech-to-text
│   │   └── tts.py                   # Text-to-speech
│   ├── ocr/
│   │   └── processor.py             # Image OCR
│   ├── faiss_index/                 # Vector database
│   └── requirements.txt             # Python dependencies
│
└── notebook/
    ├── ML_RAG_System_v1_0.ipynb         # Complete RAG system
    └── ML_RAG_System_v2_FineTuned.ipynb # Fine-tuning pipeline
```
```mermaid
graph TD
    A[Upload Document] --> B{File Type?}
    B -->|PDF| C[PyMuPDF Extract]
    B -->|Image| D[TrOCR Extract]
    B -->|URL| E[Web Scrape]
    C --> F[Text Chunking]
    D --> F
    E --> F
    F --> G[Generate Embeddings]
    G --> H[Store in FAISS]
    H --> I[Update Index]
```
```mermaid
graph TD
    A[User Query] --> B{Input Type?}
    B -->|Text| C[Direct Processing]
    B -->|Voice| D[Whisper STT]
    B -->|Image| E[TrOCR + Query]
    C --> F[Embed Query]
    D --> F
    E --> F
    F --> G[FAISS Search]
    G --> H[Retrieve Top-K Docs]
    H --> I[Format Context]
    I --> J[Groq LLM]
    J --> K[Generate Response]
    K --> L{Voice Response?}
    L -->|Yes| M[SpeechT5 TTS]
    L -->|No| N[Return Text]
    M --> N
```
```mermaid
graph TD
    A[Existing Documents] --> B[Generate Synthetic Queries]
    B --> C[Create Training Pairs]
    C --> D[Mine Hard Negatives]
    D --> E[Contrastive Learning]
    E --> F[Fine-tuned Embeddings]
    F --> G[Rebuild FAISS Index]
    G --> H[Evaluate Performance]
    H --> I[Deploy if Better]
```
- Node.js 18+
- Python 3.9+
- Groq API Key (free at console.groq.com)
```bash
# Clone the repository
git clone https://github.com/aroyy007/ml-study-buddy.git
cd ml-study-buddy
```

```bash
# Backend setup
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your GROQ_API_KEY
python run.py
```

```bash
# Frontend setup (in a second terminal, from the repo root)
cd frontend
npm install
npm run dev
```

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
| Metric | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| Recall@5 | 0.60 | 0.75-0.82 | +15-22% |
| MRR@5 | 0.40 | 0.55-0.65 | +15-25% |
| NDCG@5 | 0.45 | 0.60-0.70 | +15-25% |
- Recall@K: Fraction of queries for which the relevant document appears in the top-K results
- MRR (Mean Reciprocal Rank): Average over queries of 1/rank of the first relevant document
- NDCG@K: Normalized Discounted Cumulative Gain; rewards placing the relevant document higher in the ranking
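With a single relevant document per query, the three metrics above reduce to short functions (a sketch matching the definitions; the notebooks may implement them differently):

```python
import math

def recall_at_k(relevant: str, ranked: list[str], k: int) -> float:
    """1 if the relevant doc appears in the top-k results, else 0."""
    return 1.0 if relevant in ranked[:k] else 0.0

def reciprocal_rank(relevant: str, ranked: list[str], k: int) -> float:
    """1/rank of the relevant doc; MRR averages this over all queries."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc == relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(relevant: str, ranked: list[str], k: int) -> float:
    """With one relevant doc the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc == relevant:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```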
- Domain Adaptation: Generic embeddings may not capture ML-specific relationships
- Improved Retrieval: Better semantic understanding of ML concepts
- Query-Document Alignment: Learns to match student questions with relevant content
1. **Synthetic Data Generation**
   - Use Groq LLM to generate 3 queries per document chunk
   - Create (query, positive_document) pairs
   - Generate ~1500 training examples
2. **Contrastive Learning**
   - MultipleNegativesRankingLoss (InfoNCE)
   - In-batch negatives for efficiency
   - Hard negative mining from existing index
3. **Model Training**
   - Base: sentence-transformers/all-MiniLM-L6-v2
   - 3 epochs, batch size 16, learning rate 2e-5
   - Warmup steps: 100
4. **Evaluation & Deployment**
   - Compare metrics on held-out test set
   - Rebuild FAISS index with fine-tuned embeddings
   - A/B test in production
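MultipleNegativesRankingLoss is InfoNCE with in-batch negatives: in a batch of (query, positive) pairs, each query's positive is the document at the same index, and every other document in the batch acts as a negative. A NumPy sketch of the loss being minimised (the real training runs in sentence-transformers on PyTorch; the scale of 20 mirrors that library's default):

```python
import numpy as np

def info_nce_loss(q_emb: np.ndarray, d_emb: np.ndarray, scale: float = 20.0) -> float:
    """In-batch InfoNCE: row i of q_emb pairs with row i of d_emb."""
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    sim = scale * (q @ d.T)                        # (batch, batch) cosine sims
    # Cross-entropy with the diagonal entries as the correct class.
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))
```

Training pushes diagonal similarities up and off-diagonal ones down, which is exactly the "match student questions to the right chunk" behaviour the fine-tuning targets.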
- `GET /health` - System health check
- `POST /query` - Text-based RAG query
- `POST /transcribe` - Audio transcription
- `POST /voice-query` - Voice-based RAG query
- `POST /upload` - Document upload
- `DELETE /session/{id}` - Clear chat session
```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is gradient descent?", "session_id": "user123"}'
```

```bash
curl -X POST "http://localhost:8000/voice-query" \
  -F "audio=@recording.wav" \
  -F "session_id=voice123" \
  -F "generate_audio=true"
```

1. **Create Space** at huggingface.co/spaces
   - Choose Docker SDK
   - Hardware: CPU Basic (free) or GPU for faster inference
2. **Upload Files**:
   ```
   app.py               # FastAPI entry point
   Dockerfile           # Docker configuration
   requirements-hf.txt  # Python dependencies
   README.md            # Readme for HF Space
   backend/             # Backend modules
   faiss_index/         # Pre-built vector index
   ```
3. **Set Secrets** (Settings → Repository secrets):
   - `GROQ_API_KEY` - Your Groq API key
4. **Your API URL**:
   `https://YOUR-USERNAME-your-space-name.hf.space`
```bash
# Run the backend locally
cd backend
pip install -r requirements.txt
python run.py
```

```bash
# Deploy to Vercel
vercel --prod

# Set environment variable to your HF Space URL
vercel env add NEXT_PUBLIC_API_URL
```

```bash
# Install Jupyter
pip install jupyter

# Run v1.0 (Complete RAG System)
jupyter notebook notebook/ML_RAG_System_v1_0.ipynb

# Run v2.0 (Fine-Tuning Pipeline)
jupyter notebook notebook/ML_RAG_System_v2_FineTuned.ipynb
```

- Health Check: Verify backend is running
- Document Upload: Test PDF/image processing
- Text Queries: Test RAG responses
- Voice Features: Test STT/TTS pipeline
- Fine-Tuning: Run evaluation metrics
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Groq for fast LLM inference
- HuggingFace for open-source models
- LangChain for RAG framework
- FAISS for efficient vector search
- Next.js and FastAPI for modern web development