ML Study Buddy - AI-Powered Machine Learning Assistant

A comprehensive Retrieval-Augmented Generation (RAG) system designed to help students master machine learning concepts through intelligent document retrieval, voice interaction, and fine-tuned embeddings.

🌟 Overview

ML Study Buddy is a full-stack application that combines a Next.js frontend with a FastAPI backend to create an intelligent study companion. The system uses advanced RAG techniques with fine-tuned embeddings to provide accurate, contextual answers to machine learning questions.

Key Features

  • 📚 Intelligent Document Processing: Supports PDFs, images, and web content with OCR capabilities
  • 🎤 Voice Interaction: Speech-to-text and text-to-speech for hands-free learning
  • 🎯 Fine-Tuned Embeddings: Custom embedding models optimized for the ML domain
  • 📊 Performance Evaluation: Comprehensive metrics (Recall@K, MRR, NDCG)
  • 🌐 Modern Web Interface: Responsive React UI with real-time chat
  • 🚀 Production Ready: Frontend deployed on Vercel

πŸ—οΈ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │    Backend      │    │   AI Services   │
│   (Next.js)     │◄──►│   (FastAPI)     │◄──►│   (Groq/HF)     │
│                 │    │                 │    │                 │
│ • React UI      │    │ • RAG Pipeline  │    │ • Llama 3.3 70B │
│ • Voice Input   │    │ • Vector Store  │    │ • Whisper STT   │
│ • Chat Interface│    │ • OCR Processing│    │ • SpeechT5 TTS  │
│ • File Upload   │    │ • Voice Handler │    │ • TrOCR         │
└─────────────────┘    └─────────────────┘    └─────────────────┘

📊 RAG Pipeline

1. Document Ingestion

Raw Documents → Text Extraction → Chunking → Embedding → Vector Store
     ↓               ↓               ↓            ↓            ↓
• PDFs           • PyMuPDF        • 1000 chars  • HuggingFace • FAISS
• Images         • TrOCR          • 200 overlap • all-MiniLM  • Index
• Web Pages      • BeautifulSoup  • Recursive   • L6-v2       • Persist
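The chunking parameters above (1000-character chunks with a 200-character overlap) can be sketched in plain Python. The actual pipeline uses a recursive splitter (LangChain-style), so this fixed-window version is only a simplified illustration of the size/overlap scheme:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-window chunking with overlap, mirroring the 1000/200 settings above."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # 800 chars of new text per window
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already covers the tail of the document
    return chunks

doc = "".join(str(i % 10) for i in range(2500))  # toy 2500-char "document"
chunks = chunk_text(doc)
```

The overlap means the last 200 characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary is still retrievable in one piece.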

2. Query Processing

User Query → Embedding → Similarity Search → Context Retrieval → LLM Generation
    ↓            ↓             ↓                   ↓                  ↓
• Text/Voice • Same Model  • Top-K Results   • Relevant Docs   • Groq Llama
• Image OCR  • as Docs     • Cosine Sim      • Metadata        • 3.3 70B
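The similarity-search step can be illustrated with a minimal cosine top-K over toy vectors. The real system delegates this to FAISS over 384-dimensional all-MiniLM-L6-v2 embeddings; the 2-D vectors here are invented purely for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k most similar documents, best first."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]

doc_vecs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]  # toy "document embeddings"
hits = top_k((1.0, 0.05), doc_vecs, k=2)
```

The retrieved indices then map back to document chunks and their metadata, which are formatted into the LLM prompt as context.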

3. Fine-Tuning Pipeline (v2.0)

Documents → Synthetic Queries → Training Pairs → Contrastive Learning → Fine-tuned Model
    ↓             ↓                  ↓                   ↓                     ↓
• Chunks     • Groq LLM       • (Q, Doc+, Doc-)  • MultipleNegatives  • Better Retrieval
• Metadata   • 3 per doc      • Hard Negatives   • RankingLoss        • +15-25% metrics

πŸ› οΈ Technology Stack

Frontend (Next.js 15)

  • Framework: Next.js 15 with App Router
  • UI: React 19, Tailwind CSS, Radix UI
  • Animations: Framer Motion
  • State: React Hooks, Context API
  • Voice: Web Speech API, MediaRecorder
  • Deployment: Vercel

Backend (Python FastAPI)

  • Framework: FastAPI with Uvicorn
  • RAG: LangChain, FAISS, HuggingFace
  • LLM: Groq (Llama 3.3 70B Versatile)
  • Embeddings: sentence-transformers/all-MiniLM-L6-v2
  • OCR: TrOCR (microsoft/trocr-base-printed)
  • Voice: Whisper (openai/whisper-small), SpeechT5
  • Deployment: Local or cloud hosting

AI Models Used

| Component  | Model                | Purpose                 | Why This Model                          |
|------------|----------------------|-------------------------|-----------------------------------------|
| LLM        | Llama 3.3 70B (Groq) | Answer generation       | Fast inference, high quality, free tier |
| Embeddings | all-MiniLM-L6-v2     | Document/query encoding | Balanced speed/quality, 384 dimensions  |
| STT        | Whisper Small        | Speech transcription    | Robust multilingual, good accuracy      |
| TTS        | SpeechT5             | Speech synthesis        | Open source, customizable voice         |
| OCR        | TrOCR Base           | Image text extraction   | Transformer-based, handles printed text |

πŸ“ Project Structure

ml-study-buddy/
├── 📁 frontend/
│   ├── 📁 src/
│   │   ├── 📁 app/                  # Next.js App Router
│   │   │   ├── layout.tsx           # Root layout
│   │   │   ├── page.tsx             # Landing page
│   │   │   └── chat/                # Chat interface
│   │   ├── 📁 components/           # React components
│   │   │   ├── ChatInterface.tsx    # Main chat UI
│   │   │   ├── ChatInput.tsx        # Input with voice/file
│   │   │   ├── MessageBubble.tsx    # Message display
│   │   │   ├── VoiceResponseOrb.tsx # Voice visualization
│   │   │   └── ui/                  # Reusable UI components
│   │   ├── 📁 lib/
│   │   │   ├── api.ts               # Backend API calls
│   │   │   └── utils.ts             # Utilities
│   │   └── 📁 hooks/                # Custom React hooks
│   ├── package.json                 # Dependencies
│   └── tailwind.config.js           # Styling config
│
├── 📁 backend/
│   ├── main.py                      # FastAPI app entry
│   ├── config.py                    # Configuration
│   ├── 📁 rag/
│   │   ├── chain.py                 # RAG chain logic
│   │   └── vector_store.py          # FAISS management
│   ├── 📁 voice/
│   │   ├── handler.py               # Voice processing
│   │   ├── stt.py                   # Speech-to-text
│   │   └── tts.py                   # Text-to-speech
│   ├── 📁 ocr/
│   │   └── processor.py             # Image OCR
│   ├── 📁 faiss_index/              # Vector database
│   └── requirements.txt             # Python dependencies
│
└── 📁 notebook/
    ├── ML_RAG_System_v1_0.ipynb     # Complete RAG system
    └── ML_RAG_System_v2_FineTuned.ipynb # Fine-tuning pipeline

🔄 Data Flow

1. Document Processing Flow

graph TD
    A[Upload Document] --> B{File Type?}
    B -->|PDF| C[PyMuPDF Extract]
    B -->|Image| D[TrOCR Extract]
    B -->|URL| E[Web Scrape]
    C --> F[Text Chunking]
    D --> F
    E --> F
    F --> G[Generate Embeddings]
    G --> H[Store in FAISS]
    H --> I[Update Index]

2. Query Processing Flow

graph TD
    A[User Query] --> B{Input Type?}
    B -->|Text| C[Direct Processing]
    B -->|Voice| D[Whisper STT]
    B -->|Image| E[TrOCR + Query]
    C --> F[Embed Query]
    D --> F
    E --> F
    F --> G[FAISS Search]
    G --> H[Retrieve Top-K Docs]
    H --> I[Format Context]
    I --> J[Groq LLM]
    J --> K[Generate Response]
    K --> L{Voice Response?}
    L -->|Yes| M[SpeechT5 TTS]
    L -->|No| N[Return Text]
    M --> N

3. Fine-Tuning Flow (v2.0)

graph TD
    A[Existing Documents] --> B[Generate Synthetic Queries]
    B --> C[Create Training Pairs]
    C --> D[Mine Hard Negatives]
    D --> E[Contrastive Learning]
    E --> F[Fine-tuned Embeddings]
    F --> G[Rebuild FAISS Index]
    G --> H[Evaluate Performance]
    H --> I[Deploy if Better]

🚀 Quick Start

Prerequisites

  • Node.js and npm (for the Next.js frontend)
  • Python 3.x and pip (for the FastAPI backend)
  • A Groq API key (for LLM inference)

1. Clone Repository

git clone https://github.com/aroyy007/ml-study-buddy.git
cd ml-study-buddy

2. Backend Setup

cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your GROQ_API_KEY
python run.py

3. Frontend Setup

cd ../frontend
npm install
npm run dev

4. Access Application

  • Frontend: http://localhost:3000 (Next.js dev server default)
  • Backend API: http://localhost:8000

📈 Performance Metrics

Baseline vs Fine-Tuned Embeddings

| Metric   | Baseline | Fine-tuned | Improvement |
|----------|----------|------------|-------------|
| Recall@5 | 0.60     | 0.75-0.82  | +15-22%     |
| MRR@5    | 0.40     | 0.55-0.65  | +15-25%     |
| NDCG@5   | 0.45     | 0.60-0.70  | +15-25%     |

Evaluation Metrics Explained

  • Recall@K: Fraction of queries for which a relevant document appears in the top-K results
  • MRR (Mean Reciprocal Rank): Average of 1/rank of the first relevant document across queries
  • NDCG@K: Normalized Discounted Cumulative Gain; rewards placing relevant documents higher in the ranking
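These can be computed directly from ranked result lists. A minimal sketch of Recall@K and MRR (the ranked lists below are made-up examples, not the project's evaluation data):

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant document appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mrr(runs, gold):
    """Mean over queries of 1/rank of the first relevant document (0 if absent)."""
    total = 0.0
    for ranked_ids, relevant_id in zip(runs, gold):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id == relevant_id:
                total += 1.0 / rank
                break
    return total / len(runs)

# Query 1's relevant doc is ranked 1st; query 2's is ranked 3rd
runs = [["d1", "d2", "d3"], ["d5", "d6", "d4"]]
gold = ["d1", "d4"]
score = mrr(runs, gold)  # (1/1 + 1/3) / 2 = 0.666...
```

NDCG@K additionally discounts each relevant hit by log2(rank + 1) and normalizes by the ideal ranking's score, so it distinguishes a hit at rank 2 from one at rank 5 even when Recall@5 counts them the same.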

🎯 Fine-Tuning Process

Why Fine-Tune Embeddings?

  1. Domain Adaptation: Generic embeddings may not capture ML-specific relationships
  2. Improved Retrieval: Better semantic understanding of ML concepts
  3. Query-Document Alignment: Learns to match student questions with relevant content

Fine-Tuning Pipeline

  1. Synthetic Data Generation

    • Use Groq LLM to generate 3 queries per document chunk
    • Create (query, positive_document) pairs
    • Generate ~1500 training examples

  2. Contrastive Learning

    • MultipleNegativesRankingLoss (InfoNCE)
    • In-batch negatives for efficiency
    • Hard negative mining from existing index

  3. Model Training

    • Base: sentence-transformers/all-MiniLM-L6-v2
    • 3 epochs, batch size 16, learning rate 2e-5
    • Warmup steps: 100

  4. Evaluation & Deployment

    • Compare metrics on held-out test set
    • Rebuild FAISS index with fine-tuned embeddings
    • A/B test in production
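For each query in a batch, MultipleNegativesRankingLoss treats the paired document as the positive class and every other in-batch document as a negative, then applies cross-entropy over the similarity scores (InfoNCE). Below is a dependency-free sketch of that math; in practice this is sentence-transformers' built-in loss computed over real embedding similarities, and the toy score matrix here is invented:

```python
import math

def multiple_negatives_ranking_loss(sim):
    """InfoNCE: for row i, the 'correct class' is column i (its paired positive)."""
    losses = []
    for i, row in enumerate(sim):
        log_denom = math.log(sum(math.exp(s) for s in row))
        losses.append(log_denom - row[i])  # -log softmax probability of the positive
    return sum(losses) / len(losses)

# Toy 2x2 similarity matrix: sim[i][j] = score(query_i, doc_j); diagonal = positives
sim = [[5.0, 1.0],
       [0.5, 4.0]]
loss = multiple_negatives_ranking_loss(sim)  # small, since positives dominate each row
```

Minimizing this pushes each query's similarity to its own document above its similarity to every other document in the batch, which is exactly the retrieval behavior the fine-tuning aims for.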

🔧 API Endpoints

Core Endpoints

  • GET /health - System health check
  • POST /query - Text-based RAG query
  • POST /transcribe - Audio transcription
  • POST /voice-query - Voice-based RAG query
  • POST /upload - Document upload
  • DELETE /session/{id} - Clear chat session

Request/Response Examples

Text Query

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is gradient descent?", "session_id": "user123"}'

Voice Query

curl -X POST "http://localhost:8000/voice-query" \
  -F "audio=@recording.wav" \
  -F "session_id=voice123" \
  -F "generate_audio=true"

🌐 Deployment

Backend (Hugging Face Spaces)

  1. Create Space at huggingface.co/spaces

    • Choose Docker SDK
    • Hardware: CPU Basic (free) or GPU for faster inference

  2. Upload Files:

    app.py                 # FastAPI entry point
    Dockerfile             # Docker configuration
    requirements-hf.txt    # Python dependencies
    README.md              # Readme for HF Space
    backend/               # Backend modules
    faiss_index/           # Pre-built vector index

  3. Set Secrets (Settings → Repository secrets):

    • GROQ_API_KEY - Your Groq API key

  4. Your API URL: https://YOUR-USERNAME-your-space-name.hf.space

Backend (Local Development)

cd backend
pip install -r requirements.txt
python run.py

Frontend (Vercel)

# Deploy to Vercel
vercel --prod
# Set environment variable to your HF Space URL
vercel env add NEXT_PUBLIC_API_URL

🧪 Development & Testing

Running Jupyter Notebooks

# Install Jupyter
pip install jupyter

# Run v1.0 (Complete RAG System)
jupyter notebook notebook/ML_RAG_System_v1_0.ipynb

# Run v2.0 (Fine-Tuning Pipeline)
jupyter notebook notebook/ML_RAG_System_v2_FineTuned.ipynb

Testing the System

  1. Health Check: Verify backend is running
  2. Document Upload: Test PDF/image processing
  3. Text Queries: Test RAG responses
  4. Voice Features: Test STT/TTS pipeline
  5. Fine-Tuning: Run evaluation metrics

🤝 Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Groq for fast LLM inference
  • HuggingFace for open-source models
  • LangChain for RAG framework
  • FAISS for efficient vector search
  • Next.js and FastAPI for modern web development
