This repo stores projects that teach you something about data science.

SankaiAI/data-learning-lab

Data Scientist Projects

A collection of hands-on data science and ML engineering projects focused on practical, production-ready skills.

🗂️ Projects Overview

Project | Description | Tech Stack
CICD-MLflow | Enterprise CI/CD simulation for ML pipelines with real MLflow tracking | Docker, FastAPI, Next.js, MLflow, PostgreSQL, MinIO
ab-test-cuped-did-lab | Interactive A/B testing learning lab with CUPED and DiD methods | React, TypeScript, Vite, Recharts
ab-test-stakeholder-chat | Animated stakeholder group chat simulating a full end-to-end A/B test process | Vanilla HTML/CSS/JS, Vite

📦 CICD-MLflow

An enterprise CI/CD simulation for medical claim ML pipelines with real MLflow tracking. Learn the complete lifecycle of ML engineering in an enterprise environment.

Key Features

  • Interactive Pipeline DAG with real-time status updates
  • Real MLflow integration for experiment tracking
  • Champion vs Challenger model promotion logic
  • Drift monitoring and rollback capabilities
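
The README does not spell out the promotion rule the project uses, so the following is only a minimal sketch of what champion vs challenger logic can look like. The `ModelResult` structure, the choice of AUC as the metric, the model names, and the `min_gain` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Evaluation summary for one model version (hypothetical structure)."""
    name: str
    auc: float  # higher is better


def should_promote(champion: ModelResult, challenger: ModelResult,
                   min_gain: float = 0.01) -> bool:
    """Promote only if the challenger beats the champion by at least min_gain."""
    return challenger.auc - champion.auc >= min_gain


# Made-up scores, not results from the project.
champion = ModelResult("claims-model-v1", auc=0.81)
challenger = ModelResult("claims-model-v2", auc=0.83)
print(should_promote(champion, challenger))
```

Requiring a minimum gain rather than any improvement is one way to avoid promoting a model on noise; the right threshold depends on the metric and the size of the evaluation set.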

Quick Start

cd CICD-MLflow

# Set up environment (optional: a .env file is already included, and this command overwrites it)
cp .env.example .env

# Start all services with Docker
docker compose up --build

Access Points

Prerequisites

  • Docker Desktop (with Docker Compose)
  • At least 4GB RAM available for Docker
  • Ports available: 3000, 5000, 8000, 9000, 9001, 5432
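
Before running `docker compose up`, it can help to confirm that nothing is already listening on the ports listed above. This is a small sketch using only the Python standard library; the helper name `port_in_use` is hypothetical and is not part of the project.

```python
import socket

# Ports listed in the prerequisites above.
REQUIRED_PORTS = [3000, 5000, 8000, 9000, 9001, 5432]


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in REQUIRED_PORTS:
        status = "in use" if port_in_use(port) else "free"
        print(f"{port}: {status}")
```

A port reported as "in use" usually means another local service (for example an existing PostgreSQL on 5432) needs to be stopped first.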

📖 Full Documentation →


📊 ab-test-cuped-did-lab

An interactive web application that teaches A/B testing concepts through real-time simulated streaming events. Learn how CUPED reduces variance and Difference-in-Differences (DiD) removes confounding time effects.
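
To make the two methods concrete, here is a rough Python sketch of the textbook formulas. It is not the lab's own TypeScript code; the function names and the synthetic data are assumptions for illustration. CUPED subtracts the part of the outcome explained by a pre-experiment covariate, and DiD subtracts the control group's before/after change from the treatment group's change.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes: y_i - theta * (x_i - mean(x))."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)  # theta = cov(x, y) / var(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences on group means."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))


# Synthetic data, made up for illustration: the in-experiment metric
# is strongly correlated with the pre-experiment metric.
rng = random.Random(0)
pre = [rng.gauss(10, 2) for _ in range(1000)]
post = [p + rng.gauss(0, 1) for p in pre]

adjusted = cuped_adjust(post, pre)
print("variance before:", variance(post))
print("variance after: ", variance(adjusted))

# Treatment rises by 5 and control by 2, so the DiD estimate is 3.
did = did_estimate([10, 12], [15, 17], [10, 12], [12, 14])
print("DiD estimate:", did)
```

The adjustment leaves the sample mean unchanged while shrinking the variance, which is why CUPED tightens confidence intervals without biasing the estimate. In a real experiment, theta is typically estimated on data pooled across both arms.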

Key Features

  • Multi-language support (English / 中文)
  • Sample size calculator for pre-experiment planning
  • Real-time streaming simulation of user events
  • Step-by-step walkthroughs of CUPED and DiD calculations
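
As a rough idea of what a sample size calculator computes, the sketch below uses the standard normal-approximation formula for comparing two proportions. Which formula the lab actually implements is not stated here, so treat the function name, parameters, and defaults (`alpha=0.05`, `power=0.8`) as assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm to detect an absolute lift of mde_abs
    with a two-sided test at level alpha and the given power."""
    p_treat = p_baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two arms.
    var_sum = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * var_sum / mde_abs ** 2)


# Example: 10% baseline conversion, detect a 2 percentage point lift.
n = sample_size_per_arm(0.10, 0.02)
print(n)
```

Because the required sample size scales with the inverse square of the detectable effect, halving the minimum detectable effect roughly quadruples the number of users needed.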

Quick Start

cd ab-test-cuped-did-lab

# Install dependencies
npm install

# Start development server
npm run dev

# Open http://localhost:5173

Prerequisites

  • Node.js (v18+ recommended)
  • npm or yarn

📖 Full Documentation →


💬 ab-test-stakeholder-chat

An animated group chat simulation that walks through a complete, real-world A/B test — from idea to ship decision — as a conversation between 6 company stakeholders (PM, Data Scientist, Engineer, Designer, Growth Lead, Legal).

Key Features

  • Animated Slack-style group chat playing out all 4 phases of an A/B test
  • Live deliverables panel tracking every document shared during the process
  • Replay, pause, speed controls (1×/2×/5×), and jump-to-end
  • Plain-English 8-step daily workflow guide modal
  • 🤖 AI Chatbot — ask any question and the right stakeholder answers in character (powered by Gemini)

Quick Start

cd ab-test-stakeholder-chat

# Install dependencies
npm install

# Start development server
npm run dev

# Open http://localhost:5173

Prerequisites

  • Node.js (v18+ recommended)
  • npm or yarn
  • A free Gemini API key — add it to ab-test-stakeholder-chat/.env as VITE_GEMINI_API_KEY=your_key

📖 Full Documentation →


🚀 Getting Started

Clone the Repository

git clone https://github.com/SankaiAI/data-learning-lab.git
cd data-learning-lab

Choose a Project

Navigate to the project directory you want to explore and follow the Quick Start instructions above.


📚 Learning Path

If you're new to these topics, here's a suggested learning order:

  1. Start with ab-test-stakeholder-chat - Watch how a real team collaborates to design, launch, and analyze an A/B test
  2. Deep dive with ab-test-cuped-did-lab - Learn the statistical methods hands-on: CUPED, DiD, and real-time streaming simulation
  3. Continue with CICD-MLflow - Understand enterprise ML pipelines, experiment tracking, and model deployment workflows

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

📄 License

MIT License - See individual project LICENSE files for details.
