ModelWatch

An open-source LLM leaderboard that displays the latest public benchmark performance for state-of-the-art open-source model versions.

Live Data Sources: llm-stats.com + HuggingFace


✨ Features

  • 📊 Benchmark Leaderboards: Visual bar charts showing top 5 models for each benchmark (MMLU, ARC, HellaSwag, TruthfulQA)
  • 📋 Sortable Table View: Sort by pricing (Input $/M, Output $/M), benchmarks, or model parameters
  • 💰 Pricing Data: Compare model costs per million tokens (input and output)
  • 🎨 Clean UI: Inspired by Vellum's leaderboard design
  • 🔄 Automated Scraping: Uses Playwright to scrape llm-stats.com for the latest models
  • 🤗 HuggingFace Integration: Optional enrichment with HuggingFace model details
  • ⚡ Fast API: Built with FastAPI for quick data serving
  • 🎯 Open Source Focus: Only displays models with open-source licenses

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • Node.js 18+
  • Chrome/Chromium (for Playwright scraper)

Option 1: One-Command Setup (Recommended)

# Clone the repository
git clone https://github.com/iamashok/modelwatch.git
cd modelwatch

# Run setup script
./setup.sh

# Collect data
./collect-data.sh

# Start the application
./start.sh

Then visit: http://localhost:3000

Option 2: Manual Setup

1. Backend Setup

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

2. Collect Data

cd scrapers
python llmstats_orchestrator.py

This scrapes the llm-stats.com leaderboard and enriches the data with HuggingFace details.

3. Start Backend API

cd ../api
python main.py

API will be available at http://localhost:8000

4. Frontend Setup

cd ../../frontend

# Install dependencies
npm install

# Start development server
npm run dev

Frontend will be available at http://localhost:3000

πŸ“ Project Structure

modelwatch/
├── backend/
│   ├── scrapers/
│   │   ├── llmstats_scraper.py       # Scrapes llm-stats.com with Playwright
│   │   ├── llmstats_orchestrator.py  # Main orchestrator
│   │   ├── huggingface_scraper.py    # HuggingFace enrichment
│   │   ├── simple_orchestrator.py    # Alternative: HF-only scraper
│   │   └── README.md
│   ├── api/
│   │   └── main.py                   # FastAPI backend server
│   ├── models/
│   │   └── schemas.py                # Pydantic data models
│   └── data/
│       └── models.json               # Scraped model data
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── BenchmarkLeaderboard.tsx    # Top 5 charts per benchmark
│   │   │   ├── SortableModelsTable.tsx     # Sortable table view
│   │   │   ├── HuggingFaceModal.tsx        # Model detail popup
│   │   │   └── ...
│   │   ├── pages/
│   │   │   └── index.tsx                   # Main page
│   │   └── lib/
│   │       └── api.ts                      # API client
│   ├── package.json
│   └── tailwind.config.js
├── setup.sh              # One-command setup script
├── collect-data.sh       # Data collection script
├── start.sh              # Start backend + frontend
└── README.md             # This file

🔧 How It Works

Data Collection Pipeline

llm-stats.com (Playwright Scraper)
         ↓
Extract 30 models with pricing + benchmarks
         ↓
HuggingFace API (Optional Enrichment)
         ↓
Add model details (license, downloads, etc.)
         ↓
Save to models.json
         ↓
FastAPI serves data
         ↓
Next.js displays leaderboards
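In code, the pipeline amounts to a few sequential steps. A high-level sketch (the function names here are illustrative, not the orchestrator's actual API):

```python
def run_pipeline(scrape, enrich, save, enrich_with_hf=True):
    """Illustrative pipeline: scrape -> optional HF enrichment -> save."""
    models = scrape()                         # Playwright scrape of llm-stats.com
    if enrich_with_hf:
        models = [enrich(m) for m in models]  # add license, downloads, etc.
    save(models)                              # write models.json for the API to serve
    return models
```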

Data Schema

Each model includes:

{
  "model_id": "THUDM/GLM-4.7",
  "model_name": "GLM-4.7",
  "organization": "THUDM",
  "parameters": "358B",
  "input_price_per_1m": 0.6,
  "output_price_per_1m": 2.2,
  "benchmarks": [
    {"name": "MMLU", "score": 85.7, "category": "knowledge"},
    {"name": "Arc-Challenge", "score": 95.7, "category": "knowledge"},
    {"name": "HellaSwag", "score": 73.8, "category": "general"},
    {"name": "TruthfulQA", "score": 42.8, "category": "knowledge"}
  ],
  "is_open_source": true,
  "hf_url": "https://huggingface.co/THUDM/GLM-4.7",
  "hf_license": "Apache 2.0",
  "hf_downloads": 125000,
  "hf_likes": 450
}

🎯 API Endpoints

Base URL: http://localhost:8000

  • GET /models - Get all models

    • Query params: limit, offset, sort_by, category, min_benchmarks
    • Example: /models?sort_by=likes&limit=20
  • GET /models/{model_id} - Get specific model details

    • Example: /models/THUDM%2FGLM-4.7
  • GET /benchmarks - Get all benchmark types and categories

  • GET /stats - Get overall statistics

    • Returns: total models, benchmark count, last update time
  • GET /docs - Interactive API documentation (Swagger UI)
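A tiny client helper makes these query parameters easy to compose. A standard-library-only sketch (the endpoint and parameter names are those listed above):

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"

def models_url(**params):
    """Build a /models query URL, e.g. models_url(sort_by="likes", limit=20)."""
    query = urlencode(params)
    return f"{BASE_URL}/models" + (f"?{query}" if query else "")

print(models_url(sort_by="likes", limit=20))
# http://localhost:8000/models?sort_by=likes&limit=20
```

Pass the result to any HTTP client (e.g. `requests.get(models_url(limit=20))`) while the backend is running.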

🎨 UI Features

Charts View (Default)

  • Benchmark Leaderboards: 6 visual bar charts showing top 5 models per benchmark
  • Color-coded Rankings: Gold/Silver/Bronze medals for top 3 performers
  • Interactive Tooltips: Hover to see full model names and exact scores
  • Responsive Grid: Adapts to mobile, tablet, and desktop screens

Table View

  • Sortable Columns: Click any column header to sort
    • Model Name
    • Input $/M (price per million tokens)
    • Output $/M
    • MMLU, ARC, HellaSwag, TruthfulQA scores
    • Parameters (model size)
  • Color-coded Scores: Green (≥80%), Blue (≥60%), Yellow (≥40%)
  • Click for Details: Click any row to open HuggingFace detail modal
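The color thresholds translate directly into a small helper. A sketch of the same logic (the real implementation lives in the frontend TSX components; the below-40% fallback is an assumption, since the README only names three bands):

```python
def score_color(score):
    """Map a benchmark score (0-100) to the table's color bands."""
    if score >= 80:
        return "green"
    if score >= 60:
        return "blue"
    if score >= 40:
        return "yellow"
    return "gray"  # below 40%: assumed fallback, not specified above

print(score_color(85.7))  # green
```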

HuggingFace Modal

  • View complete model information
  • All benchmarks with color-coded scores
  • HuggingFace stats (downloads, likes, license)
  • Direct link to HuggingFace model page

βš™οΈ Configuration

Scraper Settings

Edit backend/scrapers/llmstats_orchestrator.py:

# Enable/disable HuggingFace enrichment
models = await orchestrator.collect_all_data(enrich_with_hf=True)

# Adjust delay to avoid rate limiting
hf_scraper = HuggingFaceScraper(delay_between_requests=0.5)

# Control concurrent requests
models = await hf_scraper.scrape_models_batch(model_ids, max_concurrent=3)
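A `max_concurrent` limit like the one above is typically enforced with an `asyncio.Semaphore`. A minimal sketch of that pattern (illustrative only, not the actual `HuggingFaceScraper` internals):

```python
import asyncio

async def scrape_models_batch(model_ids, max_concurrent=3, delay=0.5):
    """Fetch model details with at most max_concurrent requests in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def fetch_one(model_id):
        async with sem:                   # blocks when the limit is reached
            await asyncio.sleep(delay)    # politeness delay; real code does the HTTP call here
            return {"model_id": model_id}

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(fetch_one(m) for m in model_ids))

results = asyncio.run(
    scrape_models_batch(["THUDM/GLM-4.7", "org/model-b"], max_concurrent=2, delay=0.01)
)
print([r["model_id"] for r in results])
```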

Alternative: HuggingFace-Only Mode

If llm-stats.com is unavailable, use the HuggingFace-only scraper:

cd backend/scrapers
python simple_orchestrator.py

This fetches trending models directly from the HuggingFace API.

Frontend Customization

Colors - Edit frontend/tailwind.config.js:

colors: {
  accent: {
    blue: '#9FC9FF',
    pink: '#FC69D3',
  },
  // ... customize your colors
}

Benchmarks to Display - Edit frontend/src/components/BenchmarkLeaderboard.tsx:

const mainBenchmarks = ['MMLU', 'Arc-Challenge', 'HellaSwag', 'TruthfulQA', 'Winogrande', 'GSM8K'];

🔄 Updating Data

To refresh the leaderboard with latest models:

./collect-data.sh

Or manually:

cd backend/scrapers
source ../venv/bin/activate
python llmstats_orchestrator.py

The API automatically serves updated data (refresh browser to see changes).

πŸ› Troubleshooting

Playwright/Chrome Issues

If scraper fails to launch browser:

# Reinstall Playwright browsers
playwright install chromium

# Or install system Chrome/Chromium
# macOS: brew install chromium
# Ubuntu: sudo apt install chromium-browser

Port Already in Use

Backend (port 8000):

Edit backend/api/main.py:

uvicorn.run("main:app", host="0.0.0.0", port=8001, reload=True)

Frontend (port 3000):

PORT=3001 npm run dev

CORS Errors

Ensure:

  1. Backend is running on http://localhost:8000
  2. Frontend is on http://localhost:3000
  3. Check CORS settings in backend/api/main.py
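A typical FastAPI CORS block looks like the following; the exact settings in backend/api/main.py may differ, so treat this as a reference sketch:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # must match the frontend's origin
    allow_methods=["*"],
    allow_headers=["*"],
)
```

If you move the frontend to another port (e.g. 3001), add that origin to `allow_origins` as well.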

Rate Limiting (429 Errors)

If HuggingFace scraper gets rate limited:

  1. Increase delay: delay_between_requests=1.0
  2. Reduce concurrency: max_concurrent=2
  3. Or disable enrichment: enrich_with_hf=False
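If 429s persist, retrying with exponential backoff also helps. A generic sketch, not part of the current scraper:

```python
import time

calls = {"n": 0}

def with_retries(fn, retries=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on failure (e.g. HTTP 429)."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

def flaky():
    """Simulated endpoint that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = with_retries(flaky, retries=3, base_delay=0.01)
print(result)  # ok
```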

📦 Deployment

Backend (FastAPI)

Docker:

FROM python:3.9-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install -r requirements.txt
RUN playwright install --with-deps chromium  # --with-deps pulls in system libraries missing on slim images
COPY backend/ .
CMD ["python", "api/main.py"]

Railway/Render:

  • Direct Python deployment
  • Set start command: cd backend/api && python main.py
  • Add build command: pip install -r backend/requirements.txt && playwright install chromium

Frontend (Next.js)

Vercel (Recommended):

cd frontend
vercel deploy

Netlify:

# Build command
npm run build

# Publish directory
.next

Static Export:

Requires output: 'export' in next.config.js; the build then writes a static site to out/:

npm run build
# Deploy the generated out/ folder to any static host

Automated Data Updates

Set up a cron job or GitHub Action:

# .github/workflows/update-data.yml
name: Update Model Data
on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight
  workflow_dispatch:

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          cd backend
          pip install -r requirements.txt
          playwright install chromium

      - name: Run scraper
        run: |
          cd backend/scrapers
          python llmstats_orchestrator.py

      - name: Commit updated data
        run: |
          git config --global user.name 'GitHub Action'
          git config --global user.email 'action@github.com'
          git add backend/data/models.json
          git commit -m 'Update model data [skip ci]' || exit 0
          git push

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Add More Benchmarks: Edit benchmark extraction patterns
  2. Improve Scraping: Handle edge cases, new model formats
  3. UI Enhancements: New visualizations, filters, search
  4. Bug Fixes: Report issues or submit fixes
  5. Documentation: Improve setup guides, add examples

Contribution Process:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Test thoroughly
  5. Commit (git commit -m 'Add amazing feature')
  6. Push to branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

📄 License

MIT License - feel free to use this project for any purpose.

πŸ™ Acknowledgments

πŸ“ž Support


Made with ❤️ for the open-source AI community
