An open-source LLM leaderboard that displays the latest public benchmark results for state-of-the-art open-source models.
Live Data Sources: llm-stats.com + HuggingFace
- Benchmark Leaderboards: Visual bar charts showing the top 5 models for each benchmark (MMLU, ARC, HellaSwag, TruthfulQA)
- Sortable Table View: Sort by pricing (Input $/M, Output $/M), benchmarks, or model parameters
- Pricing Data: Compare model costs per million tokens (input and output)
- Clean UI: Inspired by Vellum's leaderboard design
- Automated Scraping: Uses Playwright to scrape llm-stats.com for the latest models
- HuggingFace Integration: Optional enrichment with HuggingFace model details
- Fast API: Built with FastAPI for quick data serving
- Open Source Focus: Only displays models with open-source licenses
- Python 3.9+
- Node.js 18+
- Chrome/Chromium (for Playwright scraper)
```bash
# Clone the repository
git clone https://github.com/iamashok/modelwatch.git
cd modelwatch

# Run setup script
./setup.sh

# Collect data
./collect-data.sh

# Start the application
./start.sh
```

Then visit: http://localhost:3000
1. Backend Setup

```bash
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium
```

2. Collect Data

```bash
cd scrapers
python llmstats_orchestrator.py
```

This scrapes the llm-stats.com leaderboard and enriches the data with HuggingFace details.
3. Start Backend API

```bash
cd ../api
python main.py
```

The API will be available at http://localhost:8000
4. Frontend Setup

```bash
cd ../../frontend

# Install dependencies
npm install

# Start development server
npm run dev
```

The frontend will be available at http://localhost:3000
```
modelwatch/
├── backend/
│   ├── scrapers/
│   │   ├── llmstats_scraper.py       # Scrapes llm-stats.com with Playwright
│   │   ├── llmstats_orchestrator.py  # Main orchestrator
│   │   ├── huggingface_scraper.py    # HuggingFace enrichment
│   │   ├── simple_orchestrator.py    # Alternative: HF-only scraper
│   │   └── README.md
│   ├── api/
│   │   └── main.py                   # FastAPI backend server
│   ├── models/
│   │   └── schemas.py                # Pydantic data models
│   └── data/
│       └── models.json               # Scraped model data
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── BenchmarkLeaderboard.tsx  # Top 5 charts per benchmark
│   │   │   ├── SortableModelsTable.tsx   # Sortable table view
│   │   │   ├── HuggingFaceModal.tsx      # Model detail popup
│   │   │   └── ...
│   │   ├── pages/
│   │   │   └── index.tsx             # Main page
│   │   └── lib/
│   │       └── api.ts                # API client
│   ├── package.json
│   └── tailwind.config.js
├── setup.sh                          # One-command setup script
├── collect-data.sh                   # Data collection script
├── start.sh                          # Start backend + frontend
└── README.md                         # This file
```
```
llm-stats.com (Playwright Scraper)
        ↓
Extract 30 models with pricing + benchmarks
        ↓
HuggingFace API (Optional Enrichment)
        ↓
Add model details (license, downloads, etc.)
        ↓
Save to models.json
        ↓
FastAPI serves data
        ↓
Next.js displays leaderboards
```
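The pipeline above can be sketched as a minimal orchestrator. The function bodies here are illustrative stubs, not the repo's actual scraper code:

```python
import json

def scrape_llmstats():
    """Stand-in for the Playwright scrape of llm-stats.com."""
    return [{"model_id": "THUDM/GLM-4.7", "input_price_per_1m": 0.6}]

def enrich_with_hf(models):
    """Stand-in for the optional HuggingFace enrichment step."""
    for m in models:
        m.setdefault("hf_url", f"https://huggingface.co/{m['model_id']}")
    return models

def run_pipeline(out_path="models.json", enrich=True):
    """Scrape, optionally enrich, then save to models.json for the API."""
    models = scrape_llmstats()
    if enrich:
        models = enrich_with_hf(models)
    with open(out_path, "w") as f:
        json.dump(models, f, indent=2)
    return models
```

The real orchestrator (`llmstats_orchestrator.py`) follows the same shape but runs asynchronously.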
Each model includes:

```json
{
  "model_id": "THUDM/GLM-4.7",
  "model_name": "GLM-4.7",
  "organization": "THUDM",
  "parameters": "358B",
  "input_price_per_1m": 0.6,
  "output_price_per_1m": 2.2,
  "benchmarks": [
    {"name": "MMLU", "score": 85.7, "category": "knowledge"},
    {"name": "Arc-Challenge", "score": 95.7, "category": "knowledge"},
    {"name": "HellaSwag", "score": 73.8, "category": "general"},
    {"name": "TruthfulQA", "score": 42.8, "category": "knowledge"}
  ],
  "is_open_source": true,
  "hf_url": "https://huggingface.co/THUDM/GLM-4.7",
  "hf_license": "Apache 2.0",
  "hf_downloads": 125000,
  "hf_likes": 450
}
```

Base URL: http://localhost:8000
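The record above maps to the Pydantic models in `backend/models/schemas.py`. A stdlib-only sketch of the same shape (field names taken from the sample record; the real schemas may differ):

```python
from dataclasses import dataclass

@dataclass
class Benchmark:
    name: str
    score: float
    category: str

@dataclass
class ModelRecord:
    model_id: str
    model_name: str
    organization: str
    parameters: str
    input_price_per_1m: float
    output_price_per_1m: float
    benchmarks: list
    is_open_source: bool = True

    @classmethod
    def from_dict(cls, d):
        """Build a record from a models.json entry, dropping unknown keys."""
        d = dict(d)
        d["benchmarks"] = [Benchmark(**b) for b in d.get("benchmarks", [])]
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in d.items() if k in known})
```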
- `GET /models` - Get all models
  - Query params: `limit`, `offset`, `sort_by`, `category`, `min_benchmarks`
  - Example: `/models?sort_by=likes&limit=20`
- `GET /models/{model_id}` - Get specific model details
  - Example: `/models/THUDM%2FGLM-4.7`
- `GET /benchmarks` - Get all benchmark types and categories
- `GET /stats` - Get overall statistics
  - Returns: total models, benchmark count, last update time
- `GET /docs` - Interactive API documentation (Swagger UI)
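A small helper for building these URLs from Python (illustrative, not part of the repo; note that model IDs contain `/` and must be percent-encoded in the path):

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"

def models_url(**params):
    """Build a /models query URL, e.g. models_url(sort_by='likes', limit=20)."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/models" + (f"?{query}" if query else "")

def model_url(model_id):
    """Build a /models/{model_id} URL; safe='' forces '/' to become %2F."""
    return f"{BASE_URL}/models/{quote(model_id, safe='')}"
```

Pass the result to any HTTP client (`requests`, `httpx`, `curl`, ...).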
- Benchmark Leaderboards: 6 visual bar charts showing top 5 models per benchmark
- Color-coded Rankings: Gold/Silver/Bronze medals for top 3 performers
- Interactive Tooltips: Hover to see full model names and exact scores
- Responsive Grid: Adapts to mobile, tablet, and desktop screens
- Sortable Columns: Click any column header to sort by
  - Model Name
  - Input $/M (price per million tokens)
  - Output $/M
  - MMLU, ARC, HellaSwag, TruthfulQA scores
  - Parameters (model size)
- Color-coded Scores: Green (≥80%), Blue (≥60%), Yellow (≥40%)
- Click for Details: Click any row to open the HuggingFace detail modal, which shows:
  - Complete model information
  - All benchmarks with color-coded scores
  - HuggingFace stats (downloads, likes, license)
  - A direct link to the HuggingFace model page
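The color thresholds above boil down to a tiny mapping. A Python sketch of that logic (the actual implementation lives in the frontend TypeScript components):

```python
def score_tier(score):
    """Map a 0-100 benchmark score to a display tier per the thresholds above."""
    if score >= 80:
        return "green"
    if score >= 60:
        return "blue"
    if score >= 40:
        return "yellow"
    return "none"
```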
Edit `backend/scrapers/llmstats_orchestrator.py`:

```python
# Enable/disable HuggingFace enrichment
models = await orchestrator.collect_all_data(enrich_with_hf=True)

# Adjust delay to avoid rate limiting
hf_scraper = HuggingFaceScraper(delay_between_requests=0.5)

# Control concurrent requests
models = await hf_scraper.scrape_models_batch(model_ids, max_concurrent=3)
```

If llm-stats.com is unavailable, use the HuggingFace-only scraper:

```bash
cd backend/scrapers
python simple_orchestrator.py
```

This fetches trending models directly from the HuggingFace API.
Colors - Edit `frontend/tailwind.config.js`:

```js
colors: {
  accent: {
    blue: '#9FC9FF',
    pink: '#FC69D3',
  },
  // ... customize your colors
}
```

Benchmarks to Display - Edit `frontend/src/components/BenchmarkLeaderboard.tsx`:

```ts
const mainBenchmarks = ['MMLU', 'Arc-Challenge', 'HellaSwag', 'TruthfulQA', 'Winogrande', 'GSM8K'];
```

To refresh the leaderboard with the latest models:

```bash
./collect-data.sh
```

Or manually:
```bash
cd backend/scrapers
source ../venv/bin/activate
python llmstats_orchestrator.py
```

The API automatically serves updated data (refresh the browser to see changes).
If the scraper fails to launch a browser:

```bash
# Reinstall Playwright browsers
playwright install chromium

# Or install system Chrome/Chromium
# macOS: brew install chromium
# Ubuntu: sudo apt install chromium-browser
```

Backend (port 8000): Edit `backend/api/main.py`:

```python
uvicorn.run("main:app", host="0.0.0.0", port=8001, reload=True)
```

Frontend (port 3000):

```bash
PORT=3001 npm run dev
```

Ensure:
- The backend is running on http://localhost:8000
- The frontend is on http://localhost:3000
- CORS settings in `backend/api/main.py` allow the frontend origin
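If CORS is the problem, the usual fix in a FastAPI app is the built-in `CORSMiddleware`. A config sketch (adjust the origin list to your setup; this may differ from what `main.py` actually ships):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the Next.js dev server to call the API from the browser
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["*"],
    allow_headers=["*"],
)
```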
If the HuggingFace scraper gets rate limited:
- Increase the delay: `delay_between_requests=1.0`
- Reduce concurrency: `max_concurrent=2`
- Or disable enrichment: `enrich_with_hf=False`
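The knobs above are the first line of defense. If you also want automatic retries, a generic exponential-backoff wrapper (not part of the repo, shown for illustration) looks like:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=0.5):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Sleep base_delay * 2^attempt, plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Wrap individual HuggingFace requests with it rather than the whole batch, so one slow model does not restart the run.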
Docker:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install -r requirements.txt
RUN playwright install chromium
COPY backend/ .
CMD ["python", "api/main.py"]
```

Railway/Render:
- Direct Python deployment
- Set start command: `cd backend/api && python main.py`
- Add build command: `pip install -r backend/requirements.txt && playwright install chromium`
Vercel (Recommended):

```bash
cd frontend
vercel deploy
```

Netlify:

```bash
# Build command
npm run build

# Publish directory
.next
```

Static Export:

```bash
npm run build
# Deploy the .next folder to any static host
```

Set up a cron job or GitHub Action:
```yaml
# .github/workflows/update-data.yml
name: Update Model Data

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight
  workflow_dispatch:

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          cd backend
          pip install -r requirements.txt
          playwright install chromium
      - name: Run scraper
        run: |
          cd backend/scrapers
          python llmstats_orchestrator.py
      - name: Commit updated data
        run: |
          git config --global user.name 'GitHub Action'
          git config --global user.email 'action@github.com'
          git add backend/data/models.json
          git commit -m 'Update model data [skip ci]' || exit 0
          git push
```

Contributions are welcome! Here's how you can help:
- Add More Benchmarks: Edit benchmark extraction patterns
- Improve Scraping: Handle edge cases, new model formats
- UI Enhancements: New visualizations, filters, search
- Bug Fixes: Report issues or submit fixes
- Documentation: Improve setup guides, add examples
Contribution Process:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Test thoroughly
5. Commit (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
MIT License - feel free to use this project for any purpose.
- Data Sources:
  - llm-stats.com - Primary benchmark data
  - HuggingFace - Model details and enrichment
- Built With:
  - FastAPI - Backend framework
  - Next.js - Frontend framework
  - Playwright - Web scraping
  - Recharts - Data visualization
  - Tailwind CSS - Styling
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ for the open-source AI community