Docker

Refer to the Docker setup guide for instructions on installing Docker on your machine.

Evaluation Quick Start

Create a new empty folder and add two files to it:

./config.json5
./docker-compose.yml
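
As a minimal sketch, the folder and the two empty files could be created from a shell as follows. The folder name web-bench-eval is only an illustrative assumption; any empty directory works.

```shell
# Hypothetical folder name; pick any empty directory you like.
mkdir -p web-bench-eval

# Create the two (still empty) files that the following steps fill in.
touch web-bench-eval/config.json5 web-bench-eval/docker-compose.yml

# List the folder to confirm both files exist.
ls web-bench-eval
```

After this, change into the folder (cd web-bench-eval) before editing the files and running the later commands, since the compose file uses paths relative to it.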

For config.json5, copy the JSON5 below and edit it according to Config Parameters:

{
  "models": [
    "openai/gpt-4o", 
    // You can add more models here
    // "claude-sonnet-4-20250514"
  ],
  // Eval one project only
  // "projects": ["@web-bench/react"]
}

For docker-compose.yml, copy the YAML below and set the environment variables:

services:
  web-bench:
    image: maoyiweiebay777/web-bench:latest
    volumes:
      - ./config.json5:/app/apps/eval/src/config.json5
      - ./report:/app/apps/eval/report
    environment:
      # Add environment variables according to apps/src/model.json
      - OPENROUTER_API_KEY=your_api_key
      # Add keys for other model providers as needed
      # - ANTHROPIC_API_KEY=your_api_key

Run Docker Compose from the folder that contains both files:

docker compose up

The evaluation report will be generated under ./report/.
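
Once the run has finished, a quick check like the following can confirm that something was written to the mounted report folder. This is a sketch only: the helper name check_report is hypothetical, and it makes no assumption about which files the evaluation produces.

```shell
# Hypothetical helper: succeeds if the given directory exists and is non-empty.
check_report() {
  dir="${1:-./report}"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "report found in $dir"
    ls "$dir"
  else
    echo "no report in $dir yet"
    return 1
  fi
}

# Check the folder mounted by docker-compose.yml; do not abort if it is missing.
check_report ./report || true
```

If the folder is still empty after the container exits, the container logs are the first place to look, for example for a missing or invalid API key.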

Note

The current mode only supports evaluation, not development.
