CLI Reference

Abhishek Gahlot edited this page Mar 27, 2026 · 1 revision

All commands are available through the deepgym executable after installation.

deepgym run

Run a single solution.

deepgym run \
  --task "Write a function coin_change(coins, amount)..." \
  --verifier verifier.py \
  --solution solution.py \
  --timeout 30 \
  --snapshot python

Flag        Required  Default
--task      Yes
--verifier  Yes
--solution  Yes
--timeout   No        30
--snapshot  No
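
When deepgym run is driven from a script rather than typed by hand, it can help to assemble the argument list programmatically. The following is a minimal sketch, assuming deepgym is on PATH. The helpers build_run_command and run_solution are hypothetical names, not part of deepgym, and they use only the flags listed above.

```python
import shlex
import subprocess
from typing import List, Optional


def build_run_command(
    task: str,
    verifier: str,
    solution: str,
    timeout: int = 30,
    snapshot: Optional[str] = None,
) -> List[str]:
    """Assemble the argv for `deepgym run` (hypothetical helper)."""
    cmd = [
        "deepgym", "run",
        "--task", task,
        "--verifier", verifier,
        "--solution", solution,
        "--timeout", str(timeout),
    ]
    # --snapshot is optional, so it is only passed when a value is given.
    if snapshot is not None:
        cmd += ["--snapshot", snapshot]
    return cmd


def run_solution(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command without a shell and capture its output as text."""
    return subprocess.run(cmd, capture_output=True, text=True)


cmd = build_run_command(
    task="Write a function coin_change(coins, amount)...",
    verifier="verifier.py",
    solution="solution.py",
    snapshot="python",
)
# Print a copy-pasteable shell form of the command for inspection.
print(shlex.join(cmd))
```

Passing a list to subprocess.run, instead of a single string with shell=True, means the task text needs no extra quoting even when it contains spaces or parentheses.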

deepgym run-batch

Run multiple solutions from a directory.

deepgym run-batch \
  --task "Write two_sum..." \
  --verifier verifier.py \
  --solutions-dir ./solutions/ \
  --max-parallel 10 \
  --timeout 30

Flag             Required  Default
--solutions-dir  Yes
--max-parallel   No        10

deepgym eval

Evaluate the solutions in a directory across a suite.

deepgym eval \
  --suite medium \
  --solutions-dir ./solutions/ \
  --max-parallel 100

Valid values for --suite: easy, medium, hard, all, coding, computer-use, tool-use, or a family name.


deepgym create

Create and register a new environment (outputs JSON).

deepgym create \
  --name my_problem \
  --task "Implement binary search" \
  --verifier verifier.py \
  --difficulty medium \
  --domain coding \
  --tags search algorithm

deepgym audit

Check a verifier for reward hacking vulnerabilities.

deepgym audit \
  --task "Write coin_change..." \
  --verifier verifier.py \
  --benchmark my-benchmark \
  --verifier-id v1 \
  --strategies empty hardcoded trivial overflow pattern llm_attack \
  --persist \
  --db-path ~/.deepgym/exploits.db \
  --json

Flag          Required  Default
--strategies  No        all
--persist     No        false
--db-path     No        ~/.deepgym/exploits.db
--json        No        false

See Adversarial Testing for what each strategy does.
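
The audit command mixes multi-value, boolean, and path flags, which is easy to get wrong when calling it from a script. The sketch below shows one way to assemble the invocation. It assumes the six strategy names in the example above are the full set (Adversarial Testing is the authoritative list), and build_audit_command is a hypothetical helper, not a deepgym API.

```python
import os
from typing import List, Optional, Sequence

# Assumption: these are the strategy names shown in the example invocation.
KNOWN_STRATEGIES = (
    "empty", "hardcoded", "trivial", "overflow", "pattern", "llm_attack",
)


def build_audit_command(
    task: str,
    verifier: str,
    strategies: Optional[Sequence[str]] = None,
    persist: bool = False,
    db_path: Optional[str] = None,
    json_output: bool = False,
) -> List[str]:
    """Assemble the argv for `deepgym audit` (hypothetical helper)."""
    cmd = ["deepgym", "audit", "--task", task, "--verifier", verifier]
    if strategies:
        unknown = [s for s in strategies if s not in KNOWN_STRATEGIES]
        if unknown:
            raise ValueError(f"unknown strategies: {unknown}")
        # --strategies takes several space-separated values.
        cmd += ["--strategies", *strategies]
    if persist:
        cmd.append("--persist")
    if db_path is not None:
        # A shell expands "~", but an argv list passed to subprocess does
        # not go through a shell, so expand it here.
        cmd += ["--db-path", os.path.expanduser(db_path)]
    if json_output:
        cmd.append("--json")
    return cmd


print(build_audit_command(
    "Write coin_change...",
    "verifier.py",
    strategies=["empty", "hardcoded"],
    json_output=True,
))
```

Omitting strategies leaves --strategies off the command line, so the documented default (all) applies. The --benchmark and --verifier-id flags from the example are left out here for brevity.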


deepgym benchmark-audit

Audit benchmark splits for contamination.

deepgym benchmark-audit \
  --env-dir ./environments/ \
  --benchmark mydata \
  --seed 0 \
  --public-eval-ratio 0.2 \
  --holdout-ratio 0.1 \
  --canary-ratio 0.05 \
  --json
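
To make the ratio flags concrete, the following sketch partitions a list of environment names with a seeded shuffle. This is only an illustration of what such ratios usually mean, not deepgym's actual split algorithm, which this page does not describe. The function illustrative_split, the split names, and the "remaining" bucket are all assumptions.

```python
import random
from typing import Dict, List, Sequence


def illustrative_split(
    names: Sequence[str],
    seed: int = 0,
    public_eval_ratio: float = 0.2,
    holdout_ratio: float = 0.1,
    canary_ratio: float = 0.05,
) -> Dict[str, List[str]]:
    """Partition names into ratio-sized splits (illustration only)."""
    ratios = {
        "public_eval": public_eval_ratio,
        "holdout": holdout_ratio,
        "canary": canary_ratio,
    }
    if any(r < 0 for r in ratios.values()) or sum(ratios.values()) > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")

    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)

    splits: Dict[str, List[str]] = {}
    start = 0
    for split_name, ratio in ratios.items():
        size = round(len(shuffled) * ratio)
        splits[split_name] = shuffled[start:start + size]
        start += size
    # Whatever is left belongs to no named split (assumed bucket).
    splits["remaining"] = shuffled[start:]
    return splits


envs = [f"env_{i:03d}" for i in range(100)]
sizes = {name: len(members) for name, members in illustrative_split(envs).items()}
print(sizes)
```

With 100 names and the ratios from the example, the sketch puts 20 names in public_eval, 10 in holdout, 5 in canary, and leaves 65 in remaining. Reusing the same seed reproduces the same assignment.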

deepgym serve

Start the REST API server.

# dev
DEEPGYM_NO_AUTH=true deepgym serve \
  --host 127.0.0.1 \
  --port 8000 \
  --reload \
  --allow-local-exec

# production
DEEPGYM_API_KEY=your-key DAYTONA_API_KEY=your-key \
  deepgym serve --port 8000

Flag                Required  Default
--host              No        127.0.0.1
--port              No        8000
--reload            No        false
--allow-local-exec  No        false
--no-auth           No        false

See API Server for endpoint docs.
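
Before launching the server in production, a small preflight check can catch a missing variable early. The sketch below looks only for the two variables set in the production example above. Whether deepgym needs further configuration is not covered on this page, and missing_production_env is a hypothetical helper.

```python
import os
from typing import List, Mapping

# Variables set in the production example on this page.
PRODUCTION_VARS = ("DEEPGYM_API_KEY", "DAYTONA_API_KEY")


def missing_production_env(env: Mapping[str, str]) -> List[str]:
    """Return the production variables that are unset or blank."""
    return [name for name in PRODUCTION_VARS if not env.get(name, "").strip()]


missing = missing_production_env(os.environ)
if missing:
    print("set before running `deepgym serve`: " + ", ".join(missing))
else:
    print("production variables are set")
```

Taking the environment as a mapping argument, rather than reading os.environ inside the function, keeps the check easy to exercise with a plain dict.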


deepgym web

Start the web debugging UI.

deepgym web --host 127.0.0.1 --port 8080 --reload --allow-local-exec
