Home
Abhishek Gahlot edited this page Mar 27, 2026 · 2 revisions
Reward signals for RL code training. Sandbox it, verify it, score it.
Your model writes code. DeepGym runs it in an isolated sandbox (Daytona or local subprocess), checks it against a verifier, and gives you back a score you can feed straight into your training loop (TRL, verl, OpenRLHF, GRPO/DAPO/PPO).
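The run-verify-score loop described above can be sketched with nothing but the standard library. Everything here is an illustration, not DeepGym's API: `run_episode` and the `{"reward": ...}` JSON shape are hypothetical stand-ins, and the subprocess stands in for the Daytona/local sandbox (see the Core API Reference and Verifier Protocol pages for the real interfaces).

```python
# Minimal sandbox -> verify -> score loop. All names here (run_episode,
# the verifier JSON shape) are hypothetical illustrations of the pattern,
# not DeepGym's actual API.
import json
import subprocess
import sys
import tempfile
from pathlib import Path

def run_episode(model_code: str, verifier_code: str, timeout: float = 10.0) -> float:
    """Run model-written code through a verifier in a subprocess sandbox.

    The verifier is expected to print a JSON object {"reward": float}.
    """
    with tempfile.TemporaryDirectory() as tmp:
        solution = Path(tmp) / "solution.py"
        solution.write_text(model_code)
        verifier = Path(tmp) / "verify.py"
        verifier.write_text(verifier_code)
        try:
            # An isolated subprocess stands in for the Daytona/local sandbox.
            result = subprocess.run(
                [sys.executable, str(verifier), str(solution)],
                capture_output=True, text=True, timeout=timeout,
            )
            return float(json.loads(result.stdout)["reward"])
        except (subprocess.TimeoutExpired, json.JSONDecodeError, KeyError, ValueError):
            return 0.0  # crashed, timed out, or malformed output scores zero

# Example episode: the "model" writes add(), the verifier checks it.
model_code = "def add(a, b):\n    return a + b\n"
verifier_code = (
    "import importlib.util, json, sys\n"
    "spec = importlib.util.spec_from_file_location('solution', sys.argv[1])\n"
    "mod = importlib.util.module_from_spec(spec)\n"
    "spec.loader.exec_module(mod)\n"
    "print(json.dumps({'reward': 1.0 if mod.add(2, 3) == 5 else 0.0}))\n"
)
print(run_episode(model_code, verifier_code))  # 1.0 for a correct solution
```

The returned float is exactly the scalar a training loop (GRPO, PPO, etc.) consumes per completion.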
| Page | What's in it |
|---|---|
| Getting Started | Installation, first run, sandbox modes |
| Core API Reference | DeepGym, AsyncDeepGym, data models, exceptions |
| Environments | 24 built-in envs, importable benchmarks, custom environments |
| Verifier Protocol | JSON output spec, writing verifiers, per-test rewards |
| Integrations | TRL, verl, OpenRLHF, lm-eval, HuggingFace Hub |
| CLI Reference | All CLI commands with options and examples |
| Sandbox Modes | Local, Daytona, Auto modes explained |
| Adversarial Testing | Reward hack detection, verifier auditing |
| API Server | REST API, authentication, endpoints |
| Advanced Usage | Gymnasium wrapper, multi-turn, shaped rewards, batch GRPO |
| Configuration | Environment variables, timeouts, directories |
| Architecture | System design, data flow, module map |
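As a taste of the per-test rewards mentioned under Verifier Protocol, here is a hypothetical verifier-output shape: the field names (`reward`, `tests`) and the mean-aggregation rule are illustrative assumptions, and the authoritative spec lives on the Verifier Protocol page.

```python
# Hypothetical verifier output with per-test rewards; field names and the
# aggregation rule are illustrative, not the actual DeepGym JSON spec.
import json

verdict = {
    "reward": 0.75,            # aggregate score fed to the training loop
    "tests": [                 # per-test rewards for shaped/partial credit
        {"name": "test_basic", "reward": 1.0},
        {"name": "test_edge_case", "reward": 1.0},
        {"name": "test_timeout", "reward": 0.0},
        {"name": "test_unicode", "reward": 1.0},
    ],
}

# Here the aggregate is simply the mean of the per-test rewards.
mean = sum(t["reward"] for t in verdict["tests"]) / len(verdict["tests"])
assert abs(mean - verdict["reward"]) < 1e-9
print(json.dumps(verdict, indent=2))
```

Per-test granularity lets a trainer assign partial credit instead of a binary pass/fail signal.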
- GitHub: DeepGym/deepgym
- PyPI: `pip install deepgym`
- License: MIT
- Python: >= 3.10