Abhishek Gahlot edited this page Mar 27, 2026 · 2 revisions

DeepGym Wiki

Reward signals for RL code training. Sandbox it, verify it, score it.

Your model writes code. DeepGym runs it in an isolated sandbox (Daytona or a local subprocess), checks it against a verifier, and returns a score you can feed straight into your training loop, whether that is TRL, verl, or OpenRLHF running GRPO, DAPO, or PPO.
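
The loop above can be sketched with nothing but the standard library. This is not DeepGym's API, just a minimal illustration of the pattern it automates: run candidate code in a subprocess sandbox, verify the output, and map the result to a scalar reward.

```python
# Minimal sketch of the sandbox -> verify -> score loop (stdlib only;
# DeepGym's real API differs -- see Core API Reference).
import os
import subprocess
import sys
import tempfile

def score_candidate(code: str, expected_stdout: str, timeout: float = 5.0) -> float:
    """Run `code` in a subprocess and return 1.0 if its stdout matches
    the verifier's expectation, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return 1.0 if proc.stdout.strip() == expected_stdout else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # non-terminating code earns zero reward
    finally:
        os.unlink(path)

reward = score_candidate("print(2 + 2)", "4")  # → 1.0
```

In a real setup the subprocess would be replaced by a Daytona sandbox and the string comparison by a proper verifier, but the reward signal fed to the trainer has the same shape.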


Pages

| Page | What's in it |
|------|--------------|
| Getting Started | Installation, first run, sandbox modes |
| Core API Reference | DeepGym, AsyncDeepGym, data models, exceptions |
| Environments | 24 built-in envs, importable benchmarks, custom environments |
| Verifier Protocol | JSON output spec, writing verifiers, per-test rewards |
| Integrations | TRL, verl, OpenRLHF, lm-eval, HuggingFace Hub |
| CLI Reference | All CLI commands with options and examples |
| Sandbox Modes | Local, Daytona, Auto modes explained |
| Adversarial Testing | Reward hack detection, verifier auditing |
| API Server | REST API, authentication, endpoints |
| Advanced Usage | Gymnasium wrapper, multi-turn, shaped rewards, batch GRPO |
| Configuration | Environment variables, timeouts, directories |
| Architecture | System design, data flow, module map |
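
The Verifier Protocol page documents the actual JSON output spec; as a rough illustration only (the field names `reward` and `per_test` here are assumptions, not DeepGym's schema), a verifier in this style compares candidate outputs against expectations and emits one JSON report:

```python
# Hypothetical verifier sketch: score each test, then emit a JSON
# report on stdout. Field names are illustrative, not DeepGym's spec.
import json

def verify(outputs: list[str], expected: list[str]) -> dict:
    """Compare candidate outputs to expected values and report rewards."""
    per_test = [1.0 if got == want else 0.0 for got, want in zip(outputs, expected)]
    report = {
        "reward": sum(per_test) / len(per_test),  # overall scalar score
        "per_test": per_test,                     # per-test rewards
    }
    print(json.dumps(report))
    return report

report = verify(["4", "10"], ["4", "9"])  # → reward 0.5
```

Emitting one machine-readable object on stdout is what lets the sandbox runner stay language-agnostic: any verifier that prints this shape can feed the training loop.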

Links

  • GitHub: DeepGym/deepgym
  • PyPI: pip install deepgym
  • License: MIT
  • Python: >= 3.10
