Core API Reference

DeepGym (sync client)

Module: deepgym.core

from deepgym import DeepGym

Constructor

DeepGym(
    api_key: str | None = None,
    mode: Literal['auto', 'daytona', 'local'] = 'auto',
)

| Parameter | Default | Description |
|---|---|---|
| `api_key` | `None` | Daytona API key (falls back to `DAYTONA_API_KEY` env var) |
| `mode` | `'auto'` | Execution mode: `auto`, `daytona`, or `local` |
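
The `api_key` fallback described above can be pictured with a small sketch. `resolve_api_key` is a hypothetical helper written for illustration, not a function exported by DeepGym, and the precedence shown (explicit argument first, then the environment variable) is an assumption based on the table's wording.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer the explicit argument, else the env var."""
    if api_key is not None:
        return api_key
    # Fallback named in the parameter table; returns None when it is unset.
    return os.environ.get("DAYTONA_API_KEY")
```

Under this reading, `DeepGym()` with no arguments would still pick up a key exported in the shell, while `DeepGym(api_key='...')` would override it.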

Methods

`run(env, model_output) -> RunResult`

Run a single solution against an environment.

dg = DeepGym()  # default arguments: api_key=None, mode='auto'
result = dg.run(env, model_output='def two_sum(nums, target): ...')

Raises: VerifierError, SandboxError, TimeoutError

`run_batch(env, outputs, max_parallel) -> BatchResult`

Run multiple solutions in parallel.

batch = dg.run_batch(env, solutions, max_parallel=8)
print(f'{batch.passed}/{batch.total} passed, avg score: {batch.avg_score:.2f}')

| Parameter | Type | Default |
|---|---|---|
| `env` | `Environment` | required |
| `outputs` | `Sequence[str]` | required |
| `max_parallel` | `int` | `10` |

`eval(suite, model_outputs, max_parallel) -> EvalResult`

Evaluate across a whole suite of environments.

result = dg.eval('medium', {'coin_change': code1, 'two_sum': code2}, max_parallel=100)
print(f'Pass rate: {result.pass_rate:.1%}')

Valid suite names are `easy`, `medium`, `hard`, `all`, or a family name such as `dynamic-programming`.

AsyncDeepGym (async client)

Module: deepgym.async_core

from deepgym import AsyncDeepGym

Constructor

AsyncDeepGym(
    api_key: str | None = None,
    api_url: str | None = None,
    default_timeout: int = 30,
    mode: Literal['auto', 'daytona', 'local'] = 'auto',
)

Additional parameters beyond those of the sync client:

| Parameter | Default | Description |
|---|---|---|
| `api_url` | `None` | Remote DeepGym API endpoint (HTTP mode) |
| `default_timeout` | `30` | Fallback timeout in seconds |

The methods mirror those of DeepGym, but each one must be awaited:

result = await dg.run(env, model_output=solution)
batch = await dg.run_batch(env, solutions, max_parallel=10)
eval_result = await dg.eval('medium', outputs, max_parallel=50)
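
This page does not describe how `run_batch` schedules its work. If you issue individual `run` calls yourself, one common way to cap concurrency is an `asyncio.Semaphore`. The sketch below uses a stand-in client so that it runs without DeepGym installed; `FakeAsyncClient` and `run_bounded` are hypothetical names, and the dictionary returned by the fake `run` is not the real `RunResult`.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for AsyncDeepGym; it only mimics an awaitable run()."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous run() calls seen

    async def run(self, env, model_output: str) -> dict:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # pretend to do sandbox work
        self.in_flight -= 1
        return {"passed": bool(model_output)}


async def run_bounded(client, env, outputs, max_parallel: int = 10) -> list:
    """Run every output, keeping at most max_parallel calls in flight."""
    gate = asyncio.Semaphore(max_parallel)

    async def one(output: str):
        async with gate:
            return await client.run(env, model_output=output)

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(one(o) for o in outputs))
```

With a real client, the same wrapper shape would apply to `await dg.run(env, model_output=...)`, though `run_batch` is likely the simpler choice when one environment is shared by all outputs.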

Data Models

Module: deepgym.models

Environment

from deepgym import Environment

env = Environment(
    task='Write a function that...',
    type='coding',
    verifier_code='...',          # or verifier_path=Path('verifier.py')
    language='python',
    timeout=30,
    difficulty='medium',
    tags=['dp', 'array'],
    test_cases=[{'input': [1, 2, 3], 'expected': 6}],
    snapshot='python',            # Daytona image name
    env_vars={'DEBUG': '1'},
)

| Field | Type | Default |
|---|---|---|
| `task` | `str` | required |
| `type` | `Literal['coding','computer-use','tool-use']` | `'coding'` |
| `verifier_code` | `str` | `''` |
| `verifier_path` | `Path \| None` | `None` |
| `language` | `str` | `'python'` |
| `timeout` | `int` | `30` |
| `difficulty` | `Literal['easy','medium','hard']` | `'medium'` |
| `tags` | `list[str]` | `[]` |
| `test_cases` | `list[dict] \| None` | `None` |
| `snapshot` | `str \| None` | `None` |
| `env_vars` | `dict[str, str] \| None` | `None` |

You must provide either `verifier_code` or `verifier_path`.
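
The verifier requirement can be sketched as a construction-time check. `EnvironmentSketch` is a hypothetical stand-in that carries only the relevant fields. How the real `Environment` enforces the rule, which exception it raises, and whether supplying both fields is allowed are not stated on this page, so treat those details as assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class EnvironmentSketch:
    """Stand-in with only the fields relevant to the verifier rule."""

    task: str
    verifier_code: str = ""
    verifier_path: Optional[Path] = None

    def __post_init__(self) -> None:
        # Assumption: "either" means at least one of the two must be set.
        if not self.verifier_code and self.verifier_path is None:
            raise ValueError("provide verifier_code or verifier_path")
```

In this sketch, `EnvironmentSketch(task='...')` fails immediately, while passing `verifier_code='...'` or `verifier_path=Path('verifier.py')` succeeds.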

RunResult

class RunResult(BaseModel):
    score: float          # 0.0 to 1.0
    passed: bool
    output: str           # verifier stdout
    stderr: str
    exit_code: int        # 0=pass, 1=fail, 2=error
    execution_time_ms: float
    sandbox_id: str
    reward_components: dict[str, float] | None
    metrics: dict[str, Any] | None
    seed: int | None
    truncated: bool       # True if timed out
    error_type: str | None
    cases: list[CaseResult] | None
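
As an illustration of how the `exit_code`, `truncated`, and `error_type` fields might be read together, the sketch below turns a result into a one-line summary. `RunResultSketch` and `summarize` are hypothetical; the stand-in carries only a subset of the documented fields, and checking `truncated` before `exit_code` is a design choice made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResultSketch:
    """Stand-in with a subset of the documented RunResult fields."""

    score: float
    passed: bool
    exit_code: int
    truncated: bool = False
    error_type: Optional[str] = None


# Meanings taken from the exit_code comment in the class listing.
EXIT_CODE_MEANING = {0: "pass", 1: "fail", 2: "error"}


def summarize(result: RunResultSketch) -> str:
    """Build a short human-readable status line for one result."""
    if result.truncated:
        return f"timed out (score {result.score:.2f})"
    status = EXIT_CODE_MEANING.get(
        result.exit_code, f"unknown exit code {result.exit_code}"
    )
    if result.error_type:
        status += f" [{result.error_type}]"
    return f"{status} (score {result.score:.2f})"
```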

CaseResult

class CaseResult(BaseModel):
    id: str
    passed: bool
    score: float          # 0.0 to 1.0
    input_summary: str
    expected_summary: str
    actual_summary: str
    error: str | None
    execution_time_ms: float
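
Because `RunResult.cases` is typed as optional, code that inspects per-case results should tolerate `None`. A minimal sketch, using a hypothetical `CaseSketch` stand-in with a subset of the fields above:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseSketch:
    """Stand-in with a subset of the documented CaseResult fields."""

    id: str
    passed: bool
    score: float
    error: Optional[str] = None


def failed_cases(cases: Optional[List[CaseSketch]]) -> List[CaseSketch]:
    """Return the failing cases; a missing case list yields an empty list."""
    return [case for case in (cases or []) if not case.passed]
```

Calling `failed_cases(result.cases)` would then be safe whether or not the verifier reported individual cases.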

BatchResult

class BatchResult(BaseModel):
    results: list[RunResult]
    total: int
    passed: int
    failed: int
    avg_score: float
    execution_time_ms: float
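
This page does not say how the summary fields are derived from `results`. One plausible reading, shown here as an assumption, is that `failed` equals `total - passed` and `avg_score` is the arithmetic mean of the individual scores. `ResultSketch` and `aggregate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResultSketch:
    """Stand-in carrying only the fields needed for aggregation."""

    score: float
    passed: bool


def aggregate(results: List[ResultSketch]) -> dict:
    """Compute batch-level counts and the mean score (assumed definitions)."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    # Guard against division by zero for an empty batch.
    avg_score = sum(r.score for r in results) / total if total else 0.0
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "avg_score": avg_score,
    }
```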

EvalResult

class EvalResult(BaseModel):
    suite: str
    model_name: str
    pass_rate: float
    results: list[RunResult]
    total: int
    passed: int
    avg_score: float
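
The `eval` example formats `pass_rate` with `:.1%`, which multiplies by 100, so the field is presumably a fraction between 0 and 1. The sketch below assumes it equals `passed / total`; that formula is not stated on this page, and the standalone `pass_rate` function is hypothetical.

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of passing runs; 0.0 for an empty evaluation (assumed)."""
    return passed / total if total else 0.0


rate = pass_rate(3, 4)
print(f"Pass rate: {rate:.1%}")  # prints "Pass rate: 75.0%"
```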

Exceptions

Module: deepgym.exceptions

DeepGymError (base)
  |-- VerifierError      # verifier crashed or bad JSON
  |-- SandboxError       # sandbox creation/execution failure
  |-- TimeoutError       # exceeded env.timeout
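
One practical consequence of the hierarchy is that catching `DeepGymError` covers all three subclasses. Note also that the name `TimeoutError` matches a Python built-in, so referring to it through its module avoids ambiguity. The sketch below rebuilds the tree with stand-in classes so that it runs without DeepGym installed; the `exceptions` namespace and `describe_failure` are hypothetical.

```python
from types import SimpleNamespace


class DeepGymError(Exception):
    """Stand-in for the documented base class."""


# Stand-in for the deepgym.exceptions module, following the tree above:
# each subclass inherits directly from DeepGymError.
exceptions = SimpleNamespace(
    DeepGymError=DeepGymError,
    VerifierError=type("VerifierError", (DeepGymError,), {}),
    SandboxError=type("SandboxError", (DeepGymError,), {}),
    TimeoutError=type("TimeoutError", (DeepGymError,), {}),
)


def describe_failure(exc: BaseException) -> str:
    """Map an exception to a coarse label, most specific classes first."""
    if isinstance(exc, exceptions.TimeoutError):
        return "timeout"
    if isinstance(exc, exceptions.VerifierError):
        return "verifier"
    if isinstance(exc, exceptions.SandboxError):
        return "sandbox"
    if isinstance(exc, exceptions.DeepGymError):
        return "deepgym"
    return "other"
```

An instance of the built-in `TimeoutError` is labeled `"other"` here, which is why qualifying the name (for example, `exceptions.TimeoutError`) matters in `except` clauses.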

Helpers

from deepgym import load_environment, list_environments, load_suite

env = load_environment('coin_change')        # load by name
envs = list_environments()                    # list all available
easy = load_suite('easy')                     # load a suite
