add elastic endpoint pool for dynamic GPU scavenging by hallerite · Pull Request #957 · PrimeIntellect-ai/verifiers

hallerite · 2026-02-24T18:59:27Z

Description

Adds opt-in elastic = true mode: a background task polls endpoints.toml and updates the dispatcher's live endpoint list mid-run
Retries on preempted servers re-acquire from the dispatcher instead of retrying the same dead endpoint
New endpoints are picked up, removed endpoints are drained, and in-flight concurrency counts are preserved

Changes

LeastLoadedDispatcher.update_variants() — swaps variant list under the condition lock, keyed by api_base_url
ElasticEndpointPool (new) — asyncio background task that calls load_endpoints() and pushes updated slots to the dispatcher
Dispatched retries moved outside acquire() so preempted-server failures re-acquire a slot on a live endpoint
EvalConfig gains elastic, elastic_poll_interval, endpoints_path fields
Pool lifecycle wired into run_evaluations() / run_evaluations_tui()
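
The variant swap described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `SketchDispatcher`, `Variant`, the `draining` flag, and the `{api_base_url: max_concurrent}` argument shape are all assumptions made for the example. It shows one way to keep in-flight counts intact while the variant list changes under a condition lock.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Variant:
    api_base_url: str
    max_concurrent: int
    in_flight: int = 0
    draining: bool = False  # hypothetical flag: removed from config, still busy


class SketchDispatcher:
    """Hypothetical stand-in for LeastLoadedDispatcher (not the PR's code)."""

    def __init__(self, variants):
        self._variants = {v.api_base_url: v for v in variants}
        self._cond = asyncio.Condition()

    async def acquire(self) -> Variant:
        async with self._cond:
            while True:
                free = [
                    v for v in self._variants.values()
                    if not v.draining and v.in_flight < v.max_concurrent
                ]
                if free:
                    best = min(free, key=lambda v: v.in_flight)  # least loaded
                    best.in_flight += 1
                    return best
                await self._cond.wait()

    async def release(self, variant: Variant) -> None:
        async with self._cond:
            variant.in_flight -= 1
            # A drained variant is dropped once its last request finishes.
            if variant.draining and variant.in_flight == 0:
                if self._variants.get(variant.api_base_url) is variant:
                    del self._variants[variant.api_base_url]
            self._cond.notify_all()

    async def update_variants(self, new: dict[str, int]) -> None:
        """Swap the live set; `new` maps api_base_url -> max_concurrent."""
        async with self._cond:
            for url, limit in new.items():
                if url in self._variants:
                    # Same URL: keep the object so its in_flight count survives.
                    self._variants[url].max_concurrent = limit
                    self._variants[url].draining = False
                else:
                    self._variants[url] = Variant(url, limit)
            for url, v in list(self._variants.items()):
                if url not in new:
                    if v.in_flight == 0:
                        del self._variants[url]
                    else:
                        v.draining = True  # stop new work, let old work finish
            self._cond.notify_all()  # wake waiters: capacity may have appeared
```

Keying by URL and mutating the existing object, rather than rebuilding the list, is what lets a release that arrives after the swap still decrement the right counter.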

Example Elastic Eval Config

elastic = true
elastic_poll_interval = 10
endpoints_path = "endpoints.toml"

[[eval]]
env_id = "primeintellect/math-env"
endpoint_id = "zai-org/GLM-4.7-FP8"

An external sidecar manages endpoints.toml – adding/removing [[endpoint]] entries as GPU servers come and go. The eval job adapts automatically.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

Medium Risk
Touches core evaluation concurrency and request routing, and adds background hot-reload behavior; misconfiguration or edge cases could change throughput or cause unexpected blocking, though changes are guarded and covered by new tests.

Overview
Adds per-variant concurrency limits to endpoint registry variants via optional max_concurrent, switching multi-variant dispatch from round-robin to least-loaded routing and enforcing an all-or-nothing configuration rule across variants.
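
The all-or-nothing rule can be pictured with a small check like the one below. This is a sketch under assumptions: `uses_per_variant_limits` is a hypothetical helper, variants are modeled as plain dicts, and raising `ValueError` on a mixed configuration is a guess at the behavior rather than something the PR text confirms.

```python
def uses_per_variant_limits(variants: list[dict]) -> bool:
    """True if every variant sets max_concurrent, False if none do.

    A mix of limited and unlimited variants is rejected (assumed behavior).
    """
    limited = [v for v in variants if v.get("max_concurrent") is not None]
    if not limited:
        return False  # no limits anywhere: fall back to the default routing
    if len(limited) != len(variants):
        raise ValueError(
            "max_concurrent must be set on all variants of an endpoint or on none"
        )
    return True  # limits everywhere: least-loaded dispatch can be used
```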

Introduces an opt-in elastic endpoint pool (elastic=true) that polls endpoints.toml during a run and updates the live variant set while preserving in-flight capacity; evaluation wiring now builds shared LeastLoadedDispatcher instances per endpoint_id, disables the global --max-concurrent semaphore when dispatchers are active, and ensures retries re-acquire capacity so failures on removed/preempted endpoints can move to healthy replicas.
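
The retry behavior described here, where each attempt re-acquires capacity instead of reusing the slot that just failed, could be sketched as follows. `EndpointGone`, `call_with_reacquire`, and the `acquire` / `release` / `send` callables are hypothetical names for this example, not the library's API.

```python
import asyncio
from typing import Any, Awaitable, Callable


class EndpointGone(Exception):
    """Hypothetical error for a request that hit a preempted or removed server."""


async def call_with_reacquire(
    acquire: Callable[[], Awaitable[Any]],
    release: Callable[[Any], Awaitable[None]],
    send: Callable[[Any], Awaitable[Any]],
    max_attempts: int = 3,
) -> Any:
    """Run `send` on a freshly acquired slot for each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_exc: Exception | None = None
    for _ in range(max_attempts):
        slot = await acquire()  # the dispatcher picks a currently live endpoint
        try:
            return await send(slot)
        except EndpointGone as exc:
            last_exc = exc  # remember the failure and try another slot
        finally:
            await release(slot)  # always hand capacity back before retrying
    assert last_exc is not None
    raise last_exc
```

Because the retry loop sits outside the acquire/release pair, a slot on a dead endpoint is released before the next attempt, so the dispatcher is free to route the retry elsewhere.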

Updates config/types/docs to support max_concurrent on endpoints and elastic/elastic_poll_interval/endpoints_path on eval configs, and adds tests covering dispatcher acquisition/release semantics, dynamic variant updates, and elastic pool reload behavior.

^{Written by Cursor Bugbot for commit fbf26b2. This will update automatically on new commits. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-02-24T19:11:36Z

docs/reference.md

 ```

-Leaf endpoint configuration used inside `ClientConfig.endpoint_configs`. Has the same fields as `ClientConfig` except `endpoint_configs` itself, preventing recursive nesting.
+Leaf endpoint configuration used inside `ClientConfig.endpoint_configs`. Has the same fields as `ClientConfig` except `endpoint_configs` itself, preventing recursive nesting. The optional `max_concurrent` field limits how many concurrent requests this variant handles; see [Per-Variant Concurrency](evaluation.md#concurrency).


EvalConfig docs missing new elastic pool fields

Low Severity

The EvalConfig section in docs/reference.md is missing the three new fields added by this PR: elastic, elastic_poll_interval, and endpoints_path. These are user-facing configuration options for the elastic endpoint pool feature. The documentation rule requires updating reference docs when core user-facing functionality is modified.

Additional Locations (1)

verifiers/types.py#L493-L497

^{Triggered by project rule: BugBot Instructions}

cursor · 2026-02-24T19:11:36Z

docs/evaluation.md


 By default, scoring runs interleaved with generation. Use `--no-interleave-scoring` to score all rollouts after generation completes.

+When per-variant `max_concurrent` limits are configured in the endpoint registry, the endpoint dispatcher manages concurrency globally across all variants and the `--max-concurrent` flag is ignored.


Elastic mode feature undocumented in evaluation docs and skills

Low Severity

The PR adds a new user-facing elastic endpoint pool feature (with elastic, elastic_poll_interval, and endpoints_path config fields), but neither docs/evaluation.md nor skills/evaluate-environments/SKILL.md documents the elastic mode itself. Only per-variant max_concurrent is documented. Users have no documentation for how to enable or configure elastic polling.

Additional Locations (1)

skills/evaluate-environments/SKILL.md#L45-L61

^{Triggered by project rule: BugBot Instructions}

hallerite added 10 commits February 11, 2026 03:07

add per-variant endpoint concurrency with least-loaded dispatch

51b05c7

Merge remote-tracking branch 'origin/main' into hallerite/per-variant-concurrency

a39017a

run ruff

2338043

update docs

58dcaa5

clean up dispatch: rename, remove NullEndpointDispatcher, enforce all-or-nothing concurrency

05e7f0b

fix ruff

06c9ec0

fix leak & update docs

042aea7

fix group too big for endpoint

47e8607

update skill & fix small issue

a95ddd5

elastic endpoint pool: re-read endpoints.toml mid-run

fbf26b2

cursor bot reviewed Feb 24, 2026

hallerite wants to merge 10 commits into main from hallerite/elastic
