Fix elastic autoscaling: use base model for health checks by JannikSt · Pull Request #1745 · PrimeIntellect-ai/prime-rl

JannikSt · 2026-02-09T02:06:26Z

Summary

* Fix elastic autoscaling for newly scaled-up inference servers
* The health check used `model_name`, which is updated to the LoRA adapter name once training starts
* New servers only have the base model loaded, so they failed the health check indefinitely

The bug

After training starts, `update_model_name()` changes `model_name` to the LoRA adapter name (e.g., `rft-xxx`). New inference servers only have the base model loaded, so the health check rejected them: they were never added to the pool and therefore never received the LoRA adapter.
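The failure mode can be reproduced in a few lines. This is a minimal sketch, not the prime-rl code: `BuggyPool`, `check_server`, and the model IDs `base-model`/`rft-xxx` are hypothetical stand-ins, and it assumes a server counts as healthy when the expected model ID appears in the list of IDs it reports.

```python
class BuggyPool:
    """Hypothetical pre-fix pool: the health check reads the mutable model_name."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def update_model_name(self, name: str) -> None:
        # After training starts, model_name is switched to the LoRA adapter name.
        self.model_name = name

    def check_server(self, served_ids: list[str]) -> bool:
        # Healthy only if the server already serves whatever model_name is right now.
        return self.model_name in served_ids


pool = BuggyPool("base-model")
new_server = ["base-model"]  # a freshly scaled-up server: base model only

print(pool.check_server(new_server))  # True before training starts
pool.update_model_name("rft-xxx")
print(pool.check_server(new_server))  # False: this server can never join the pool
```

Since the adapter is only pushed to servers that are already in the pool, the second `False` never flips back: the new server stays outside the pool.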

The fix

Store the original model name as `base_model_name` at init time and use it for health checks.
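A sketch of the fix under the same assumptions, using a hypothetical `ElasticPoolSketch` class rather than the actual `ElasticInferencePool`: the name is captured once in `__init__`, and `update_model_name()` only changes the name used to address requests.

```python
class ElasticPoolSketch:
    """Hypothetical post-fix pool: health checks use a name frozen at init."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        # Captured once; update_model_name() never touches it.
        self.base_model_name = model_name

    def update_model_name(self, name: str) -> None:
        # Still switched to the LoRA adapter name for routing requests.
        self.model_name = name

    def check_server(self, served_ids: list[str]) -> bool:
        # Any server that has the base model loaded is admissible.
        return self.base_model_name in served_ids


pool = ElasticPoolSketch("base-model")
pool.update_model_name("rft-xxx")
print(pool.check_server(["base-model"]))  # True: the new server can join
print(pool.model_name)  # rft-xxx is still what requests target
```

Keeping two attributes means callers that address the current adapter are unchanged; the trade-off, noted under Low Risk, is that a server can pass the check before it has the adapter.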

Note

Low Risk
Low-risk change limited to elastic pool health checks; the main risk is unintentionally accepting servers that lack the current LoRA adapter, which is mitigated by the separate adapter sync logic.

Overview
Fixes elastic inference autoscaling by decoupling server health checks from the mutable `model_name`.

`ElasticInferencePool` now stores the initial `model_name` as `base_model_name` and uses it when validating `/v1/models` in `_check_server_health`, so newly scaled servers that only have the base model are considered healthy and can be added to the pool for subsequent LoRA adapter syncing.

Written by Cursor Bugbot for commit 7d566a9.
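The health check described in the overview reduces to a membership test on the `/v1/models` response. The sketch below assumes an OpenAI-compatible response body of the form `{"object": "list", "data": [{"id": ...}, ...]}`; `served_model_ids`, `check_server_health`, and `fetch_models` are hypothetical helpers, not prime-rl functions, and the real `_check_server_health` may differ (for example in error handling or async I/O).

```python
import json
import urllib.request


def served_model_ids(payload: dict) -> set[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return {entry["id"] for entry in payload.get("data", [])}


def check_server_health(payload: dict, base_model_name: str) -> bool:
    # Validate against the immutable base model, not the current LoRA name.
    return base_model_name in served_model_ids(payload)


def fetch_models(base_url: str, timeout: float = 5.0) -> dict:
    # Hypothetical helper: GET {base_url}/v1/models and decode the JSON body.
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)


fresh = {"object": "list", "data": [{"id": "base-model", "object": "model"}]}
print(check_server_health(fresh, "base-model"))  # True
print(check_server_health({"object": "list", "data": []}, "base-model"))  # False
```

Splitting parsing from fetching keeps the decision logic testable without a running server.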

The health check was using `model_name` which gets updated to the LoRA adapter name after training starts. New inference servers only have the base model, so they would fail health checks and never get added to the pool. Store the original model name as `base_model_name` and use it for health checks, allowing new servers to be discovered and have the LoRA adapter loaded on them.
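Admitting a server that only has the base model is safe only if something then loads the adapter onto it, which the Bugbot note attributes to separate adapter sync logic. A minimal sketch of that step, assuming the pool can see which model IDs each server serves; `sync_adapter` and the `load_adapter` callback are hypothetical, and the callback stands in for whatever mechanism actually loads the LoRA weights.

```python
from typing import Callable


def sync_adapter(
    served: dict[str, set[str]],
    adapter_name: str,
    load_adapter: Callable[[str, str], None],
) -> list[str]:
    """Load adapter_name onto every pooled server that lacks it.

    served maps a server URL to the set of model IDs it currently serves.
    Returns the URLs that were updated, in sorted order.
    """
    updated = []
    for url in sorted(served):
        if adapter_name not in served[url]:
            load_adapter(url, adapter_name)  # hypothetical loader callback
            # Record the load locally; a real pool would re-query the server.
            served[url].add(adapter_name)
            updated.append(url)
    return updated


calls = []
served = {
    "http://old:8000": {"base-model", "rft-xxx"},
    "http://new:8000": {"base-model"},  # admitted by the base-model health check
}
print(sync_adapter(served, "rft-xxx", lambda url, name: calls.append((url, name))))
# ['http://new:8000']
```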

* start env servers for env group
* bump
* working reverse-text
* correctly set log levels
* use client configs everywhere + make val work
* cycle through clients via inference pool
* update model name
* deprecate env worker
* implement evals
* deprecate evals+synthesize
* deprecate serialization
* use all clients for evals
* simplify config
* fix types and use inference pool for opd
* bring back logging intercept
* setup env client/server in prime-rl
* externalize running env server
* style
* bring back rate limiter on scheduler
* revert vf branch
* back to custom branch
* do not double asyncio
* bring back eval
* fix cpu tests
* bump vf
* add math group config
* remove stop server call
* add logs
* remove last mentions of vf.State
* deprecate some configs
* more
* remove eval + cpu integration tests
* remove evals + synthesize configs
* bump vf
* fix branch with vlm cache
* do not reference rollout status
* use correct model name
* stop teacher infer pool if setup
* update changelog
* deprecate server discovery (unused)
* fix env id stripping
* use updated model name for evals
* strip env version on env server
* remove server discovery tests
* bump vf
* do not fail if env server not yet up
* add elastic sanity check
* update docs
* use extra env kwargs consistently across orch and env server
* update math group config
* update cfg
* use dynamic model name in final evals
* add extra_env_kwargs to changelog
* resolve vf merge conflicts
* assert lora name not None
* fix unit tests
* fix types
* add hendrycks math sanity check
* bump vf
* do not double repeat eval inputs
* do not duplicate eval inputs
* lower avg@
* Initialize logger in env-server before install_env (#1743). install_env() calls get_logger(), which requires the logger to be set up first. This was missing in env-server but present in orchestrator.
* disable vf logging on orch
* Fix elastic autoscaling: use base model for health checks (#1745)

---------

Co-authored-by: JannikSt <JannikSt@users.noreply.github.com>
Co-authored-by: William Brown <williambrown97@gmail.com>

samsja approved these changes Feb 9, 2026


JannikSt merged commit 02a29d3 into env-worker Feb 9, 2026
8 checks passed

JannikSt merged 1 commit into env-worker from fix/elastic-autoscaling-env-worker

JannikSt commented Feb 9, 2026 • edited by cursor bot
2 participants