Fix elastic autoscaling: use base model for health checks #1745
Merged
JannikSt merged 1 commit into env-worker, Feb 9, 2026
Conversation
The health check was using `model_name` which gets updated to the LoRA adapter name after training starts. New inference servers only have the base model, so they would fail health checks and never get added to the pool. Store the original model name as `base_model_name` and use it for health checks, allowing new servers to be discovered and have the LoRA adapter loaded on them.
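The fix described above can be sketched roughly as follows. This is an illustrative, stdlib-only sketch, not the actual prime-rl implementation: the class shape, field names other than `model_name` / `base_model_name` / `update_model_name`, and the health-check helper are assumptions for demonstration.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from urllib.error import URLError


@dataclass
class ElasticInferencePool:
    """Sketch of the fix: freeze the original model name for health checks.

    `model_name` is mutable (it becomes the LoRA adapter name once training
    starts), while `base_model_name` is snapshotted at init time and is what
    newly discovered servers are checked against. Illustrative only.
    """

    model_name: str
    base_model_name: str = field(init=False)
    servers: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Snapshot the name before any LoRA rename can happen.
        self.base_model_name = self.model_name

    def update_model_name(self, lora_name: str) -> None:
        # Called after training starts; only the mutable name changes.
        self.model_name = lora_name

    def check_server_health(self, url: str) -> bool:
        # A fresh server only serves the base model, so compare against
        # base_model_name, not the (possibly LoRA) model_name.
        try:
            with urllib.request.urlopen(f"{url}/v1/models", timeout=5) as resp:
                data = json.load(resp)
            served = {m.get("id") for m in data.get("data", [])}
            return self.base_model_name in served
        except (URLError, OSError, ValueError):
            return False
```

With this split, `update_model_name` can rename freely without ever affecting which servers pass discovery; adapter sync then runs separately on the healthy pool.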
samsja approved these changes on Feb 9, 2026
samsja pushed a commit that referenced this pull request on Feb 9, 2026
- start env servers for env group
- bump
- working reverse-text
- correctly set log levels
- use client configs everywhere + make val work
- cycle through clients via inference pool
- update model name
- deprecate env worker
- implement evals
- deprecate evals+synthesize
- deprecate serialization
- use all clients for evals
- simplify config
- fix types and use inference pool for opd
- bring back logging intercept
- setup env client/server in prime-rl
- externalize running env server
- style
- bring back rate limiter on scheduler
- revert vf branch
- back to custom branch
- do not double asyncio
- bring back eval
- fix cpu tests
- bump vf
- add math group config
- remove stop server call
- add logs
- remove last mentions of vf.State
- deprecate some configs
- more
- remove eval + cpu integration tests
- remove evals + synthesize configs
- bump vf
- fix branch with vlm cache
- do not reference rollout status
- use correct model name
- stop teacher infer pool if setup
- update changelog
- deprecate server discovery (unused)
- fix env id stripping
- use updated model name for evals
- strip env version on env server
- remove server discovery tests
- bump vf
- do not fail if env server not yet up
- add elastic sanity check
- update docs
- use extra env kwargs consistently across orch and env server
- update math group config
- update cfg
- use dynamic model name in final evals
- add extra_env_kwargs to changelog
- resolve vf merge conflicts
- assert lora name not None
- fix unit tests
- fix types
- add hendrycks math sanity check
- bump vf
- do not double repeat eval inputs
- do not duplicate eval inputs
- lower avg@
- Initialize logger in env-server before install_env (#1743): install_env() calls get_logger(), which requires the logger to be set up first. This was missing in env-server but present in orchestrator.
- disable vf logging on orch
- Fix elastic autoscaling: use base model for health checks (#1745): The health check was using `model_name`, which gets updated to the LoRA adapter name after training starts. New inference servers only have the base model, so they would fail health checks and never get added to the pool. Store the original model name as `base_model_name` and use it for health checks, allowing new servers to be discovered and have the LoRA adapter loaded on them.

Co-authored-by: JannikSt <JannikSt@users.noreply.github.com>
Co-authored-by: William Brown <williambrown97@gmail.com>
Summary
The health check was using `model_name`, which gets updated to the LoRA adapter name after training starts.

The bug
After training starts, `update_model_name()` changes `model_name` to the LoRA adapter name (e.g., `rft-xxx`). New inference servers only have the base model loaded, so the health check would reject them; they would never be added to the pool and never receive the LoRA adapter.

The fix
Store the original model name as `base_model_name` at init time and use it for health checks.

Note
Low risk: the change is limited to elastic pool health checks. The main risk is unintentionally accepting servers that lack the current LoRA model, which is mitigated by the separate adapter sync logic.
Overview
Fixes elastic inference autoscaling by decoupling server health checks from the mutable `model_name`. `ElasticInferencePool` now stores the initial `model_name` as `base_model_name` and uses it when validating `/v1/models` in `_check_server_health`, so newly scaled servers that only have the base model are considered healthy and can be added to the pool for subsequent LoRA adapter syncing.

Written by Cursor Bugbot for commit 7d566a9.