Support for token-in vLLM endpoint (#626)
Conversation
snimu
left a comment
Looks really great to me :)
@mikasenghaas Another thought -- I'm not sure how much sense it makes to have the tokenizer pool managed at the verifiers layer; it seems spiritually similar to inference DP, which is auto-managed by vLLM / prime-rl. Ideally there is always only a single tokenizer endpoint available, and if replication is needed to manage load, this can be behind the endpoint.
verifiers/utils/eval_utils.py
Outdated
```python
if config.extra_env_kwargs:
    logger.info(f"Setting extra environment kwargs: {config.extra_env_kwargs}")
    for k, v in config.extra_env_kwargs.items():
        setattr(vf_env, k, v)
```
Bug: EnvGroup sub-environments miss interleaved_rollouts propagation
Using setattr to set extra_env_kwargs bypasses the set_interleaved_rollouts method in EnvGroup. When an EnvGroup is loaded and interleaved_rollouts is set via extra_env_kwargs, only the group's attribute is updated, but sub-environments remain with interleaved_rollouts=False. Since EnvGroup.rollout() delegates to sub-environments, and each sub-environment's get_model_response checks its own self.interleaved_rollouts, the token-in feature silently won't work for EnvGroup environments.
Additional Locations (1)
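A fix in the spirit of the `set_kwargs` approach mentioned in the PR summary can be sketched as follows. The class and method names (`Env`, `EnvGroup`, `set_interleaved_rollouts`) follow the PR discussion, but the bodies here are a simplified illustration, not the actual verifiers implementation:

```python
class Env:
    def __init__(self):
        self.interleaved_rollouts = False

    def set_kwargs(self, **kwargs):
        # Prefer a dedicated setter (e.g. set_interleaved_rollouts) over a
        # plain setattr, so subclasses can hook the assignment.
        for k, v in kwargs.items():
            setter = getattr(self, f"set_{k}", None)
            if callable(setter):
                setter(v)
            else:
                setattr(self, k, v)


class EnvGroup(Env):
    def __init__(self, envs):
        super().__init__()
        self.envs = envs

    def set_interleaved_rollouts(self, value):
        # Propagate to sub-environments, since rollout() delegates to them
        # and each checks its own self.interleaved_rollouts.
        self.interleaved_rollouts = value
        for env in self.envs:
            env.set_kwargs(interleaved_rollouts=value)


group = EnvGroup([Env(), Env()])
group.set_kwargs(interleaved_rollouts=True)
assert all(e.interleaved_rollouts for e in group.envs)
```

With this dispatch, setting `interleaved_rollouts` via extra env kwargs reaches the sub-environments instead of only updating the group's attribute.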
verifiers/utils/token_utils.py
Outdated
```python
@lru_cache(maxsize=None)
def get_tokens_client(client: AsyncOpenAI) -> AsyncOpenAI:
    logger.debug("Lazily copying OpenAI client for requests to /tokenize API")
    url_without_v1 = str(client.base_url).replace("/v1/", "")
```
Bug: URL manipulation fails without trailing slash
The replace("/v1/", "") operation only works when the base URL includes a trailing slash after /v1. If a user configures their vLLM server with base_url="http://localhost:8000/v1" (no trailing slash), the replacement doesn't match and the URL remains unchanged. The tokenize request would then be sent to /v1/tokenize instead of /tokenize, causing the request to fail with a confusing 404 or routing error.
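A more robust normalization, sketched as a hypothetical helper (not the PR's actual fix), strips the `/v1` suffix regardless of whether a trailing slash is present:

```python
def strip_v1_suffix(base_url: str) -> str:
    """Remove a trailing /v1 (with or without trailing slash) from a base URL."""
    url = str(base_url).rstrip("/")  # "http://host:8000/v1/" -> ".../v1"
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url

assert strip_v1_suffix("http://localhost:8000/v1") == "http://localhost:8000"
assert strip_v1_suffix("http://localhost:8000/v1/") == "http://localhost:8000"
```

Suffix matching on the normalized string avoids the silent no-op that `replace("/v1/", "")` produces when the configured `base_url` lacks the trailing slash.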
Description
This PR integrates the custom token-in `/v1/chat/completions/tokens` endpoint from PRIME-RL's inference server (introduced in #1422) with verifiers, so that PRIME-RL can do multi-turn RL without mismatches caused by retokenization. The main changes are:

- `interleaved_rollouts` (and any other extra env kwargs) is configurable via `vf-eval`
- `get_model_response` will correctly set up prompt tokens, sampling args, and the client to make a request to the custom endpoint

We decided on the following defaults for reliably building prompt tokens:
- Tokenize via the server's `/tokenize` API (which scales with `--api-server-count`, to not get bottlenecked by tokenization)
- Render each `env_response` in isolation and compute suffix tokens (tokens added in between messages by the chat template, but not produced by the LLM) once on dummy messages, and cache them for later usage. This should be safe in 99.9% of the cases.

Examples
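The suffix-token idea can be illustrated with a toy sketch. The Qwen-style template string and the character-level "tokenizer" below are hypothetical stand-ins, not the verifiers implementation:

```python
def render(messages):
    # Toy chat template: wraps each message Qwen-style (illustrative only).
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

def tokenize(text):
    # Stand-in tokenizer: one "token" per character.
    return list(text)

def compute_affix_tokens():
    # Render a dummy assistant message alone; whatever surrounds the raw
    # content is added by the template, not produced by the LLM.
    dummy = [{"role": "assistant", "content": "x"}]
    rendered = render(dummy)
    idx = rendered.index("x")
    tokens = tokenize(rendered)
    prefix = tokens[:idx]          # template tokens before the content
    suffix = tokens[idx + 1:]      # template tokens after the content
    return prefix, suffix

prefix, suffix = compute_affix_tokens()
assert "".join(suffix) == "<|im_end|>\n"
```

Computing these affixes once on dummy messages and caching them avoids re-rendering the full conversation on every turn.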
Default behavior is unaffected, e.g. running `math-python` against the OAI API. To use the token-in prompt, start a custom vLLM server from PRIME-RL and run:

```shell
uv run vf-eval math-python -n1 -r1 -b http://localhost:8000/v1 -m Qwen/Qwen3-4B-Instruct-2507 -v -x '{"interleaved_rollouts": true}'
```

Type of Change
Testing

Ran `uv run pytest` locally.

Checklist
Additional Notes
Note
Adds interleaved rollouts using a vLLM token-in endpoint with pre-tokenized prompts, and introduces CLI-configurable extra environment kwargs applied at runtime.
- `get_model_response` using custom `/v1/chat/completions/tokens` with pre-tokenized `prompt_ids` and normalized sampling args.
- `set_kwargs`, `set_interleaved_rollouts` (with warning).
- `verifiers/utils/token_utils.py` with `tokenize_vllm`, `get_prompt_ids`, and `prepare_sampling_args_for_token_prompts` (cached suffix handling, overlap logic, tokens client copy).
- `--extra-env-kwargs` to `vf-eval`; plumb through `EvalConfig.extra_env_kwargs` and apply via `vf_env.set_kwargs` in `run_evaluation`.
- `set_interleaved_rollouts` to propagate to sub-envs.
- `State.client` and `State.model` required (non-optional).
- `tests/test_eval_cli.py` to include `extra_env_kwargs` arg and validate sampling args precedence.

Written by Cursor Bugbot for commit 75fa695.