Conversation
This reverts commit a1052f8.
hmmm would it be possible to have this handled by overriding add_trajectory_step from MultiTurnEnv? Maybe we still want to log those steps, but they shouldn't be in the main sequence in state['trajectory'] if they won't be used for training. IMO this feels like some of the RLM logic is creeping a bit too low into the stack (the base Environment shouldn't know what an RLM is or have to think about it), and preparing the context for the API call should probably be handled by get_prompt_messages. If the RLM environment promises that get_prompt_messages only ever contains "increasing" sequences of messages on the subset of turns where a step will be added to state['trajectory'], then the old/current approach should work without changes, I think?

In general though, I'm a bit skeptical about trying to shoehorn RLMs into the interleaved strategy. Ultimately we want a single "best of both worlds" strategy which is always TITO, always "just works" with get_prompt_messages, and aggressively interleaves until a message sequence forces a branch.
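The suggested override could look roughly like this (a minimal sketch: `MultiTurnEnv` is stubbed here, and the step/state shapes and the `is_sub_llm_call` flag are assumptions taken from this thread, not the real library API):

```python
# Stub base class standing in for the real MultiTurnEnv (assumption).
class MultiTurnEnv:
    def add_trajectory_step(self, state: dict, step: dict) -> None:
        state.setdefault("trajectory", []).append(step)


class RLMEnv(MultiTurnEnv):
    def add_trajectory_step(self, state: dict, step: dict) -> None:
        # Keep sub-LLM steps around for logging, but out of the main
        # training sequence in state["trajectory"].
        if step.get("extras", {}).get("is_sub_llm_call"):
            state.setdefault("sub_llm_steps", []).append(step)
            return
        super().add_trajectory_step(state, step)


state: dict = {}
env = RLMEnv()
env.add_trajectory_step(state, {"role": "assistant", "content": "main turn"})
env.add_trajectory_step(
    state,
    {"role": "assistant", "content": "sub call", "extras": {"is_sub_llm_call": True}},
)
# state["trajectory"] now holds only the main turn; the sub-LLM step is logged aside.
```

This keeps the base Environment oblivious to RLMs: the filtering lives entirely in the subclass override.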
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Description
- `RLMEnv`: sub-LLM calls now go through `Environment.get_model_response` using a fake state with an empty trajectory and mirrored `sampling_args`/`oai_tools`, so prompt-id computation can't be polluted (the root cause of `/tokenize` empty-messages errors). The old `/chat/completions/tokens` path and explicit tokenization are gone for sub-LLMs.
- `include_sub_llm_in_trajectory` default is `False`, and interleaving is explicitly disallowed when it's `True` (guards in `set_interleaved_rollouts` and `setup_state`).
- `llm_batch` is now strings-only (enforced + documented). Non-string prompts return an error message instead of calling `get_model_response`.
- `example_id`/`task` updates; eval dataset creation removed; README updated with eval guidance.

Type of Change
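The fake-state idea described above can be sketched as follows (a hypothetical helper; the state keys mirror the ones named in this description, but the real `Environment` state layout may differ):

```python
import copy


def make_sub_llm_state(parent_state: dict) -> dict:
    """Build an isolated state for a sub-LLM request (sketch, not the real API).

    The empty trajectory means any prompt-id computation starts from a fresh
    message sequence, so the parent rollout's ids can't be polluted; the
    sampling args and tools are deep-copied so mutations don't leak back.
    """
    return {
        "trajectory": [],
        "sampling_args": copy.deepcopy(parent_state.get("sampling_args", {})),
        "oai_tools": copy.deepcopy(parent_state.get("oai_tools")),
    }


parent = {
    "trajectory": [{"role": "user", "content": "hi"}],
    "sampling_args": {"temperature": 0.7},
}
sub_state = make_sub_llm_state(parent)
sub_state["sampling_args"]["temperature"] = 0.0  # parent copy stays untouched
```

Passing `sub_state` instead of `parent` to the model-response path is what keeps the sub-LLM request invisible to the parent rollout's bookkeeping.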
Testing
`uv run pytest` locally.

Checklist
Note
Resolves sub-LLM rollout collisions, simplifies request paths, tightens `llm_batch` usage, and streamlines the `rlm_secrets` environment.
- Sub-LLM calls use `get_model_response` (chat path) with a fake state, ignoring interleaving; removed `/chat/completions/tokens` and explicit tokenization. Sampling args are mirrored; message/sampling normalization moved to module-level helpers.
- `include_sub_llm_in_trajectory` defaults to `False`. Interleaved rollouts are explicitly disallowed when it's `True` (guards in `set_interleaved_rollouts` and `setup_state`). Sub-LLM steps can be added to the trajectory with `extras.is_sub_llm_call`.
- `llm_batch` accepts a list of strings only; docs and tool help updated accordingly.
- `example_id` and `task`; removed eval split creation; `load_environment` no longer builds `eval_dataset`. README adds guidance to re-seed for eval.

Written by Cursor Bugbot for commit 631b24a. This will update automatically on new commits.
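The strings-only contract for `llm_batch` might be enforced along these lines (a sketch under assumptions: the real signature and error text aren't shown in this PR, so `call_model` is a stand-in for the actual request path):

```python
from typing import Callable


def llm_batch(prompts: list, call_model: Callable[[str], str]) -> list[str]:
    """Run a batch of prompts, enforcing the strings-only contract (sketch)."""
    results: list[str] = []
    for prompt in prompts:
        if not isinstance(prompt, str):
            # Non-string prompts get an error message in their result slot
            # instead of triggering a model call.
            results.append(
                f"Error: llm_batch accepts strings only, got {type(prompt).__name__}"
            )
        else:
            results.append(call_model(prompt))
    return results


out = llm_batch(["hello", [{"role": "user"}]], call_model=lambda p: p.upper())
# out[0] is the model reply; out[1] is an error message for the non-string prompt
```

Returning the error in-band (rather than raising) keeps a single bad entry from failing the whole batch, which matches the "return an error message" behavior described above.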