created the rlm_secrets environment #763

Merged

snimu merged 6 commits into main from sebastian/rlm-secrets-env-2026-01-22 on Jan 22, 2026
Conversation


@snimu commented Jan 22, 2026

Description

Secrets: the basic idea of the eval is that the model must call functions in a specific order and pass information between them to solve a puzzle. The design forces the root model to use sub-LLMs, and those sub-LLMs to use tools; the secrets can also live in files, so that file access is tested as well.

  • Setup:
    • There's a bunch of files with random names, each containing a random UUID as content
    • The prompt tells the RLM that it must first find the correct order of the filenames (which are also randomized), then delete all files but the correct one, and answer with the position of the remaining file
  • Available functions:
    • To root-LLM:
      • decrypt_position(file_name: str, code: str) -> int | str
        • returns the position if the code was right
        • returns an error message if the code was wrong
        • internally, it just checks whether the code is the correct random string for file_name (and that file_name is one of the files) and returns the corresponding output
      • unveil_file_number(sorted_filenames: list[str]) -> int | str
        • if the filenames are passed in the correct order, it returns the position of the file that should remain
        • otherwise, it returns an error message
    • To sub-LLMs:
      • get_code_from_file_data(filename: str, filecontent: str) -> str
        • If filename exists and filecontent is its actual content, the function returns another random code, which the root-LLM can pass to decrypt_position along with the filename
        • Otherwise, it returns a random code generated on the fly that looks just like a correct code would, and thus still forces the root-LLM to call decrypt_position and check
  • Example rollout:
    • root-LLM uses bash to list the available files
    • it calls a sub-LLM with the filename and content (which it had to read via bash) and prompts it for the corresponding code
    • the sub-LLM calls get_code_from_file_data and returns the code
    • the root-LLM calls decrypt_position
    • if there's an error the root-LLM will try again, otherwise the position is clear
    • this is repeated with all the files
    • the root-LLM now knows the relative position of every file, so it calls unveil_file_number with the correctly ordered filenames; either it has to retry everything, or it gets a valid number
    • the root-LLM deletes all files but the correct one and answers with the file number in its final response
  • Which parts of the RLM are touched:
    • Tools on all levels
    • Sub-LLM calls
    • File operations: listing, reading, deleting
  • And yet it's a very simple environment, and Codex can probably one-shot the implementation, too
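To make the mechanics concrete, here is a minimal self-contained sketch of the setup and the three tools in plain Python. Everything here (the `make_puzzle` helper, the `state` dict, the code lengths) is illustrative, not the actual implementation in environments/rlm_secrets/rlm_secrets.py, which wires these tools through the verifiers API:

```python
import os
import random
import string
import uuid


def _random_code(length: int = 16) -> str:
    """A random alphanumeric string, used both for real codes and decoys."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


def make_puzzle(workdir: str, n_files: int = 4) -> dict:
    """Create n_files files with random names and UUID contents, and fix a
    hidden correct ordering, a per-file secret code, and the file to keep."""
    filenames = [_random_code(8) + ".txt" for _ in range(n_files)]
    for name in filenames:
        with open(os.path.join(workdir, name), "w") as f:
            f.write(str(uuid.uuid4()))
    order = filenames[:]
    random.shuffle(order)  # the hidden correct order
    return {
        "workdir": workdir,
        "codes": {name: _random_code() for name in filenames},
        "order": order,
        "keep_position": random.randrange(n_files),
    }


# --- sub-LLM tool ---
def get_code_from_file_data(state: dict, filename: str, filecontent: str) -> str:
    path = os.path.join(state["workdir"], filename)
    if os.path.exists(path) and open(path).read() == filecontent:
        return state["codes"][filename]  # the real code
    return _random_code()                # plausible-looking decoy


# --- root-LLM tools ---
def decrypt_position(state: dict, file_name: str, code: str):
    if state["codes"].get(file_name) == code:
        return state["order"].index(file_name)  # position in the hidden order
    return "error: wrong code or unknown file"


def unveil_file_number(state: dict, sorted_filenames: list):
    if sorted_filenames == state["order"]:
        return state["keep_position"]
    return "error: filenames are not in the correct order"
```

In a rollout, the root-LLM would list the files via bash, have sub-LLMs call get_code_from_file_data on each file, feed every returned code to decrypt_position, and finally call unveil_file_number with the reconstructed order.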

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Adds a new RLM evaluation environment focused on multi-turn tool use and file operations.

  • New environments/rlm_secrets/rlm_secrets.py: defines RLMSecretsEnv with root tools (decrypt_position, unveil_file_number), sub-LLM tool (get_code_from_file_data), filesystem setup of random .txt files, dataset builder, and reward functions (correct_answer, correct_filesystem_state).
  • Implements sub-LLM invocation via llm_batch with state injection for sub-tools and retains rollout filesystem for verification/cleanup.
  • New environments/rlm_secrets/README.md: docs for puzzle flow, tools, usage (uv run vf-eval rlm-secrets), config, and rewards.
  • New environments/rlm_secrets/pyproject.toml: package metadata, verifiers>=0.1.8 dependency, build config, and eval settings.
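The two reward functions named above could look roughly like this; the actual signatures follow the verifiers rubric API, so treat the argument names here as assumptions:

```python
import os


def correct_answer(completion: str, keep_position: int) -> float:
    """1.0 if the final answer mentions the correct file number.
    Hypothetical signature; the real function follows the verifiers API."""
    return 1.0 if str(keep_position) in completion.split() else 0.0


def correct_filesystem_state(workdir: str, keep_filename: str) -> float:
    """1.0 if exactly the one correct file remains after the rollout."""
    return 1.0 if set(os.listdir(workdir)) == {keep_filename} else 0.0
```

Splitting the reward this way lets the eval distinguish a model that merely guessed the right number from one that also performed the required file deletions.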

Written by Cursor Bugbot for commit 5295d0a. This will update automatically on new commits.

cursor[bot]

This comment was marked as outdated.


@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

@snimu merged commit 177baaf into main on Jan 22, 2026
6 checks passed