Fix delimiter capture and namespace collision bugs (§3.1-3.2) #95
+91
−5
Summary
Fixes two bugs in the RLM scaffold that cause incorrect behavior during execution. These bugs are identified in a companion paper analyzing RLM's code generation architecture.
What Was Broken
Bug 1: Premature Task Completion (Delimiter Capture)
The Problem: RLMs signal completion by generating text like `FINAL(answer)`. But this pattern appears in the same text as the model's reasoning. The regex that detects completion was too greedy: it matched from the first `FINAL(` through the last `)` in the entire response, capturing unrelated code and explanations.
Real Impact:
- Affects both `FINAL()` and `FINAL_VAR()`
The Fix:
- Changed the capture group from greedy `(.*)` to non-greedy `(.*?)` to stop at the first `)`
- Added `finish()` and `finish_var()` as Python functions (preferred method)
- Kept the `FINAL()` regex for trained models
Bug 2: Scaffold Function Corruption (Namespace Collision)
The Problem: All code executions share one Python namespace. If the model writes `context = "something"` or `llm_query = lambda x: "hijacked"`, it silently overwrites the critical functions that make RLM work.
Real Impact:
- `context` can be destroyed mid-execution
- `llm_query` can be replaced, breaking recursion
The Fix:
- Added a `SCAFFOLD_NAMES` list of protected function names
- Protected names include `llm_query`, `context`, `FINAL_VAR`, `finish`, etc.
Test Results
✅ All existing tests pass: 135 passed, 8 skipped
Improvements on evaluation harness:
The 8 skipped tests require optional dependencies (Gemini API key, litellm, modal, prime_sandboxes).
Code Changes
~55 lines of code total:
- `rlm/utils/parsing.py`: ~30 LOC for better `FINAL()` detection
- `rlm/environments/local_repl.py`: ~25 LOC for scaffold name protection
Both fixes are runtime safeguards that work with the existing Python REPL. Full backward compatibility maintained.
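For reviewers who want to see the two safeguards in isolation, here is a minimal sketch. It is not the code in this PR: the exact patterns in `rlm/utils/parsing.py` and the enforcement mechanism in `rlm/environments/local_repl.py` are not reproduced here, so the regexes, the snapshot-and-restore strategy, and the helper names `extract_final` and `run_protected` are all assumptions made for illustration. Only `SCAFFOLD_NAMES` and the protected names come from the description above.

```python
import re

# Assumed patterns: the real ones in rlm/utils/parsing.py may differ.
GREEDY = re.compile(r"FINAL\((.*)\)", re.DOTALL)       # old behavior
NON_GREEDY = re.compile(r"FINAL\((.*?)\)", re.DOTALL)  # fixed behavior


def extract_final(response, pattern=NON_GREEDY):
    """Return the text captured inside FINAL(...), or None if absent."""
    match = pattern.search(response)
    return match.group(1) if match else None


# Names taken from the PR description; the full list is not shown there.
SCAFFOLD_NAMES = ["llm_query", "context", "FINAL_VAR", "finish"]

_MISSING = object()  # sentinel for "name was deleted by the executed code"


def run_protected(code, namespace):
    """Execute code in the shared namespace, then restore scaffold names.

    Hypothetical helper: snapshots every protected name before exec() and
    puts the original object back if the code rebound or deleted it.
    Returns the list of names that had to be restored.
    """
    snapshot = {n: namespace[n] for n in SCAFFOLD_NAMES if n in namespace}
    exec(code, namespace)
    restored = []
    for name, original in snapshot.items():
        if namespace.get(name, _MISSING) is not original:
            namespace[name] = original
            restored.append(name)
    return restored


# Bug 1: the greedy pattern swallows everything up to the last ")".
response = "FINAL(42)\nThe helper call was print(compute(x))"
print(repr(extract_final(response, GREEDY)))
print(repr(extract_final(response)))

# Bug 2: model-written code tries to rebind scaffold names.
ns = {"context": "long document", "llm_query": lambda prompt: "sub-answer"}
hijack = 'context = "something"\nllm_query = lambda x: "hijacked"\nnotes = 1'
print(run_protected(hijack, ns))
```

Note that a non-greedy capture stops at the first `)`, so an answer that itself contains parentheses, such as `FINAL(f(x))`, would be truncated by this sketch's pattern. Calling a real Python function like `finish()` avoids that class of parsing problem entirely.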
Background (Optional Reading)
These bugs are analyzed in a forthcoming paper, "Scope Hygiene in Recursive Language Models", which frames the RLM architecture as a metaprogramming system. The paper identifies four scope failures; this PR fixes the two engineering bugs, delimiter capture and namespace collision (§3.1-3.2).
The paper also identifies two architectural issues requiring deeper changes (referential opacity, cross-context breakage) addressed by its proposed Scheme coordination layer.
🤖 Generated with Claude Code