Fix delimiter capture and namespace collision bugs (§3.1-3.2) #95

rwtaber · 2026-02-09T17:51:44Z

Summary

Fixes two bugs in the RLM scaffold that cause incorrect behavior during execution. These bugs are identified in a companion paper analyzing RLM's code generation architecture.

What Was Broken

Bug 1: Premature Task Completion (Delimiter Capture)

The Problem: RLMs signal completion by generating text like FINAL(answer). But this pattern appears in the same text as the model's reasoning. The regex that detects completion was too greedy—it would match from the first FINAL( through the last ) in the entire response, capturing unrelated code and explanations.

Real Impact:

Zhang et al. (2026) report 30% of training examples had confusion between FINAL() and FINAL_VAR()
False positives when the model writes plans like "I will call FINAL() after..."
Captures text through unrelated closing parentheses in code blocks

The Fix:

Changed regex from greedy (.*) to non-greedy (.*?) to stop at the first )
Added finish() and finish_var() as Python functions (preferred method)
Maintains backward compatibility with existing FINAL() regex for trained models

Bug 2: Scaffold Function Corruption (Namespace Collision)

The Problem: All code executions share one Python namespace. If the model writes context = "something" or llm_query = lambda x: "hijacked", it silently overwrites the critical functions that make RLM work.

Real Impact:

Zhang et al. (2026, Example E.2) show trajectories where variables get accidentally overwritten
User prompt (context) can be destroyed mid-execution
Sub-LLM query function (llm_query) can be replaced, breaking recursion

The Fix:

Added SCAFFOLD_NAMES list of protected function names
After each code execution, automatically restore critical functions if overwritten
Prevents silent corruption of llm_query, context, FINAL_VAR, finish, etc.

Test Results

✅ All existing tests pass: 135 passed, 8 skipped

Improvements on evaluation harness:

Delimiter capture: 14/21 → 18/21 (+4 tests passing)
Namespace collision: 11/15 → 13/15 (+2 tests passing)
Cross-context: 4/11 → 5/11 (+1 test passing)
Total: +7 tests passing, 0 regressions

The 8 skipped tests are optional dependencies (Gemini API key, litellm, modal, prime_sandboxes).

Code Changes

~55 lines of code total:

rlm/utils/parsing.py: ~30 LOC for better FINAL() detection
rlm/environments/local_repl.py: ~25 LOC for scaffold name protection

Both fixes are runtime safeguards that work with the existing Python REPL. Full backward compatibility maintained.

Background (Optional Reading)

These bugs are analyzed in a forthcoming paper "Scope Hygiene in Recursive Language Models" which frames the RLM architecture as a metaprogramming system. The paper identifies four scope failures; this PR fixes the two engineering bugs:

Delimiter capture (§3.1): Control signals in the same string space as content—like reserved keywords colliding with variable names
Namespace collision (§3.2): Dynamic scoping allows accidental overwrites—a problem solved in programming languages 40+ years ago (Scheme, 1978)

The paper also identifies two architectural issues requiring deeper changes (referential opacity, cross-context breakage) addressed by its proposed Scheme coordination layer.

Key References:

Zhang, A. L., Kraska, T., & Khattab, O. (2026). Recursive Language Models. arXiv:2512.24601
Companion paper on scope hygiene (forthcoming)

🤖 Generated with Claude Code

This commit addresses two scope hygiene bugs identified in the paper "Scope Hygiene in Recursive Language Models": 1. Delimiter capture (§3.1): - Fixed greedy regex matching in FINAL() parsing that could capture through the last ')' in the entire response - Changed from greedy (.*) to non-greedy (.*?) matching - Added programmatic finish() and finish_var() functions as preferred completion mechanism with backward-compatible regex fallback - Functions set _finish_answer flag checked before regex patterns 2. Namespace collision (§3.2): - Added SCAFFOLD_NAMES frozenset to protect critical bindings - Added post-exec() restoration of scaffold bindings if overwritten - Prevents user code from silently corrupting llm_query, context, FINAL_VAR, and other scaffold functions - Stored original bindings in _scaffold_bindings dict All existing tests pass (135 passed, 8 skipped). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix delimiter capture and namespace collision bugs (§3.1-3.2) #95

Fix delimiter capture and namespace collision bugs (§3.1-3.2) #95

rwtaber commented Feb 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix delimiter capture and namespace collision bugs (§3.1-3.2) #95

Are you sure you want to change the base?

Fix delimiter capture and namespace collision bugs (§3.1-3.2) #95

Conversation

rwtaber commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What Was Broken

Bug 1: Premature Task Completion (Delimiter Capture)

Bug 2: Scaffold Function Corruption (Namespace Collision)

Test Results

Code Changes

Background (Optional Reading)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rwtaber commented Feb 9, 2026 •

edited

Loading