No eval shortcut #152

Merged
merged 22 commits into from
Jun 13, 2025
@matheper matheper commented Jun 11, 2025

This pull request makes system prompts in the debug-gym framework customizable, refactors how agents build those prompts, and simplifies environment code. It adds support for Jinja templates in system prompts, introduces new utility methods for agents, and simplifies the instructions property in several environments.

Customizable System Prompts

  • Added support for Jinja templates to build system prompts, allowing users to customize the format and content of prompts. Custom filters such as to_pretty_json and trim_message were introduced to enhance template functionality. ([[1]](https://github.com/microsoft/debug-gym/pull/152/files#diff-769a128278122d779780df5a3a6a2d1ffa96e793d619dec71d091bf21368824eL75-R188), [[2]](https://github.com/microsoft/debug-gym/pull/152/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L140-R222), [[3]](https://github.com/microsoft/debug-gym/pull/152/files#diff-34ea40c6da6f40f445b3eafff912f32342156241356dd99a856f22ace2d24a06R36-R37), [[4]](https://github.com/microsoft/debug-gym/pull/152/files#diff-9ebc73ddba89145d2c3c0d2a7a35c2fc03a4306bc810bc01c3edac0e4dcd96a7R35-R36), [[5]](https://github.com/microsoft/debug-gym/pull/152/files#diff-8b1170dceb4870f28ae9f00425abdbd14207448da847fcfdc58087e399ff5cceR35-R36))
  • Updated the documentation in README.md to explain how to use Jinja templates for system prompts, including examples and details about custom filters. ([README.mdL140-R222](https://github.com/microsoft/debug-gym/pull/152/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L140-R222))
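A minimal sketch of how custom filters can be registered on a Jinja environment, assuming the `jinja2` package is installed. Only the filter names `to_pretty_json` and `trim_message` come from this PR description. The filter bodies, the `max_length` parameter, the `render_prompt` helper, and the template variables (`instructions`, `tools`, `last_output`) are illustrative assumptions and may differ from the debug-gym implementation.

```python
import json

from jinja2 import Environment


def to_pretty_json(value):
    """Serialize a value as indented JSON (assumed behavior)."""
    return json.dumps(value, indent=2, sort_keys=True)


def trim_message(text, max_length=40):
    """Truncate long text and mark the cut with '...' (assumed behavior)."""
    text = str(text)
    if len(text) <= max_length:
        return text
    return text[: max_length - 3] + "..."


def render_prompt(template_source, **context):
    """Render a template string with both custom filters registered."""
    env = Environment()
    env.filters["to_pretty_json"] = to_pretty_json
    env.filters["trim_message"] = trim_message
    return env.from_string(template_source).render(**context)


# Hypothetical template; the variable names are made up for this sketch.
TEMPLATE = (
    "Instructions: {{ instructions }}\n"
    "Tools:\n{{ tools | to_pretty_json }}\n"
    "Last output: {{ last_output | trim_message(max_length=20) }}"
)

prompt = render_prompt(
    TEMPLATE,
    instructions="Fix the failing test.",
    tools={"pdb": "debugger", "eval": "run tests"},
    last_output="x" * 100,
)
print(prompt)
```

Registering filters on the environment, rather than pre-formatting values in Python, keeps formatting decisions inside the template, which is what lets users change the prompt layout without touching agent code.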

Agent Enhancements

  • Refactored the build_system_prompt method to use either a Jinja template or a default JSON-based format. Added utility methods like shortcut_features and _auto_eval_on_rewrite for better modularity. ([debug_gym/agents/base_agent.pyL75-R188](https://github.com/microsoft/debug-gym/pull/152/files#diff-769a128278122d779780df5a3a6a2d1ffa96e793d619dec71d091bf21368824eL75-R188))
  • Introduced the _load_system_prompt_template method to handle loading and validating custom templates. ([debug_gym/agents/base_agent.pyL75-R188](https://github.com/microsoft/debug-gym/pull/152/files#diff-769a128278122d779780df5a3a6a2d1ffa96e793d619dec71d091bf21368824eL75-R188))
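The template-or-default behavior described above can be sketched roughly as follows, assuming the `jinja2` package. The function names mirror the methods named in this PR, but they are written here as free functions; the signatures, the `info` dictionary, the validation rule (the file must exist), and the error type are assumptions rather than the actual debug-gym code.

```python
import json
import tempfile
from pathlib import Path

from jinja2 import Template


def load_system_prompt_template(template_file):
    """Return a compiled template, or None when no file is configured."""
    if not template_file:
        return None
    path = Path(template_file)
    if not path.is_file():
        # Assumed validation: fail early on a missing template file.
        raise FileNotFoundError(f"System prompt template not found: {path}")
    return Template(path.read_text(encoding="utf-8"))


def build_system_prompt(info, template_file=None):
    """Render the custom template if one is given, else fall back to JSON."""
    template = load_system_prompt_template(template_file)
    if template is not None:
        return template.render(info=info)
    return json.dumps(info, indent=2)


info = {"instructions": "Fix the bug."}

# No template configured: default JSON-based format.
default_prompt = build_system_prompt(info)

# Template configured: the file's contents drive the prompt layout.
with tempfile.TemporaryDirectory() as tmp:
    template_path = Path(tmp) / "prompt.jinja"
    template_path.write_text("Task: {{ info.instructions }}", encoding="utf-8")
    custom_prompt = build_system_prompt(info, template_path)

print(default_prompt)
print(custom_prompt)
```

Keeping loading and validation in a separate function means a bad path surfaces as a clear error before any rendering happens.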

Environment Simplifications

  • Simplified the instructions property across multiple environments (AiderBenchmarkEnv, MiniNightmareEnv, SWEBenchEnv) to return a plain string instead of a dictionary. ([[1]](https://github.com/microsoft/debug-gym/pull/152/files#diff-0fd30e53ec10b12a89fd527c14c2d8c898af0b24e565ac866877c8c6e322e0f5L15-R16), [[2]](https://github.com/microsoft/debug-gym/pull/152/files#diff-0c9839d137c9c2cedc011109ff42d57f42e819ec862b943a5df9da45f86782f8L25-R26), [[3]](https://github.com/microsoft/debug-gym/pull/152/files#diff-14c661bab81e165fba55704ca88ccbf72e3a1a1f5c528ff9857527106d3f34ddL54-R55))
  • Updated the base environment to return an empty string for instructions by default. ([debug_gym/gym/envs/env.pyL275-R276](https://github.com/microsoft/debug-gym/pull/152/files#diff-dd500acaa499f131456c6cd77f9310117f349f19f869fc0fcb4279f71025f108L275-R276))
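As a rough illustration of the simplified property, the sketch below uses stand-in classes. `BaseEnv`, `ExampleBenchmarkEnv`, `describe`, and the instruction text are hypothetical names invented for this example; only the idea that `instructions` returns a plain string, empty by default, comes from the PR description.

```python
class BaseEnv:
    """Stand-in for the base environment (hypothetical name)."""

    @property
    def instructions(self) -> str:
        # Default: no task-specific instructions.
        return ""


class ExampleBenchmarkEnv(BaseEnv):
    """Hypothetical subclass; the instruction text is made up."""

    @property
    def instructions(self) -> str:
        return "The program fails its tests. Find and fix the bug."


def describe(env: BaseEnv) -> str:
    """Callers treat instructions as plain text, with no dictionary lookup."""
    return env.instructions or "(no instructions)"


print(describe(BaseEnv()))
print(describe(ExampleBenchmarkEnv()))
```

Because the default is an empty string rather than an empty dictionary, callers and templates can insert the value directly or test it for truthiness without knowing any keys.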

Configuration Updates

  • Added an optional system_prompt_template_file field in configuration files to specify custom Jinja templates. ([[1]](https://github.com/microsoft/debug-gym/pull/152/files#diff-34ea40c6da6f40f445b3eafff912f32342156241356dd99a856f22ace2d24a06R36-R37), [[2]](https://github.com/microsoft/debug-gym/pull/152/files#diff-9ebc73ddba89145d2c3c0d2a7a35c2fc03a4306bc810bc01c3edac0e4dcd96a7R35-R36), [[3]](https://github.com/microsoft/debug-gym/pull/152/files#diff-8b1170dceb4870f28ae9f00425abdbd14207448da847fcfdc58087e399ff5cceR35-R36))
  • Changed the default value of auto_eval_on_rewrite to False in multiple configuration files to give users more control over evaluation behavior. ([[1]](https://github.com/microsoft/debug-gym/pull/152/files#diff-34ea40c6da6f40f445b3eafff912f32342156241356dd99a856f22ace2d24a06L11-R11), [[2]](https://github.com/microsoft/debug-gym/pull/152/files#diff-9ebc73ddba89145d2c3c0d2a7a35c2fc03a4306bc810bc01c3edac0e4dcd96a7L11-R11), [[3]](https://github.com/microsoft/debug-gym/pull/152/files#diff-8b1170dceb4870f28ae9f00425abdbd14207448da847fcfdc58087e399ff5cceL11-R11))
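The sketch below shows one way these two settings might be consumed. The key names `system_prompt_template_file` and `auto_eval_on_rewrite` come from the PR description; the `DEFAULTS` mapping, both helper functions, the example path, and the tool name `"rewrite"` are assumptions made for illustration, not the project's configuration loader.

```python
# Assumed defaults: no custom template, and no automatic evaluation.
DEFAULTS = {
    "auto_eval_on_rewrite": False,
    "system_prompt_template_file": None,
}


def resolve_agent_config(user_config):
    """Overlay user-provided settings on top of the defaults."""
    return {**DEFAULTS, **user_config}


def should_eval_after(tool_name, config):
    """Evaluate automatically only after a rewrite, and only if opted in."""
    return tool_name == "rewrite" and bool(config["auto_eval_on_rewrite"])


default_config = resolve_agent_config({})
custom_config = resolve_agent_config(
    {
        "auto_eval_on_rewrite": True,
        # Hypothetical path, for illustration only.
        "system_prompt_template_file": "templates/my_prompt.jinja",
    }
)

print(should_eval_after("rewrite", default_config))
print(should_eval_after("rewrite", custom_config))
```

With the default now False, evaluation after a rewrite becomes an explicit opt-in rather than a side effect of the rewrite itself.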

Codebase Cleanup

  • Removed unused imports and redundant code in various files to improve maintainability. ([[1]](https://github.com/microsoft/debug-gym/pull/152/files#diff-872c89cb5a43ebc1e36269426eb653c84080edcbb0134bcc946b3a788cf09b47L2), [[2]](https://github.com/microsoft/debug-gym/pull/152/files#diff-7f81de762adcd06df31fbac17be81b87332a901ff68f36e232e12d25b754ab92L1-L5))

@matheper matheper marked this pull request as ready for review June 12, 2025 13:18
@matheper matheper changed the title WIP - No eval shortcut No eval shortcut Jun 12, 2025
@MarcCote MarcCote left a comment

Awesome, I love the new template for human mode, thanks.

@MarcCote MarcCote merged commit 951f7a1 into main Jun 13, 2025
6 checks passed
@MarcCote MarcCote deleted the no-eval-shortcut branch June 13, 2025 09:50