* Disable auto_eval_on_rewrite
* Use jinja2 to render system prompt
* Load template from config or template file. Add base_agent tests.
* Add jinja to requirements
* Render system prompt from agent + info data
* Only load system prompt template from jinja2 file, not plain text.
Raises FileNotFoundError if system_prompt_template_file not found
* removed unused imports
* Refactor system prompt template to use Jinja2 for rendering and add custom JSON filter
* Add comments to optionally load a custom system prompt template in configuration files
* Disable auto_eval_on_rewrite for swe-smith
* minor
* Build default system prompt from dict. Add trim back
* fix test
* Simplify env instruction for better rendering
* removed BASE_SYSTEM_PROMPT_TEMPLATE
* Update readme with instructions to use jinja
* Agent filter trims from middle by default
* Add human friendly system prompt template
* Add jinja templates to MANIFEST.in
* merge agents tests
---------
Co-authored-by: Xingdi (Eric) Yuan <xingdi-eric-yuan@users.noreply.github.com>
README.md: 67 additions & 6 deletions
@@ -137,28 +137,89 @@ We provide a human mode that enables developers to manually interact with `debug
#### 3.3. Overriding Values in Config
-`-p` is a handy way to override values defined in config. For example, the below command will run rewrite_agent agent on Aider with human mode (while in config file it specifies gpt-4o).
+The `-p` flag is a handy way to override values defined in the config file. For example, the command below will run the rewrite_agent agent on Aider with human mode (even if the config file specifies gpt-4o). The command also overrides the default system prompt (see below for more information).
+#### 3.4. Customizing the System Prompt with Jinja Templates
+
+`debug-gym` allows you to fully customize the system prompt by providing a [Jinja](https://jinja.palletsprojects.com/) template file. This enables you to control the format and content of the prompt sent to the LLM, making it easier to adapt the environment to your specific needs or research experiments.
+
+To use a custom system prompt template, specify the path to your Jinja template file in your agent's configuration under `system_prompt_template_file`. For example:
+Alternatively, you can provide a custom template from the command line with `-p <agent>.system_prompt_template_file="<path/to/template.jinja>"` (see above).
+
+Within your Jinja template, you have access to the `agent` and `info` objects, which provide all relevant context about the current environment and agent state.
+
+#### Custom Jinja Filters
+
+In addition to all [built-in Jinja filters](https://jinja.palletsprojects.com/en/stable/templates/#list-of-builtin-filters), two custom filters are available for use in your template:
+
+- **`to_pretty_json`**: Converts a Python object to a pretty-printed JSON string. Useful for displaying structured data in a readable format.
+  ```jinja
+  {{ info.tools | to_pretty_json }}
+  ```
+
+- **`trim_message`**: Trims a string to fit within a token or character limit, also filtering out non-UTF8 characters. This is helpful for ensuring that large outputs (such as directory trees or evaluation results) do not exceed the LLM's context window. The `trim_message` filter accepts the following arguments to control how messages are trimmed:
+  - **`max_length`**: The maximum number of tokens to keep in the message. If the message exceeds this length, it will be trimmed.
+  - **`max_length_percentage`**: Instead of specifying an absolute number, you can provide a percentage (e.g., `0.1` for 10%) of the LLM's context window. The message will be trimmed to fit within this percentage of the model's maximum context length.
+  - **`where`**: Specifies where to trim the message if it exceeds the limit. The default is `"middle"`, which trims from the middle of the message. Other options are `"start"` or `"end"`.
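As a rough illustration of what the two filters described above do, the sketch below re-implements similar behavior in plain Python. It is not `debug-gym`'s actual code: the function bodies, the character-based length (the README describes a token-based limit), and the `…` elision marker are assumptions made for this example, and `max_length_percentage` and the non-UTF8 filtering are left out.

```python
import json


def to_pretty_json(value):
    """Serialize a Python object as indented JSON (hypothetical stand-in)."""
    return json.dumps(value, indent=2)


def trim_message(message, max_length=None, where="middle", marker="…"):
    """Trim `message` to at most `max_length` characters.

    Assumption: length is counted in characters for simplicity; the filter
    described in the README counts tokens instead.
    """
    if max_length is None or len(message) <= max_length:
        return message
    if max_length <= len(marker):
        # Not enough room for any original text next to the marker.
        return marker[:max_length]
    keep = max_length - len(marker)
    if where == "start":
        # Drop the beginning, keep the tail.
        return marker + message[-keep:]
    if where == "end":
        # Drop the end, keep the head.
        return message[:keep] + marker
    if where == "middle":
        # Keep the head and the tail, drop what lies between them.
        head = (keep + 1) // 2
        tail = keep // 2
        return message[:head] + marker + (message[-tail:] if tail else "")
    raise ValueError(f"unknown value for where: {where!r}")


print(trim_message("abcdefghij", max_length=5))                 # ab…ij
print(trim_message("abcdefghij", max_length=5, where="start"))  # …ghij
print(trim_message("abcdefghij", max_length=5, where="end"))    # abcd…
print(to_pretty_json({"name": "pdb"}))
```

Functions like these can be exposed to a template by adding them to the `filters` dictionary of a Jinja `Environment`; how `debug-gym` itself registers its filters is not shown in this diff.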
Modify `scripts/config.yaml`, especially the `env_kwargs`, to set the path and entrypoint of the custom repository. We assume the repository contains a `.debugignore` file and a `.debugreadonly` file, which label the files/folders that are not visible or not editable, respectively.
As an example, we provide a buggy pytorch code repository in `data/pytorch`.
[SWE-Smith](https://github.com/SWE-bench/SWE-smith) makes it possible to generate new buggy code instances. Given a custom HuggingFace dataset (either local or remote) with a structure similar to [SWE-bench/SWE-smith](https://huggingface.co/datasets/SWE-bench/SWE-smith), you can run the agent on that dataset by passing `-p base.env_kwargs.dataset_id=<dataset_id>` on the command line. For example, to run on a local dataset:
`debug-gym`'s modular design makes it extensible. Users are encouraged to extend `debug-gym` to their specific use cases, for example by creating new tools that diversify an agent's action and observation spaces. For detailed instructions on designing new tools that are `debug-gym`-compatible, please refer to the [Technical Report](https://arxiv.org/abs/2503.21557).
-#### 3.7. Analysis and Visualization
+#### 3.8. Analysis and Visualization
We provide a set of scripts to help analyze the log files (e.g., the `.jsonl` files) generated by the agent.
- In the `analysis` folder, we provide the scripts that were used to generate the corresponding figures in our technical report.
-                "After successful rewrites, the environment will automatically call the Eval tool to evaluate the rewritten code. Therefore, you do not need to call the Eval tool yourself. The evaluation output will be updated automatically in the system prompt."
-            )
-        if self.config.get("env_kwargs", {}).get(
-            "persistent_breakpoints"
-        ) is True and self.env.has_tool("pdb"):
-            shortcut_features.append(
-                "The environment will automatically restore existing breakpoints when a new PDB session is started (e.g., after a rewrite)."
-            )
-        if self.config.get("env_kwargs", {}).get(
-            "auto_list"
-        ) is True and self.env.has_tool("pdb"):
-            shortcut_features.append(
-                "After every valid PDB tool calling, the environment will automatically call the PDB tool again with a `list .` command, which will show the code around the current frame."