Running predefined unit tests in the SWE-agent docker container #834

dgjun32 · 2024-11-02T07:21:40Z

Describe the issue

Hello, thank you so much for the nice work.

For each task instance in SWE-bench Lite, there are corresponding unit tests (PASS_TO_PASS and FAIL_TO_PASS).
I am trying to run these unit tests in the docker container for a given task instance, like this:

```python
import json

import datasets
from sweagent.environment.swe_env import EnvironmentArguments, SWEEnv

# load SWE-bench Lite dataset (a single split, so rows can be indexed directly)
dataset = datasets.load_dataset("...", split="test")

# define environment arguments (abbreviated)
args = EnvironmentArguments(dataset)

# initialize SWEEnv
env = SWEEnv(args)

# reset to task instance 100
obs, info = env.reset(index=100)

# get predefined unit tests for the 100th task instance
# (PASS_TO_PASS / FAIL_TO_PASS are stored as JSON-encoded lists of test ids)
pass_to_pass_test = json.loads(dataset[100]["PASS_TO_PASS"])[0]  # (e.g., "tests/migrations/test_writer.py")
fail_to_pass_test = json.loads(dataset[100]["FAIL_TO_PASS"])[0]

# run the predefined unit tests in the docker container
obs_1, reward, done, info = env.step(f"pytest {pass_to_pass_test}")
obs_2, reward, done, info = env.step(f"pytest {fail_to_pass_test}")
```
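In the Hugging Face releases of SWE-bench, `FAIL_TO_PASS` and `PASS_TO_PASS` are, as far as I know, stored as JSON-encoded strings rather than Python lists, so indexing the raw field with `[0]` returns its first character (`[`) instead of the first test id. A minimal sketch that decodes either form; the record and the helper name `decode_test_ids` are made up for illustration.

```python
import json

# Hypothetical, trimmed-down record in the shape of a SWE-bench row.
record = {
    "instance_id": "example__repo-1",
    "FAIL_TO_PASS": '["tests/test_new.py::test_fixed"]',
    "PASS_TO_PASS": '["tests/test_old.py::test_a", "tests/test_old.py::test_b"]',
}


def decode_test_ids(value) -> list[str]:
    """Turn a FAIL_TO_PASS / PASS_TO_PASS field into a list of test ids."""
    # Decode only when the field is a JSON string; pass real lists through unchanged.
    return json.loads(value) if isinstance(value, str) else list(value)


fail_to_pass = decode_test_ids(record["FAIL_TO_PASS"])
pass_to_pass = decode_test_ids(record["PASS_TO_PASS"])
print(fail_to_pass[0])
print(len(pass_to_pass))
```

Handling both forms keeps the helper working whether a copy of the dataset has already been post-processed into lists or not.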

I expected obs_1 to contain no execution errors and obs_2 to contain an error message.
However, I ran into these issues:

1. Depending on the task instance, running the PASS_TO_PASS unit tests results in errors (especially for tasks involving the django library).
2. The FAIL_TO_PASS unit tests do not seem to be present in the docker container for the task instance.

I think issue 2 is natural, but issue 1 is strange, since PASS_TO_PASS unit tests are supposed to pass.
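Regarding issue 1 for Django tasks: Django's own test suite is normally run with its `tests/runtests.py` script rather than plain `pytest`, which does not set up Django's test settings on its own. The sketch below builds such a command for a single test. The flags in `DJANGO_RUNTESTS` are assumed to mirror what SWE-bench's harness uses for Django, and `django_test_label` / `django_test_command` are hypothetical helpers; check both against your SWE-bench version.

```python
import re
from pathlib import PurePosixPath

# Assumed command prefix (believed to match SWE-bench's Django setup; verify locally).
DJANGO_RUNTESTS = "./tests/runtests.py --verbosity 2 --settings=test_sqlite --parallel 1"

# Unittest-style id as printed by Django's runner, e.g. "test_foo (migrations.test_writer.WriterTests)".
_UNITTEST_ID = re.compile(r"^(?P<method>\w+) \((?P<cls>[\w.]+)\)$")


def django_test_label(test: str) -> str:
    """Convert a unittest-style id or a test-file path into a runtests.py label."""
    match = _UNITTEST_ID.match(test)
    if match:
        # "test_foo (pkg.module.Class)" -> "pkg.module.Class.test_foo"
        return f"{match['cls']}.{match['method']}"
    if test.endswith(".py"):
        # "tests/migrations/test_writer.py" -> "migrations.test_writer"
        parts = list(PurePosixPath(test).with_suffix("").parts)
        if parts and parts[0] == "tests":
            parts = parts[1:]
        return ".".join(parts)
    raise ValueError(f"unrecognized test id: {test!r}")


def django_test_command(test: str) -> str:
    """Full shell command that runs one Django test label from the repo root."""
    return f"{DJANGO_RUNTESTS} {django_test_label(test)}"


print(django_test_command("tests/migrations/test_writer.py"))
print(django_test_command("test_foo (migrations.test_writer.WriterTests)"))
```

The resulting string could be passed to `env.step(...)` in place of the `pytest ...` command for Django instances.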

Optional: Relevant documentation page

No response


klieret · 2024-11-04T15:51:40Z

Yes, 2 is indeed natural, because those are mostly the tests that were added in the gold solution PR. Hmm, 1 shouldn't happen, ideally. Could it be that the environment is not set up properly? Is it an error (e.g., an import error) or a failed unit test (e.g., a failed assert statement)? Do you observe issues when running SWE-bench gold validation on the instances from 1? If yes, please open a bug report over at https://github.com/princeton-nlp/SWE-bench.
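The error-versus-failure question can usually be answered from pytest's final summary line, where errors (exceptions during collection, setup, or teardown, such as an ImportError) are counted separately from failures (a test body that ran and failed an assertion). A minimal sketch, assuming the observation returned by `env.step` contains pytest's normal terminal output; `summary_counts` and `diagnose` are hypothetical helper names, and the parser does not understand Django's `runtests.py` (unittest) output format.

```python
import re

# Counts such as "3 passed", "1 failed", "2 errors" on pytest's final summary line.
_COUNT = re.compile(r"(\d+) (passed|failed|errors?|skipped|xfailed|xpassed)\b")


def summary_counts(output: str) -> dict[str, int]:
    """Return outcome counts from the last pytest summary line found in output."""
    for line in reversed(output.splitlines()):
        found = _COUNT.findall(line)
        if found:
            counts: dict[str, int] = {}
            for number, word in found:
                # Normalize "error" / "errors" to a single key.
                key = "error" if word.startswith("error") else word
                counts[key] = counts.get(key, 0) + int(number)
            return counts
    return {}


def diagnose(output: str) -> str:
    """Separate setup/collection errors from genuine test failures."""
    counts = summary_counts(output)
    if counts.get("error"):
        return "error"  # e.g. ImportError or broken fixture: points at the environment
    if counts.get("failed"):
        return "failed"  # the test ran and an assertion failed
    if counts.get("passed"):
        return "passed"
    return "unknown"  # no summary line, e.g. pytest could not start at all


sample = """tests/test_old.py::test_a PASSED
tests/test_old.py::test_b ERROR
=================== 1 passed, 1 error in 0.05s ==================="""
print(summary_counts(sample))
print(diagnose(sample))
```

Applied to the thread's example, `diagnose(obs_1)` returning `"error"` would suggest an environment problem rather than a genuinely failing PASS_TO_PASS test.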

But also, SWE-agent has currently diverged a little from SWE-bench. SWE-bench now builds a new docker image for every instance, whereas SWE-agent starts from a base image and then pip/conda installs things on top, which is a bit more brittle. This will be different in SWE-agent 1.0.0, where we will get rid of all of the setup code and simply use the docker images from SWE-bench. Hoping to get this out this week or next.

dgjun32 changed the title ~~Running predefined unit tests in the SWE-agent docker image~~ Running predefined unit tests in the SWE-agent docker container Nov 2, 2024

klieret added the ❔question Further information is requested label Nov 4, 2024
