For each task instance in SWE-bench Lite, there are corresponding unit tests (PASS_TO_PASS and FAIL_TO_PASS).
I am trying to run these unit tests in the docker container for a given task instance, like this:
import datasets
from sweagent.environment.swe_env import EnvironmentArguments, SWEEnv  # SWE-agent 0.x module layout

# load the SWE-bench Lite dataset
dataset = datasets.load_dataset("...")
# define the environment arguments
args = EnvironmentArguments(dataset)
# initialize SWEEnv
env = SWEEnv(args)
# reset to task instance 100
obs, info = env.reset(index=100)
# get the predefined unit tests for the 100th task instance
pass_to_pass_test = dataset[100]['PASS_TO_PASS'][0] # (e.g., "tests/migrations/test_writer.py")
fail_to_pass_test = dataset[100]['FAIL_TO_PASS'][0]
# run the predefined unit tests inside the docker container
obs_1, reward, done, info = env.step(f"pytest {pass_to_pass_test}")
obs_2, reward, done, info = env.step(f"pytest {fail_to_pass_test}")
What I expected was that obs_1 contains no execution errors and obs_2 contains an error message.
However, I ran into two issues:
1. Depending on the task instance, running the PASS_TO_PASS unit tests results in errors (especially for tasks related to the django library).
2. The FAIL_TO_PASS unit tests do not seem to be present in the docker container for the task instance.
I think issue 2 is expected, but issue 1 is strange, since the PASS_TO_PASS unit tests should run successfully.
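One thing that may be worth ruling out in the snippet above: the PASS_TO_PASS and FAIL_TO_PASS fields may be stored as JSON-encoded strings rather than Python lists, in which case indexing with [0] returns the single character "[" instead of a test name. The sketch below is only an illustration under that assumption; `parse_test_list` and the sample `record` are hypothetical and not part of SWE-agent or SWE-bench.

```python
import json


def parse_test_list(value):
    """Return a list of test identifiers from a list or a JSON-encoded string.

    Hypothetical helper: it assumes the field is either already a list
    or a JSON array serialized as a string.
    """
    if isinstance(value, str):
        value = json.loads(value)
    return list(value)


# Made-up record that mimics the two possible storage formats.
record = {
    "PASS_TO_PASS": '["tests/a.py::test_x", "tests/a.py::test_y"]',
    "FAIL_TO_PASS": ["tests/b.py::test_z"],
}

pass_to_pass = parse_test_list(record["PASS_TO_PASS"])
fail_to_pass = parse_test_list(record["FAIL_TO_PASS"])

# Indexing the raw string would give "[" rather than a test name.
raw_first_char = record["PASS_TO_PASS"][0]
```

If the fields in your copy of the dataset are already lists, the helper simply returns them unchanged, so it should be safe to apply in both cases.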
dgjun32 changed the title from "Running predefined unit tests in the SWE-agent docker image" to "Running predefined unit tests in the SWE-agent docker container" on Nov 2, 2024
Yes, 2 is indeed expected, because those are mostly tests that were added in the gold solution PR. Hmm, 1 ideally shouldn't happen. Could it be that the environment is not set up properly? Is it an error (e.g., an import error) or a failed unit test (e.g., a failed assert statement)? Do you observe issues when running the SWE-bench gold validation on the instances from 1? If yes, please open a bug report over at https://github.com/princeton-nlp/SWE-bench.
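To tell an error apart from a failed test, one option is to look at pytest's exit code rather than only at the printed output. The following is a rough sketch, not something SWE-agent provides: it assumes the command is run as `pytest <test>; echo PYTEST_EXIT=$?` so that the exit code ends up in the observation string. The marker name and both helper functions are hypothetical; the exit-code meanings follow pytest's documented codes.

```python
import re

# Meanings of pytest's documented exit codes.
PYTEST_EXIT_CODES = {
    0: "all collected tests passed",
    1: "tests ran and at least one failed",
    2: "run interrupted (for example by an error during collection, such as a failed import)",
    3: "internal pytest error",
    4: "command-line usage error (for example, the test file or directory was not found)",
    5: "no tests were collected",
}


def extract_exit_code(observation, marker="PYTEST_EXIT="):
    """Return the last exit code echoed after `marker`, or None if absent."""
    codes = re.findall(re.escape(marker) + r"(\d+)", observation)
    return int(codes[-1]) if codes else None


def describe_observation(observation):
    """Map an observation string to a short description of the pytest result."""
    code = extract_exit_code(observation)
    if code is None:
        return "no exit-code marker found"
    return PYTEST_EXIT_CODES.get(code, f"unrecognized exit code {code}")


# Made-up observation strings for illustration only.
missing_file = "ERROR: file or directory not found: tests/foo.py\nPYTEST_EXIT=4"
failed_test = "1 failed, 3 passed\nPYTEST_EXIT=1"
```

Under these assumptions, an exit code of 4 would point at a missing test file (as in issue 2), 2 at a collection or import problem, and 1 at an actual failing assertion.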
But also, SWE-agent has currently diverged a little from SWE-bench. SWE-bench now builds a new docker image for every instance, whereas SWE-agent starts from a base image and then pip/conda installs things on top, which is a bit more brittle. This will change in SWE-agent 1.0.0, where we will get rid of all of the setup code and simply use the docker images from SWE-bench. We hope to get this out this week or next.