Running predefined unit tests in the SWE-agent docker container #834

Open
dgjun32 opened this issue Nov 2, 2024 · 1 comment
Labels
❔question Further information is requested

Comments


dgjun32 commented Nov 2, 2024

Describe the issue

Hello, thank you so much for the nice work.

For each task instance in the SWE-bench-lite, there are corresponding unit tests (PASS_TO_PASS and FAIL_TO_PASS).
I am trying to run the unit tests in the docker container corresponding to certain task instance like:

# load SWE-Bench Lite dataset
dataset = datasets.load_dataset("...")

# define environment argument
args = EnvironmentArguments(dataset)

# initialize SWEEnv
env = SWEEnv(args)

# reset task instance 100
obs, info = env.reset(index=100)

# get a pre-defined unit test corresponding to 100th task instance
pass_to_pass_test = dataset[100]['PASS_TO_PASS'][0] # (e.g., "tests/migrations/test_writer.py")
fail_to_pass_test = dataset[100]['FAIL_TO_PASS'][0]

# run the pre-defined unit test in the docker
obs_1, reward, done, info = env.step(f"pytest {pass_to_pass_test}")
obs_2, reward, done, info = env.step(f"pytest {fail_to_pass_test}")

I expected obs_1 to contain no execution errors and obs_2 to contain an error message.
However, I ran into two issues:

    1. Depending on the task instance, running the PASS_TO_PASS unit tests results in an error (especially for tasks involving the django library).
    2. The FAIL_TO_PASS unit tests do not seem to be present in the docker container for the task instance.

I think issue 2 is expected, but issue 1 is strange, since the PASS_TO_PASS unit tests should run successfully.
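
A side note on the snippet above: in the SWE-bench datasets on Hugging Face, the PASS_TO_PASS and FAIL_TO_PASS fields are, to my knowledge, JSON-encoded strings rather than lists, so `[0]` on the raw field returns the character `[` instead of a test name. Below is a minimal sketch of decoding them first. The `instance` record and the `get_tests` helper are hypothetical, and the values are made up for illustration.

```python
import json

# Hypothetical record shaped like a SWE-bench Lite row (values are made up).
instance = {
    "instance_id": "example__repo-1",
    "PASS_TO_PASS": '["tests/test_a.py::test_one", "tests/test_a.py::test_two"]',
    "FAIL_TO_PASS": '["tests/test_b.py::test_new"]',
}


def get_tests(instance, key):
    """Return the test identifiers stored under `key` as a list of strings."""
    value = instance[key]
    # Decode only if the field is still a JSON string; pass lists through unchanged.
    return json.loads(value) if isinstance(value, str) else list(value)


pass_to_pass = get_tests(instance, "PASS_TO_PASS")
fail_to_pass = get_tests(instance, "FAIL_TO_PASS")

print(instance["PASS_TO_PASS"][0])  # prints "[" (first character of the raw string)
print(pass_to_pass[0])              # first decoded test identifier
```

The decoded entries are test identifiers in whatever format the repository's own test runner reports, so they are not guaranteed to be paths that pytest accepts.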

Optional: Relevant documentation page

No response

@dgjun32 dgjun32 changed the title Running predefined unit tests in the SWE-agent docker image Running predefined unit tests in the SWE-agent docker container Nov 2, 2024
Member

klieret commented Nov 4, 2024

Yes, 2 is indeed expected, because those are mostly tests that were added in the gold solution PR. Hmm, 1 ideally shouldn't happen. Could it be that the environment is not set up properly? Is it an error (e.g., an import error) or a failed unit test (e.g., a failed assert statement)? Do you observe issues when running the SWE-bench gold validation on the instances from 1? If yes, please open a bug report over at https://github.com/princeton-nlp/SWE-bench.
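
To tell the two cases apart from the observation text, something like the following could be a starting point. It is only a sketch: it assumes the test command is pytest with its default, uncolored terminal summary line (for example `==== 1 failed, 2 passed in 0.12s ====`), and `classify_pytest_output` is a hypothetical helper, not part of SWE-agent or SWE-bench.

```python
import re

# Final pytest summary line, e.g. "==== 1 failed, 2 passed in 0.12s ====".
SUMMARY_RE = re.compile(r"^=+ (?P<body>.+?) in [\d.]+s.*=+$", re.MULTILINE)
# Individual counts inside that line, e.g. "1 failed" or "2 errors".
COUNT_RE = re.compile(r"(\d+) (passed|failed|errors?)")


def classify_pytest_output(output: str) -> str:
    """Return 'error', 'failed', 'passed', or 'unknown' for pytest terminal output."""
    summaries = SUMMARY_RE.findall(output)
    if not summaries:
        # No summary line found (different runner, colored output, crash, ...).
        return "unknown"
    counts = {"passed": 0, "failed": 0, "error": 0}
    for number, label in COUNT_RE.findall(summaries[-1]):
        key = "error" if label.startswith("error") else label
        counts[key] += int(number)
    # Errors (e.g. import errors during collection) take precedence over failures.
    if counts["error"]:
        return "error"
    if counts["failed"]:
        return "failed"
    if counts["passed"]:
        return "passed"
    return "unknown"


sample = "collected 3 items\n...\n===== 1 failed, 2 passed in 0.12s =====\n"
print(classify_pytest_output(sample))
```

An 'error' result points toward an environment or setup problem, while 'failed' means the tests ran and an assertion did not hold.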

Also, SWE-agent has currently diverged a bit from SWE-bench. SWE-bench now builds a new docker image for every instance, whereas SWE-agent starts from a base image and then pip/conda installs things on top, which is a bit more brittle. This will change in SWE-agent 1.0.0, where we will get rid of all the setup code and simply use the docker images from SWE-bench. Hoping to get this out this week or next.
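
For illustration, running tests directly in a prebuilt per-instance image could look roughly like the sketch below. It only builds the `docker run` command line and does not execute it. The image name is a placeholder, `docker_test_argv` is a hypothetical helper, and none of this is taken from SWE-agent or SWE-bench code.

```python
import shlex


def docker_test_argv(image: str, test_ids: list[str]) -> list[str]:
    """Build a `docker run` argv that runs pytest on `test_ids` in a throwaway container."""
    # Quote each test identifier so spaces or brackets survive the inner shell.
    inner = shlex.join(["pytest", *test_ids])
    return ["docker", "run", "--rm", image, "bash", "-c", inner]


# Placeholder image name; real per-instance image names are not shown here.
argv = docker_test_argv("example/instance-image:latest", ["tests/test_a.py::test_one"])
print(shlex.join(argv))
```

The returned list can be passed to `subprocess.run(argv, capture_output=True, text=True)` on a machine where docker and the image are available.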

@klieret klieret added the ❔question Further information is requested label Nov 4, 2024