Automatic benchmarking of gpt-engineer with swe-bench #913
Comments
Tempted to prioritize this higher after the Devin announcement (just as @batwood001 did in #1062).
Makes sense. Let's figure it out, along with people's availability, at our tech planning meeting this Thursday.
@viborc can you assign this to me?
Done!
This is more of a general update to the community than anything else. The work on this issue is ongoing, and @Mohit-Dhawan98 is working on it with @ATheorell's support. We'll likely have SWE-bench support in the near future!
Someone from the OpenDevin project suggested we might look into their work here, learn from it, and possibly re-use it if needed. Putting this here for our reference: https://github.com/OpenDevin/OpenDevin/tree/main/evaluation/swe_bench
Feature description
We have a way to easily add benchmarks:
https://www.loom.com/share/206805143fbb4302b5455a5329eaab17?sid=f689608f-8e49-44f7-b55f-4c81e9dc93e6
This issue is about looking into whether SWE-bench is a good benchmark to add, and then adding a simple version of it.
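As background, here is a minimal sketch of how one SWE-bench instance could be mapped onto a benchmark task. The record fields (`instance_id`, `repo`, `problem_statement`, `patch`) follow the published SWE-bench dataset schema; the `BenchmarkTask` wrapper and `swe_bench_to_task` function are hypothetical names invented here for illustration, not gpt-engineer's actual benchmarking API.

```python
# Hypothetical sketch: adapt a SWE-bench instance to a generic benchmark task.
# Field names follow the public SWE-bench dataset schema; BenchmarkTask and
# swe_bench_to_task are illustrative, not part of gpt-engineer.
from dataclasses import dataclass


@dataclass
class BenchmarkTask:
    name: str
    prompt: str       # natural-language issue text handed to the agent
    gold_patch: str   # reference diff used when scoring the agent's output


def swe_bench_to_task(instance: dict) -> BenchmarkTask:
    """Map one SWE-bench record onto the task shape above."""
    return BenchmarkTask(
        name=instance["instance_id"],
        prompt=f"Repository: {instance['repo']}\n\n{instance['problem_statement']}",
        gold_patch=instance["patch"],
    )


# Example dict mimicking the layout of a SWE-bench record:
example = {
    "instance_id": "astropy__astropy-12907",
    "repo": "astropy/astropy",
    "problem_statement": "Modeling's separability matrix is wrong for nested models.",
    "patch": "diff --git a/astropy/modeling/separable.py ...",
}
task = swe_bench_to_task(example)
print(task.name)  # astropy__astropy-12907
```

In practice the real instances would be loaded from the SWE-bench dataset rather than constructed by hand, and scoring would apply the agent's patch and run the repository's tests, but the adapter shape above is the core of a "simple version".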