tests: merge & merge conflict #3
Could you merge these two files into one? It looks like only the setup() differs and everything else is the same. To make the merge test more interesting, you could also consider making the files slightly longer, with only part of each one different, to make sure the model is actually merging and not just wholesale replacing one file with the other. For an example of how other tests do this, see, e.g., https://github.com/carlini/yet-another-applied-llm-benchmark/blob/main/tests/webgl_triangle.py
Ty for the suggestions! Combined the two files into one and updated the merge setup.
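One way to check the "actually merging, not wholesale replacing" property suggested above is sketched below. This is not the code from the PR: the function name `is_true_merge` and the sample file contents are hypothetical, and the check is a deliberately simple line-set comparison that assumes each branch only adds lines on top of a shared base.

```python
def _content_lines(text: str) -> set[str]:
    """Non-blank lines of a file, as a set (order and duplicates are ignored)."""
    return {line for line in text.splitlines() if line.strip()}


def is_true_merge(base: str, ours: str, theirs: str, merged: str) -> bool:
    """Return True if `merged` keeps the lines that each branch added to `base`.

    Hypothetical helper: it only handles additions, not deletions or edits.
    """
    base_lines = _content_lines(base)
    merged_lines = _content_lines(merged)
    ours_added = _content_lines(ours) - base_lines
    theirs_added = _content_lines(theirs) - base_lines
    return ours_added <= merged_lines and theirs_added <= merged_lines


# Illustrative contents only; the real test files may look different.
BASE = "def add(a, b):\n    return a + b\n"
OURS = BASE + "\ndef mul(a, b):\n    return a * b\n"
THEIRS = "def sub(a, b):\n    return a - b\n\n" + BASE
MERGED = "def sub(a, b):\n    return a - b\n\n" + OURS

print(is_true_merge(BASE, OURS, THEIRS, MERGED))  # both additions survive
print(is_true_merge(BASE, OURS, THEIRS, THEIRS))  # wholesale replacement
```

Because the shared part of the file is ignored, a result that simply copies one branch over the other fails the check, while a result carrying both additions passes.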
One small suggestion: I think it would help to include the output of a test execution to show that everything works. That adds credibility and also guards against regressions introduced by later PRs.
Review comment on tests/git_merge_conflict.py (outdated):
    return success
dead code.
Makes sense! This is the test execution for this one using gpt-4-0125-preview:
Merge Test
ECHOING: Initialized empty Git repository in /private/tmp/fakedocker_832869/.git/
commit 7e99c8222028e5dabe9bf9cbaacd7330c9d4124f
commit fa40a9d458c6f9b7e5e73bd4e5cb16515edd9bc0
[main 7eeffff] Multiplication
commit fa40a9d458c6f9b7e5e73bd4e5cb16515edd9bc0
Switched to branch 'other'
gpt-4-1106-preview GETCACHE 'INPUT:
gpt-4-1106-preview GETCACHE 'INPUT:
Docker Setup: I have setup the docker container to run the model evaluation.
Initial Query
Looping until done
Iterations 0 to 1, each: LLM Generation (Query, Output), PyFunc, Run Code Interpreter, PyFunc (step outputs omitted)
Final PyFunc (output omitted)
Merge Conflict Test
ECHOING: Initialized empty Git repository in /private/tmp/fakedocker_768587/.git/
commit 70f0a96cfebc3f20dfe8e960e6f3acd72b05d7d1
[main dcf0ffd] subtraction
commit 70f0a96cfebc3f20dfe8e960e6f3acd72b05d7d1
Switched to branch 'other'
gpt-4-1106-preview GETCACHE 'INPUT:
gpt-4-1106-preview GETCACHE 'INPUT:
gpt-4-1106-preview GETCACHE 'INPUT: Unmerged paths: Untracked files: no changes added to commit (use "git add" and/or "git commit -a")
gpt-4-1106-preview GETCACHE 'INPUT:
gpt-4-1106-preview CACHE MISS ["\nYou are in a git repository. I have two branches, main and some other one I forgot the name of. Merge the other branch into main.\n\nOn each round you should just give me just one command to type, I'll run it exactly as-is, and give you the exact output. I will not make any changes. So do not write any placholder text. I'll get started:\n\nINPUT:
gpt-4-1106-preview CACHE MISS [same prompt as above] INPUT: nothing added to commit but untracked files present (use "git add" to track)
gpt-4-1106-preview CACHE MISS [same prompt as above] INPUT:
gpt-4-1106-preview CACHE MISS [same prompt as above] INPUT: no changes added to commit (use "git add" and/or "git commit -a")
Docker Setup: I have setup the docker container to run the model evaluation.
Initial Query
Looping until done
Iterations 0 to 9, each: LLM Generation (Query, Output), PyFunc, Run Code Interpreter, PyFunc (step outputs omitted)
Final PyFunc (output omitted)
Co-authored-by: Atarust <jonas.kapitzke@gmail.com>
Added two git tests, one for a simple merge and one for a merge conflict. Both are basically forked from the cherry-pick test.
Found myself really wanting better evals when working on an automatic merge conflict resolver, so was excited to see this and play around with it!
Also go bears 🐻
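A merge-conflict test needs some way to decide whether the conflict was really resolved. A minimal sketch of one such check, assuming the evaluator can read the merged file as text, is to scan for leftover Git conflict markers. The function name `has_conflict_markers` is hypothetical and is not taken from the PR's test code.

```python
def has_conflict_markers(text: str) -> bool:
    """Return True if `text` still contains Git's default conflict markers.

    Git writes `<<<<<<< <ours>`, `=======`, and `>>>>>>> <theirs>` at the
    start of a line when it cannot merge a hunk automatically.
    """
    for line in text.splitlines():
        if line.startswith("<<<<<<<") or line.startswith(">>>>>>>"):
            return True
        if line.rstrip() == "=======":
            return True
    return False


# Illustrative file contents only.
unresolved = (
    "def op(a, b):\n"
    "<<<<<<< HEAD\n"
    "    return a * b\n"
    "=======\n"
    "    return a - b\n"
    ">>>>>>> other\n"
)
resolved = "def op(a, b):\n    return a * b\n"

print(has_conflict_markers(unresolved))
print(has_conflict_markers(resolved))
```

A check like this only shows that the markers are gone; it says nothing about whether the chosen resolution is the right one, so it would normally be paired with a check on the merged contents or the commit history.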