Replies: 1 comment
Here is another example I got today. If it writes a test and knows how to run that specific test, I would expect it to verify the test result at least once before yielding control.
This is the result of me telling it to add a test to the code base (Bazel.git, completely open-source). As you can see, it wrote the test but did not run it itself, despite AGENTS.md having examples showing how to run tests. It did not even run the test after I told it the explicit command that failed.

This is on gpt-5.2-codex with high model_reasoning_effort. Something is just off about the eagerness (or lack thereof) to validate the tests.
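For what it's worth, the behavior I'd expect is simple: after writing the test, run it once and only yield control once the result is known. Here is a minimal sketch of that check, assuming a `bazel test` invocation like the ones documented in AGENTS.md (the target name below is a placeholder for illustration, not an actual target from the repo):

```python
import subprocess

def verify_bazel_test(target: str) -> bool:
    """Run a single Bazel test target and report whether it passed."""
    result = subprocess.run(
        ["bazel", "test", target, "--test_output=errors"],
        capture_output=True,
        text=True,
    )
    # Bazel exits with 0 only when the build succeeds and all requested tests pass.
    return result.returncode == 0

if __name__ == "__main__":
    # Hypothetical target; in practice it would be the test the agent just wrote.
    print(verify_bazel_test("//src/test/java/com/example:NewTest"))
```

Running exactly this kind of command once before handing control back would have surfaced the failure immediately.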