Very low pass@1 #13
Comments
Hi @marianna13, thanks for reporting the issue! Could you check whether you hit the same problem as the one mentioned in #8 (comment)? No one else has reported this yet, and I doubt it is due to broken Docker images. The ground-truth pass rate suggests you got 0%, which should not happen with a correct setup.
Hey Terry, thanks for the quick response!
Could you check the eval_results.json that should be generated by the Docker container? I'd like to see the detailed failures for some tasks to understand what happened. If no other errors were raised, it could be some other issue inside the environment.
I uploaded eval_results.json for this run here. Thanks!
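For reference, one quick way to summarize such a results file is to load it and count the per-task statuses. The layout assumed below (an `eval` mapping from task IDs to result entries with a `status` field) is only a guess at the schema, not something confirmed in this thread:

```python
import json
from collections import Counter

# Load the evaluation results produced inside the container.
# NOTE: the "eval"/"status" layout below is an assumption about the schema.
with open("eval_results.json") as f:
    results = json.load(f)

print("Top-level keys:", list(results.keys()))

status_counts = Counter()
for task_id, outcomes in results.get("eval", {}).items():
    for outcome in (outcomes if isinstance(outcomes, list) else [outcomes]):
        status_counts[outcome.get("status", "unknown")] += 1

print("Status counts:", dict(status_counts))
```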
Oh, it seems that your input file doesn't have any proper generations. The …
No, there's no error, I only see outputs like
for all 1140 tasks.
But that's it, no other errors or warnings.
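A quick way to check whether the generations really are empty (or newline-only) is to scan the samples file; the file name and the `solution` field below are assumptions based on the usual BigCodeBench samples format:

```python
import json

empty, total = 0, 0
# Count completions that are empty once trailing whitespace/newlines are removed.
# NOTE: "samples.jsonl" and the "solution" field are assumptions about the format.
with open("samples.jsonl") as f:
    for line in f:
        sample = json.loads(line)
        total += 1
        if not sample.get("solution", "").strip():
            empty += 1

print(f"{empty}/{total} samples have an effectively empty completion")
```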
Ah wait, I also found this error message (it was in the other log file):
BTW, I forgot that you used …
Not sure what happened on the CUDA side 🤔 Could you check whether you can successfully generate without Docker?
BTW, I double-checked this log; it appears to be just a warning. I still mainly believe that the ~0 pass rate is due to the trailing newlines.
No, unfortunately with …
And also without Docker? 👀
Oh wait, I noticed that not all of them have an empty completion. My bad. If you strip the newlines, I guess the pass rate should be higher? Maybe I'm wrong, though; the Granite base model may require additional newlines rather than no trailing newlines. You can also check the generations to see if you can get similar results.
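If trailing newlines really are the culprit, one could strip them from an existing samples file before re-running the evaluation. This is only a sketch, again assuming a JSONL file with a `solution` field:

```python
import json

# Rewrite the samples file with trailing newlines removed from each completion.
# NOTE: file names and the "solution" field are assumptions about the format.
with open("samples.jsonl") as src, open("samples_stripped.jsonl", "w") as dst:
    for line in src:
        sample = json.loads(line)
        sample["solution"] = sample.get("solution", "").rstrip("\n")
        dst.write(json.dumps(sample) + "\n")
```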
Without Docker I get some flash-attn error; I guess something is wrong with my env, so I will try again with a clean conda env.
I tried again with more memory + --strip_newlines and it seems to work! I got 19.9 for …
Issue
Hey everyone,
I was trying to evaluate some models on BigCodeBench, but I get a very low pass@1 (way lower than what has been reported for this model) and this warning:
For reproduction
I tried granite-3b-code-base in this setup, but it was the same for the other models I tried (stablelm-1.6b, granite-8b-code-base). For both Apptainer images I used the Docker images mentioned in this repo, both latest versions.
My cmd for evaluation:
My generation cmd:
Please let me know if it's an issue on my side or what I can do to solve it! Thanks in advance!