don't dropout in eval mode #240
Conversation
Whoa, hold on, this is an actual bug in my implementation, right? Basically we are applying dropout at evaluation time?
Yes. I only noticed it recently while training some models with dropout.
😱 😱 😱
I see that if I use different dropout values and run eval using the gpt2 model, I get different results. Any idea why?
You mean after this PR was merged?
I think before this PR. This PR only changes behavior when the flash attention module is used. In my setup, I am not using flash attention, so this fix should not have any impact on my evaluation.
Hmmm, I don't think I can reproduce. You should open an issue with your setup + a script to reproduce.
If you run
Yes, I just did that on current master (with the Shakespeare dataset, with and without flash attention) and the losses were the same.
Interesting. I'm using openwebtext. I can try with Shakespeare as well, but I was trying to match the owt numbers specified.
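For context, the fix under discussion amounts to gating the attention dropout probability on the module's `training` flag, so that dropout is only applied in train mode. A minimal sketch (class and parameter names here are illustrative, not the repo's exact code), assuming PyTorch's `F.scaled_dot_product_attention`:

```python
import torch
import torch.nn.functional as F


class CausalSelfAttention(torch.nn.Module):
    """Minimal sketch showing only the dropout-gating logic being discussed."""

    def __init__(self, dropout: float = 0.1):
        super().__init__()
        self.dropout = dropout

    def forward(self, q, k, v):
        # Bug: passing dropout_p=self.dropout unconditionally applies dropout
        # even when the module is in eval mode.
        # Fix: zero it out unless self.training is True.
        return F.scaled_dot_product_attention(
            q, k, v,
            dropout_p=self.dropout if self.training else 0.0,
            is_causal=True,
        )
```

With this gating, calling `model.eval()` makes the forward pass deterministic regardless of the configured dropout value, which is why eval losses should no longer depend on the dropout setting.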
Gptconfig fix
To give the model more chances to figure out if it's being trained or deployed :) (c.f. https://twitter.com/karpathy/status/1635049541534879745)