
don't dropout in eval mode #240

Merged · 1 commit into karpathy:master on Apr 13, 2023

Conversation

YassineYousfi
Contributor

To give the model more chances to figure out if it's being trained or deployed :) (c.f. https://twitter.com/karpathy/status/1635049541534879745)
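
The gist of the change is a one-liner in the attention forward pass (a sketch paraphrasing model.py; q, k, v are the projected query/key/value tensors):

# before: F.scaled_dot_product_attention is a plain function, so it never sees
# the module's self.training flag and applies dropout unconditionally
y = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, attn_mask=None, dropout_p=self.dropout, is_causal=True)

# after: gate dropout on the training flag, so model.eval() actually disables it
y = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, attn_mask=None,
    dropout_p=self.dropout if self.training else 0,
    is_causal=True)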

@karpathy
Owner

Whoa hold on, this is an actual bug in my implementation, right? Basically we are applying dropout at evaluation time?

@YassineYousfi
Contributor Author

Yes. I only noticed it recently while training some models with dropout.

@karpathy
Owner

😱 😱 😱
:'(
... shook. calling functional directly considered dangerous
ty
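
For anyone following along: the footgun is that the functional API is stateless. nn.Dropout (the module) reads self.training, which model.eval() flips, while F.dropout (and the dropout_p argument of F.scaled_dot_product_attention) defaults to training behavior unless told otherwise. A minimal illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.ones(4)

drop = nn.Dropout(p=0.5)
drop.eval()
print(drop(x))                               # tensor([1., 1., 1., 1.]) -- identity in eval mode

print(F.dropout(x, p=0.5))                   # still drops and rescales: training defaults to True
print(F.dropout(x, p=0.5, training=False))   # identity, as intended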

karpathy merged commit 01e48ec into karpathy:master on Apr 13, 2023
@prshnthrv

I see that if I use different dropout values and run eval with the gpt2 model, I get different results. For example, with dropout = 0 and
python train.py config/eval_gpt2.py
I get a val_loss of 3.09, which is close to what you report. If I set dropout = 0.1 and run eval again, I get a val_loss of 3.49; with dropout = 0.2, val_loss is 4.23.
Any idea why? I stepped through, and the model is indeed in eval() mode, so dropout should not have any impact during evaluation, correct? What am I missing?

@YassineYousfi
Contributor Author

You mean after this PR was merged?
With the fix in, there should not be any difference in the first call of estimate_loss().
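
For context, the evaluation loop in train.py looks roughly like this (paraphrased; ctx is the autocast context and eval_iters the number of batches averaged over):

@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval()   # disables nn.Dropout modules; with this fix, the flash-attention dropout_p is gated too
    for split in ['train', 'val']:
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            X, Y = get_batch(split)
            with ctx:
                logits, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()  # restore training mode
    return out

Since the weights are identical before any optimizer step has run, the first call should match across dropout settings; later calls can legitimately differ because the training trajectory itself depends on dropout.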

@prshnthrv

I think before this PR. This PR only changes behavior when the flash attention module is used; in my setup I am not using flash attention, so this fix should not have any impact on my evaluation.

@YassineYousfi
Contributor Author

hmmm, I can't reproduce this. You should open an issue with your setup + a script to reproduce.

@prshnthrv

prshnthrv commented Apr 13, 2023

If you run
$ python train.py config/eval_gpt2.py
but change the dropout value in https://github.com/karpathy/nanoGPT/blob/master/train.py#L55 to 0.1 or 0.2, the val_loss is different, as I mentioned earlier.
I can open another issue.

@YassineYousfi
Contributor Author

Yes, I just did that on current master (with the shakespeare dataset, with and without flash attention) and the losses were the same.
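
A quick standalone sanity check along the same lines (a sketch; it uses GPT and GPTConfig from the repo's model.py with a tiny made-up config and random token data):

import torch
from model import GPT, GPTConfig

torch.manual_seed(1337)
x = torch.randint(0, 50304, (2, 64))   # (batch, time); 50304 is the default vocab_size
y = torch.randint(0, 50304, (2, 64))

losses = []
for p in (0.0, 0.1, 0.2):
    torch.manual_seed(1337)            # identical weight init for every model
    model = GPT(GPTConfig(n_layer=2, n_head=2, n_embd=64, block_size=64, dropout=p))
    model.eval()
    with torch.no_grad():
        _, loss = model(x, y)
    losses.append(loss.item())

print(losses)   # with the fix, all three values should be identical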

@prshnthrv

Interesting. I'm using openwebtext; I can try with Shakespeare as well, but I was trying to match the owt numbers you specified.

klei22 pushed a commit to gkielian/ReaLLMASIC_nanogpt that referenced this pull request Aug 24, 2024
gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this pull request Sep 5, 2024