
don't dropout in eval mode #240

Merged · 1 commit into karpathy:master on Apr 13, 2023

Conversation

YassineYousfi
Contributor

To give the model more chances to figure out if it's being trained or deployed :) (c.f. https://twitter.com/karpathy/status/1635049541534879745)
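
The gist of the change is a one-liner in the attention forward pass (a sketch paraphrasing model.py; q, k, v are the projected query/key/value tensors):

# before: F.scaled_dot_product_attention is a plain function, so it never sees
# the module's self.training flag and applies dropout unconditionally
y = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, attn_mask=None, dropout_p=self.dropout, is_causal=True)

# after: gate dropout on the training flag, so model.eval() actually disables it
y = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, attn_mask=None,
    dropout_p=self.dropout if self.training else 0,
    is_causal=True)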

@karpathy
Owner

Whoa hold on, this is an actual bug in my implementation, right? Basically we are applying dropout at evaluation time?

@YassineYousfi
Contributor Author

Yes. I only noticed it recently while training some models with dropout.

@karpathy
Owner

😱 😱 😱
:'(
... shook. calling functional directly considered dangerous
ty
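
For anyone following along: the footgun is that the functional API is stateless. nn.Dropout (the module) reads self.training, which model.eval() flips, while F.dropout (and the dropout_p argument of F.scaled_dot_product_attention) defaults to training behavior unless told otherwise. A minimal illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.ones(4)

drop = nn.Dropout(p=0.5)
drop.eval()
print(drop(x))                               # tensor([1., 1., 1., 1.]) -- identity in eval mode

print(F.dropout(x, p=0.5))                   # still drops and rescales: training defaults to True
print(F.dropout(x, p=0.5, training=False))   # identity, as intended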

karpathy merged commit 01e48ec into karpathy:master on Apr 13, 2023
@prshnthrv

I see that if I use different dropout values and run eval with the gpt2 model, I get different results. For example, with dropout = 0 and
python train.py config/eval_gpt2.py
I get a val_loss of 3.09, which is close to what you report. If I set dropout = 0.1 and run eval again, I get a val_loss of 3.49; with dropout = 0.2, val_loss is 4.23.
Any idea why? I stepped through, and the model is indeed in eval() mode, so dropout should not have any impact during evaluation, correct? What am I missing?

@YassineYousfi
Contributor Author

You mean after this PR was merged?
With the fix in, there should not be any difference in the first call of estimate_loss().
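
For context, the evaluation loop in train.py looks roughly like this (paraphrased; ctx is the autocast context and eval_iters the number of batches averaged over):

@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval()   # disables nn.Dropout modules; with this fix, the flash-attention dropout_p is gated too
    for split in ['train', 'val']:
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            X, Y = get_batch(split)
            with ctx:
                logits, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()  # restore training mode
    return out

Since the weights are identical before any optimizer step has run, the first call should match across dropout settings; later calls can legitimately differ because the training trajectory itself depends on dropout.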

@prshnthrv

I think before this PR. This PR only changes behavior when the flash attention module is used; in my setup I am not using flash attention, so this fix should not have any impact on my evaluation.

@YassineYousfi
Contributor Author

hmmm, I can't reproduce this. You should open an issue with your setup + a script to reproduce.

@prshnthrv

prshnthrv commented Apr 13, 2023

If you run
$ python train.py config/eval_gpt2.py
but change the dropout value in https://github.com/karpathy/nanoGPT/blob/master/train.py#L55 to 0.1 or 0.2, the val_loss is different, as I mentioned earlier.
I can open another issue.

@YassineYousfi
Contributor Author

Yes, I just did that on current master (with the shakespeare dataset, with and without flash attention) and the losses were the same.
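
A quick standalone sanity check along the same lines (a sketch; it uses GPT and GPTConfig from the repo's model.py with a tiny made-up config and random token data):

import torch
from model import GPT, GPTConfig

torch.manual_seed(1337)
x = torch.randint(0, 50304, (2, 64))   # (batch, time); 50304 is the default vocab_size
y = torch.randint(0, 50304, (2, 64))

losses = []
for p in (0.0, 0.1, 0.2):
    torch.manual_seed(1337)            # identical weight init for every model
    model = GPT(GPTConfig(n_layer=2, n_head=2, n_embd=64, block_size=64, dropout=p))
    model.eval()
    with torch.no_grad():
        _, loss = model(x, y)
    losses.append(loss.item())

print(losses)   # with the fix, all three values should be identical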

@prshnthrv

Interesting. I'm using openwebtext; I can try with Shakespeare as well, but I was trying to match the owt numbers you specified.

klei22 pushed a commit to gkielian/ReaLLMASIC_nanogpt that referenced this pull request Aug 24, 2024
gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this pull request Sep 5, 2024