Hi there, and thanks for sharing this. Like you, I am a bit surprised that the model collapses while the loss keeps improving. Unfortunately, I have no good explanation for this phenomenon.
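Because the loss alone does not reveal this kind of collapse, one option is to track how repetitive the generated continuation is. The sketch below uses a "distinct-n" ratio (unique word n-grams divided by total word n-grams). This heuristic is not part of the repo. It splits on whitespace rather than using the GPT-2 tokenizer, which is a simplifying assumption. The function name `distinct_n` is hypothetical, and the two sample strings are copied from the training log in the report.

```python
def distinct_n(text: str, n: int = 2) -> float:
    """Return the fraction of distinct word n-grams in `text`.

    1.0 means no n-gram repeats; values near 0 mean the text is
    dominated by a short loop. Words are whitespace-split tokens,
    not the model's BPE tokens (a simplifying assumption).
    """
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 1.0  # too short to contain any n-gram, so nothing repeats
    return len(set(ngrams)) / len(ngrams)


# Two continuations copied from the training log in the report.
fluent = (
    "Every effort moves you, and I will not be able to help you. "
    "You are not going to be troubled with the idea of a new life. "
    "You are not going to be troubled with the idea of a new life. You are"
)
collapsed = "Every effort moves you" + " 1 susceptible" * 7

print(f"step 141000: distinct-2 = {distinct_n(fluent):.3f}")     # 0.643
print(f"step 305000: distinct-2 = {distinct_n(collapsed):.3f}")  # 0.353
```

Logging this value next to the train/val loss each time a sample is generated would flag the looping outputs, even while the loss is still falling.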
Bug description
Thanks for this excellent tutorial; I learned a lot from this repo.
I followed chapter 5's 03_bonus_pretraining_on_gutenberg using the full Gutenberg dataset, and the model performed well for the first 70 thousand steps: the text it appended to "Every effort moves" seemed reasonable and readable.
However, when I logged in to this server (which has a single L40S GPU) just now, the model's loss was much lower, but the generated text was weird:
Ep 1 (Step 140900): Train loss 3.190, Val loss 3.301
Ep 1 (Step 141000): Train loss 2.814, Val loss 3.307
Every effort moves you, and I will not be able to help you. You are not going to be troubled with the idea of a new life. You are not going to be troubled with the idea of a new life. You are
Ep 1 (Step 141100): Train loss 2.836, Val loss 3.298
Ep 1 (Step 141200): Train loss 3.174, Val loss 3.303
Ep 1 (Step 141300): Train loss 2.953, Val loss 3.305
Ep 1 (Step 141400): Train loss 3.290, Val loss 3.294
Ep 1 (Step 141500): Train loss 2.784, Val loss 3.306
Ep 1 (Step 141600): Train loss 2.707, Val loss 3.316
Ep 1 (Step 141700): Train loss 3.126, Val loss 3.293
Ep 1 (Step 141800): Train loss 2.819, Val loss 3.317
Ep 1 (Step 141900): Train loss 2.922, Val loss 3.302
Ep 1 (Step 142000): Train loss 2.770, Val loss 3.311
....
Ep 1 (Step 303600): Train loss 1.942, Val loss 1.533
Ep 1 (Step 303700): Train loss 1.991, Val loss 1.545
Ep 1 (Step 303800): Train loss 2.034, Val loss 1.540
Ep 1 (Step 303900): Train loss 1.960, Val loss 1.539
Ep 1 (Step 304000): Train loss 1.966, Val loss 1.539
Every effort moves you 髫 1 髫 1 髫 1 髫 1 髫 1 髫 1 髫 1 髫 1 �
Ep 1 (Step 304100): Train loss 1.872, Val loss 1.533
Ep 1 (Step 304200): Train loss 2.053, Val loss 1.535
Ep 1 (Step 304300): Train loss 1.974, Val loss 1.536
Ep 1 (Step 304400): Train loss 1.944, Val loss 1.544
Ep 1 (Step 304500): Train loss 1.923, Val loss 1.539
Ep 1 (Step 304600): Train loss 1.891, Val loss 1.551
Ep 1 (Step 304700): Train loss 1.998, Val loss 1.545
Ep 1 (Step 304800): Train loss 1.892, Val loss 1.544
Ep 1 (Step 304900): Train loss 1.888, Val loss 1.543
Ep 1 (Step 305000): Train loss 2.020, Val loss 1.537
Every effort moves you 1 susceptible 1 susceptible 1 susceptible 1 susceptible 1 susceptible 1 susceptible 1 susceptible
Ep 1 (Step 305100): Train loss 1.906, Val loss 1.537
Ep 1 (Step 305200): Train loss 1.842, Val loss 1.542
Ep 1 (Step 305300): Train loss 2.080, Val loss 1.539
Ep 1 (Step 305400): Train loss 1.993, Val loss 1.536
Ep 1 (Step 305500): Train loss 2.016, Val loss 1.537
Ep 1 (Step 305600): Train loss 2.001, Val loss 1.533
Ep 1 (Step 305700): Train loss 1.844, Val loss 1.536
Ep 1 (Step 305800): Train loss 1.988, Val loss 1.533
Ep 1 (Step 305900): Train loss 1.590, Val loss 1.536
Ep 1 (Step 306000): Train loss 1.879, Val loss 1.536
Every effort moves you héré 1 dépouillé 1 dépouillé 1 dépouillé 1 dépouillé 1 dépouillé
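To compare the two ranges of steps numerically, it can help to pull the loss values out of the raw log text. The sketch below is a minimal parser for the progress-line format shown in the log above. The regex is inferred from that output and is not taken from the repo. The names `LOSS_LINE`, `parse_losses`, and `SAMPLE_LOG` are hypothetical. In the posted excerpt, the validation loss is above the training loss around step 141000 but below it around step 304000, and printing the gap makes this easy to see.

```python
import re

# Matches progress lines such as
# "Ep 1 (Step 304000): Train loss 1.966, Val loss 1.539".
# The pattern is inferred from the log in this report (an assumption).
LOSS_LINE = re.compile(
    r"Ep (\d+) \(Step (\d+)\): Train loss (\d+\.\d+), Val loss (\d+\.\d+)"
)


def parse_losses(log_text: str) -> list[tuple[int, int, float, float]]:
    """Extract (epoch, step, train_loss, val_loss) tuples from raw log text.

    Works whether or not the log keeps its newlines, because the pattern
    is not anchored to line boundaries. Sample-text lines are ignored.
    """
    return [
        (int(ep), int(step), float(train), float(val))
        for ep, step, train, val in LOSS_LINE.findall(log_text)
    ]


# A few lines copied from the log above.
SAMPLE_LOG = """\
Ep 1 (Step 141000): Train loss 2.814, Val loss 3.307
Ep 1 (Step 304000): Train loss 1.966, Val loss 1.539
Ep 1 (Step 305000): Train loss 2.020, Val loss 1.537
Every effort moves you 1 susceptible 1 susceptible 1 susceptible
"""

for epoch, step, train, val in parse_losses(SAMPLE_LOG):
    # Positive gap: val loss above train loss; negative: val below train.
    print(f"step {step}: train {train:.3f}, val {val:.3f}, "
          f"val - train {val - train:+.3f}")
```

Because `findall` scans the whole string, the same function also works on a copy of the log that was flattened into one line when it was pasted.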
What operating system are you using?
Linux
Where do you run your code?
Other cloud environment (AWS, Azure, GCP)
Environment