Training losses of the models #1
Please?
I am collecting a meta-dataset of losses and downstream evals for different architectures, pretraining schemas, data, etc. The goals are: to allow asking questions about pretraining without doing a lot of pretraining; to enable results on understanding scaling laws, or on A/B testing itself (this part is quite mature, and we have results on things like how to fit a scaling law more efficiently, what affects the results, and how good the predictions are); and to support other questions (we just started another effort related to economics, so it's quite diverse). Data is power, and architectures are often less diverse than what you have :-)
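As a minimal sketch of the kind of analysis such a meta-dataset enables, here is a toy power-law fit of loss against training tokens. The function name, the specific functional form L(D) = a·D^(-b) + c, and the assumption that the irreducible-loss term is known are all illustrative choices for this sketch, not anything stated in this thread.

```python
import numpy as np

def fit_power_law(tokens, losses, irreducible=0.0):
    """Fit L(D) = a * D**(-b) + irreducible in log-log space.

    tokens, losses: 1-D arrays of (training tokens, eval loss) pairs.
    Returns (a, b). The irreducible-loss term c is assumed known here;
    a real scaling-law fit would typically estimate it jointly.
    """
    x = np.log(np.asarray(tokens, dtype=float))
    y = np.log(np.asarray(losses, dtype=float) - irreducible)
    # After subtracting c, the model is linear in log-log space:
    # log(L - c) = log(a) - b * log(D)
    slope, intercept = np.polyfit(x, y, 1)
    return np.exp(intercept), -slope

# Synthetic check: data generated from a known power law
D = np.array([1e9, 3e9, 1e10, 3e10, 1e11])
L = 20.0 * D ** -0.2 + 1.5
a, b = fit_power_law(D, L, irreducible=1.5)
```

On clean synthetic data the fit recovers the generating parameters; on real logs one would fit over many runs and hold out the largest runs to test extrapolation.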
Great, then it sounds like the sample above should be good. Take a look and let me know!
Looks great! If you have anything else, I would be happy if you sent it my way. Since you said "sample", you might have more Hyenas and such, right? In the paper you had quite a few more models, and A/B tests are valuable and quite rare; you had a few of those, and they let future readers check whether they can predict which variant is better.

Also, is it possible that the number of tokens seen is ~10e9 to at most 1e11?
Hi,
Can you share the logs or training losses of the different models (per architecture, IsoFLOP group, size, etc.)? I will be sure to cite ;-)