
Training losses of the models #1

Open
borgr opened this issue Mar 29, 2024 · 6 comments

borgr commented Mar 29, 2024

Hi,
Can you share the logs or training losses of the different models (per architecture, isoFLOP size, etc.)? I will be sure to cite ;-)

borgr (Author) commented Apr 24, 2024

Please?

Zymrael (Collaborator) commented May 2, 2024

Hi @borgr, I added a sample here, with losses per isoFLOP group and an indication of model size. The sample includes Transformer++ (Llama) and StripedHyena (SH) with ~8.3% striping.

Hopefully it is useful :) What type of analysis are you planning? If you let me know, I can try to collect the relevant data.

borgr (Author) commented May 2, 2024

I am collecting a meta-dataset with losses/downstream evals for different architectures, pretraining schemes, data, etc. The goal is to let people ask questions about pretraining without doing a lot of pretraining themselves: to get results on understanding scaling laws or on the A/B tests themselves (this part is quite mature; we have results on things like how to fit a scaling law more efficiently, what affects results, and how good the predictions are), and to enable other questions (we just started another effort related to economics... so quite diverse). Data is power, and architectures are often less diverse than what you have :-)
The minimum useful data is loss/eval throughout pretraining (per model family is a big plus), plus model size; metadata is also helpful (architecture, data, or whatever exists).
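
To make the minimum concrete, here is a sketch of the row shape I have in mind (field names are just illustrative, not a fixed schema):

    # Illustrative only: one row per logged point; field names are hypothetical.
    record = {
        "scaled_set": "StripedHyena",  # architecture / model family
        "model_params": 0.29e9,        # model size in parameters
        "flops": 1.0e19,               # training compute so far (or step / tokens seen)
        "loss": 2.41,                  # pretraining loss at that point
        "arch": "...", "data": "...",  # any metadata that exists
    }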

Zymrael (Collaborator) commented May 2, 2024

Great, then it sounds like the sample above should be good. Take a look and let me know!

borgr (Author) commented May 2, 2024

Looks great! If you have anything else, I would be happy if you sent it my way (since you said "sample", you might have more Hyenas and such, right? In the paper you had quite a few more; A/B tests are a good thing and quite rare, and you had a few of those, as they allow future researchers to check whether they can predict which one is better).
If the size is an issue, you can cap the maximum steps per model to something smaller (e.g., 10K?).
If useful for anyone to parse:


    import numpy as np
    import pandas as pd

    # The .npy file wraps a pickled nested dict: {family: {flop_budget: {column: per-model values}}}.
    numpy_dict = np.load("raw_data/loss_size_flops_llama_sh.npy", allow_pickle=True)
    rows = []
    for model, mod_dict in numpy_dict.tolist().items():  # .tolist() unwraps the 0-d object array into the dict
        for flops, isoflop_dict in mod_dict.items():
            # Each isoFLOP group holds parallel per-model arrays; zip them back into per-model records.
            for model_data in zip(*isoflop_dict.values()):
                metadata = {key.lower(): val for key, val in zip(isoflop_dict.keys(), model_data)}
                # Expand the loss curve, assigning each logged loss an evenly spaced FLOP count up to the budget.
                for loss, cur_flops in zip(metadata["loss"], np.linspace(0, float(flops), len(metadata["loss"]))):
                    row = metadata.copy()
                    row["loss"] = loss
                    row["scaled_set"] = model
                    row["flops"] = cur_flops
                    rows.append(row)
    df = pd.DataFrame.from_records(rows)
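
And a quick plotting sketch on the resulting frame, in case it helps anyone sanity-check the parse (only the loss, scaled_set, and flops columns are guaranteed by the snippet above; anything else depends on the keys stored in the .npy):

    import matplotlib.pyplot as plt

    # Rough visual check: loss against the (evenly interpolated) FLOP axis, one series per family.
    for name, group in df.groupby("scaled_set"):
        plt.scatter(group["flops"], group["loss"], s=2, label=name)
    plt.xlabel("FLOPs (interpolated)")
    plt.ylabel("training loss")
    plt.legend()
    plt.show()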

borgr (Author) commented May 8, 2024

Also, is it possible that the number of tokens seen is only ~1e10, and at most ~1e11?
model (params, max FLOPs)          tokens_seen
Striped Hyena 1-11_0.10B_20.00E 3.207023e+10
Striped Hyena 1-11_0.10B_40.00E 6.414046e+10
Striped Hyena 1-11_0.16B_80.00E 8.338310e+10
Striped Hyena 1-11_0.17B_20.00E 1.913711e+10
Striped Hyena 1-11_0.17B_40.00E 3.827422e+10
Striped Hyena 1-11_0.22B_20.00E 1.501130e+10
Striped Hyena 1-11_0.22B_200.00E 1.502729e+11
Striped Hyena 1-11_0.29B_20.00E 1.154081e+10
Striped Hyena 1-11_0.29B_40.00E 2.308162e+10
Striped Hyena 1-11_0.29B_80.00E 4.616324e+10
Striped Hyena 1-11_0.36B_20.00E 9.219595e+09
Striped Hyena 1-11_0.36B_200.00E 9.227016e+10
Striped Hyena 1-11_0.36B_40.00E 1.843919e+10
Striped Hyena 1-11_0.44B_20.00E 7.539229e+09
Striped Hyena 1-11_0.54B_40.00E 1.243781e+10
Striped Hyena 1-11_0.55B_80.00E 2.405991e+10
Striped Hyena 1-11_0.65B_200.00E 5.109001e+10
Striped Hyena 1-11_0.65B_40.00E 1.021174e+10
Striped Hyena 1-11_0.76B_80.00E 1.755822e+10
Striped Hyena 1-11_1.03B_80.00E 1.295666e+10
Striped Hyena 1-11_1.16B_20.00E 2.869705e+09
Striped Hyena 1-11_1.16B_200.00E 2.871053e+10
Striped Hyena 1-11_1.16B_40.00E 5.739410e+09
Striped Hyena 1-11_1.90B_200.00E 1.751262e+10
Striped Hyena 1-11_1.90B_40.00E 3.501019e+09
Striped Hyena 1-11_1.90B_80.00E 7.002038e+09
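
For reference, these numbers look consistent with the common C ≈ 6·N·D rule of thumb; this is my assumption about how tokens_seen relates to the FLOP budget, reading the "20.00E" suffix as exaFLOPs:

    # Hypothetical check: assumes tokens ≈ FLOPs / (6 * params) and that
    # "20.00E" in the model name means a 20 exaFLOP training budget.
    flops = 20e18    # 20 EFLOPs
    params = 0.10e9  # the 0.10B model
    print(f"{flops / (6 * params):.3e}")  # ~3.333e+10, close to the 3.207023e+10 listed above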
