fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed #1309
Conversation
fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed (#1309)
* fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed
* config adjustments for llama and gated activations
* pre-commit

Co-authored-by: jahatef <hatef.4@buckeyemail.osu.edu>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Hi @tiandeyu-cs
It would be great if you could share the formula for deriving the intermediate size in the config, since it is not 3x in the case above but more like 2.97x. That would help me prepare config files for smaller models.
The code linked here ends up with only 1/3 of the configured 'intermediate_size': it scales the value by 2/3 for gated activations and then splits the result between the gate projection and the up-projection. Therefore, if we set 'intermediate_size' to 3 times the intended width in the config, the model ends up with the intended size.
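A minimal sketch of that arithmetic (illustrative only; the function name and the exact integer rounding are assumptions, not the actual NeoX code):

```python
def effective_per_branch_width(configured_intermediate_size: int) -> int:
    # The transformer code scales the configured size by 2/3 for gated
    # activations, then splits the result between the gate projection and
    # the up-projection, so each branch gets roughly 1/3 of the config value.
    scaled = int(configured_intermediate_size * 2 / 3)
    return scaled // 2


print(effective_per_branch_width(11008))  # ~3669, not the intended 11008
print(effective_per_branch_width(33024))  # 11008, as intended after the 3x fix
```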
Thanks @tiandeyu-cs
P.S. It would be nice to have this as a comment in the YAML file for future wanderers, or maybe just writing the value as 11008*3 would save some people from confusion.
After the 'mlp_type' option was removed, the regular and the llama-type MLPs share the same implementation, and the MLP type is now determined by whether the activation is gated or not.
However, this changes the meaning of the 'intermediate_size' option in the llama configuration files. The code (megatron/model/transformer.py) now treats 'intermediate_size' as the size of the output tensor of the first linear layer in the MLP, which covers both the up-projection and the gate projection of a llama-type MLP. This effectively halves the 'intermediate_size' of a llama-type MLP. In addition, the code multiplies 'intermediate_size' by 2/3, so the actual intermediate size ends up at only 1/3 of the value intended in the configuration file.
To fix this problem, I revised the llama configuration files and set 'intermediate_size' to 3 times its intended value.
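For reference, a hedged sketch of the resulting rule of thumb (the helper name is hypothetical; the 7B and 13B FFN widths are the published LLaMA sizes, not values introduced by this PR):

```python
def config_intermediate_size(true_ffn_width: int) -> int:
    # Compensates for the 2/3 scaling and the gate/up-projection split,
    # so the model ends up with the intended per-branch FFN width.
    return true_ffn_width * 3


assert config_intermediate_size(11008) == 33024  # LLaMA-7B
assert config_intermediate_size(13824) == 41472  # LLaMA-13B
```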