universal-ckp: support megatron-deepspeed llama model #4666

mosheisland · 2023-11-10T15:34:14Z

Megatron-DeepSpeed's llama implementation of swiglu allocates a single ColumnParallelLinear layer L, but this parameter is effectively a container for two Linear layers, L1 and L2, used as silu(L1(x)) * L2(x). This requires special handling in ds_to_universal: to build the universal representation of the L parameter, the slices of L1 and the slices of L2 are each concatenated first, and L is then created by concatenating L1 and L2.
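The merge order described above can be illustrated with a small pure-Python sketch. It is not the code from this PR: the function name `merge_swiglu_shards` is hypothetical, weights are modeled as plain lists of rows rather than tensors, and it assumes that each tensor-parallel shard stores its slice of L1 in the first half of its rows and its slice of L2 in the second half.

```python
def merge_swiglu_shards(shards):
    """Merge per-rank shards of a fused swiglu weight into one full weight.

    Hypothetical helper for illustration only. Each shard is a list of rows.
    Assumed layout: the first half of a shard's rows is its slice of L1,
    the second half is its slice of L2.
    """
    l1_rows, l2_rows = [], []
    for shard in shards:
        if len(shard) % 2 != 0:
            raise ValueError("each shard must hold equal-sized L1 and L2 slices")
        half = len(shard) // 2
        l1_rows.extend(shard[:half])   # collect this rank's L1 slice
        l2_rows.extend(shard[half:])   # collect this rank's L2 slice
    # L is the full L1 followed by the full L2.
    return l1_rows + l2_rows


# Toy example: 4 output rows per layer, split across 2 ranks.
l1 = [[1, 1], [2, 2], [3, 3], [4, 4]]
l2 = [[10, 10], [20, 20], [30, 30], [40, 40]]
shard0 = l1[:2] + l2[:2]   # rank 0 holds the first halves of L1 and L2
shard1 = l1[2:] + l2[2:]   # rank 1 holds the second halves of L1 and L2

merged = merge_swiglu_shards([shard0, shard1])
naive = shard0 + shard1    # plain concatenation interleaves L1 and L2 slices
```

Under these assumptions, `merged` equals `l1 + l2`, while `naive` does not, which is why a plain concatenation of the per-rank shards is not sufficient for this parameter.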

Signed-off-by: Moshe Island <misland@habana.ai>

Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

mosheisland requested a review from tjruwase as a code owner November 10, 2023 15:34

mosheisland mentioned this pull request Nov 10, 2023

universal-ckp: support llama model microsoft/Megatron-DeepSpeed#287

Merged

Merge branch 'master' into universal_ckp_llama

2a754e8

tjruwase approved these changes Nov 14, 2023

Merge branch 'master' into universal_ckp_llama

af3b795

tjruwase added this pull request to the merge queue Nov 14, 2023

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 14, 2023

Merge branch 'master' into universal_ckp_llama

5442d0c

loadams added this pull request to the merge queue Nov 15, 2023

Merged via the queue into microsoft:master with commit ce5e56a Nov 15, 2023
15 checks passed
