Support a new model #1475
Comments
Hi there, I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md
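For anyone following that tutorial, a big part of supporting a new model is remapping the upstream checkpoint's weight names onto LitGPT's naming scheme. A minimal, hypothetical sketch of that idea (the key patterns below are illustrative only, not LitGPT's actual conversion code):

```python
import re

# Hypothetical template map from Hugging Face-style checkpoint keys to a
# LitGPT-style naming scheme. "{}" stands for the per-layer index.
WEIGHT_MAP = {
    "model.embed_tokens.weight": "transformer.wte.weight",
    "model.layers.{}.self_attn.q_proj.weight": "transformer.h.{}.attn.q_proj.weight",
    "model.layers.{}.mlp.gate_proj.weight": "transformer.h.{}.mlp.fc_1.weight",
}

def remap_key(hf_key: str) -> str:
    """Translate one checkpoint key, handling the per-layer index."""
    m = re.search(r"\.(\d+)\.", hf_key)
    if m is None:
        # No layer index: look the key up directly.
        return WEIGHT_MAP.get(hf_key, hf_key)
    # Swap the concrete index (e.g. ".7.") for the "{}" placeholder.
    template = hf_key.replace(f".{m.group(1)}.", ".{}.", 1)
    mapped = WEIGHT_MAP.get(template)
    if mapped is None:
        return hf_key  # no rule for this key: keep the original name
    return mapped.format(m.group(1))
```

In practice you would iterate over the loaded state dict, remap every key this way, and flag any key that had no rule, since unmapped keys are usually the sign of an unsupported layer.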
Thanks so much for the information. It is really valuable to me.
That's a good question, and usually the tricky part. It can be pretty hard to find the corresponding layer, sometimes due to naming conventions and sometimes because it may not be supported yet. I think in this case LlamaMoE might be a good template to look at.
I haven't read the JetMoE paper; do they also have different attention experts? In that case, this would not be supported yet. The LlamaMoE is only for the MLP layers, as in Mixtral.
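To illustrate what "MoE only in the MLP layers" means: attention stays shared across all tokens, while each token's MLP output is a weighted mix of its top-k expert MLPs chosen by a router. A rough NumPy sketch of that routing, with illustrative sizes and a ReLU stand-in for the real activation (this is not LitGPT's implementation):

```python
import numpy as np

def moe_mlp(x, w_gate, experts, top_k=2):
    """Mixtral-style MoE over MLP experts (sketch only).

    x:       (n_tokens, d_model) token activations
    w_gate:  (d_model, n_experts) router weights
    experts: list of (w1, w2) pairs, one small two-layer MLP per expert
    """
    logits = x @ w_gate                        # (n_tokens, n_experts)
    # Keep only the top_k experts per token.
    top_idx = np.argsort(logits, axis=1)[:, -top_k:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top_idx[t]
        # Softmax over the selected experts' logits only.
        probs = np.exp(logits[t, sel] - logits[t, sel].max())
        probs /= probs.sum()
        for p, e in zip(probs, sel):
            w1, w2 = experts[e]
            h = np.maximum(x[t] @ w1, 0.0)    # ReLU stand-in
            out[t] += p * (h @ w2)
    return out
```

Only the MLP is duplicated per expert; nothing about the attention block changes. If JetMoE also routes among attention experts, that routing would need a separate mechanism that a LlamaMoE-style template does not provide.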
Thank you for your continued support.
Oh I see, the …
Do you have a plan to support the JetMoE model (https://github.com/myshell-ai/JetMoE), which is very effective at reducing computational cost during inference, in LitGPT?