[model] Add deepseek model. #274
Conversation
Force-pushed from 3b6cdca to 24f96d9
Force-pushed from 24f96d9 to d5d5e6c
Force-pushed from d5d5e6c to 4bfe3d9
  config["llama"]["layernorm_eps"] = str(hf_config.get("rms_norm_eps", 1e-6))
  config["llama"]["layernorm_type"] = "pre_layernorm"
- config["llama"]["activation_type"] = "silu"
+ config["llama"]["activation_type"] = str(hf_config["hidden_act"])
Separate the LLaMa and deepseek code. Try not to touch the llama code; deepseek-specific changes can go into new files, which should reuse the llama code where possible.
Same issue as before: deepseek's model architecture simply reuses llama's "LlamaForCausalLM", so I'd recommend reusing the llama code and adding support for llama's other RoPE types.
  private:
-     static bool initialized;
+     bool initialized = false;
This needs to be static. As written, each instance gets its own identical copy of the sin and cos tables, but one model only needs a single set of sin and cos.
The memory that sin / cos point to is only initialized the first time. These buffers are maintained by the ctx context's memory pool.

emb_cos = ctx->getBuffer(emb_cos_str, max_position_embeddings * inv_freq_size);
emb_sin = ctx->getBuffer(emb_sin_str, max_position_embeddings * inv_freq_size);
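To make the shared-buffer argument concrete, here is a minimal C++ sketch of the pattern this reply describes. RopeContext, the firstUse out-parameter, and the fill loop are illustrative assumptions, not xFasterTransformer's actual API; the point is that every rope instance resolves the same named buffer from the context's pool, so the sin/cos tables exist once per model and are filled only by the call that first allocates them.

#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the ctx memory pool from the reply above: the
// first getBuffer() call for a given name allocates storage, later calls
// return the same pointer, and firstUse tells the caller which case it hit.
struct RopeContext {
    std::unordered_map<std::string, std::vector<float>> pool;
    float *getBuffer(const std::string &name, size_t size, bool &firstUse) {
        auto it = pool.find(name);
        firstUse = (it == pool.end());
        if (firstUse) it = pool.emplace(name, std::vector<float>(size)).first;
        return it->second.data();
    }
};

struct RotaryEmbedding {
    float *emb_cos = nullptr;
    float *emb_sin = nullptr;

    // Many RotaryEmbedding instances (one per layer) may call init(), but the
    // tables are computed once: only the allocating call runs the fill loop.
    void init(RopeContext *ctx, const std::vector<float> &inv_freq,
              size_t max_position_embeddings) {
        size_t inv_freq_size = inv_freq.size();
        bool firstCos = false, firstSin = false;
        emb_cos = ctx->getBuffer("emb_cos", max_position_embeddings * inv_freq_size, firstCos);
        emb_sin = ctx->getBuffer("emb_sin", max_position_embeddings * inv_freq_size, firstSin);
        if (!firstCos && !firstSin) return;  // already filled by another instance
        for (size_t i = 0; i < max_position_embeddings; i++) {
            for (size_t j = 0; j < inv_freq_size; j++) {
                float tmp = i * inv_freq[j];
                emb_cos[i * inv_freq_size + j] = std::cos(tmp);
                emb_sin[i * inv_freq_size + j] = std::sin(tmp);
            }
        }
    }
};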
  private:
-     static bool initialized;
+     bool initialized = false;
This needs to be static. As written, each instance gets its own identical copy of the sin and cos tables, but one model only needs a single set of sin and cos.
Same as above.
  for (size_t j = 0; j < inv_freq_size; j++) {
-     float tmp = i * inv_freq[j];
+     float tmp = i * inv_freq[j] / this->scaling_factor;
Separate the LLaMa and deepseek code. Try not to touch the llama code; deepseek-specific changes can go into new files that reuse the llama code. Here you could create a new deepseek rope file.
deepseek takes the same approach as Yi and directly reuses the llama model architecture; LinearScaling RoPE is likewise implemented inside the llama model.
config.json: https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct/blob/main/config.json#L3
LinearScaling rope: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L148-L155
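As a reference for what the diff above does, here is a small self-contained C++ sketch of linear-scaling RoPE table construction; the function and variable names are illustrative, not the project's actual code. Each position index is divided by scaling_factor before the angle is computed, which is the only difference from vanilla RoPE and mirrors the linked transformers implementation.

#include <cmath>
#include <cstddef>
#include <vector>

// Build rotary sin/cos tables with linear position scaling. With
// scaling_factor == 1.0f this reduces to vanilla RoPE; with factor s, a
// model trained on N positions can address roughly N * s positions.
void buildLinearScalingRopeTables(const std::vector<float> &inv_freq,
                                  size_t max_position_embeddings,
                                  float scaling_factor,
                                  std::vector<float> &emb_cos,
                                  std::vector<float> &emb_sin) {
    size_t inv_freq_size = inv_freq.size();
    emb_cos.resize(max_position_embeddings * inv_freq_size);
    emb_sin.resize(max_position_embeddings * inv_freq_size);
    for (size_t i = 0; i < max_position_embeddings; i++) {
        for (size_t j = 0; j < inv_freq_size; j++) {
            // Same line as the diff: the position is down-scaled.
            float tmp = i * inv_freq[j] / scaling_factor;
            emb_cos[i * inv_freq_size + j] = std::cos(tmp);
            emb_sin[i * inv_freq_size + j] = std::sin(tmp);
        }
    }
}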
Done. This scaling_factor is a LLaMa param.
benchmark/benchmark.py
Outdated
| if "chatglm3" in args.model_name.lower(): | ||
| model_prompt = prompt_pool["chatglm3"] | ||
| if "llama" in args.model_name.lower(): | ||
| if "llama" in args.model_name.lower() or "deepseek" in args.model_name.lower(): |
Use a separate if for deepseek.
deepseek is identical to llama at the architecture level; I'd suggest reusing the llama path.
Added the if.
Force-pushed from 3c30778 to 4730f04
README.md
Outdated
  Supported model convert list:
  - LlamaConvert
+ - DeepseekConvert
Move it further down the list.
src/xfastertransformer/__init__.py
Outdated
| "automodel": ["AutoModel"], | ||
| "tools": [ | ||
| "LlamaConvert", | ||
| "DeepseekConvert", |
Move it further down the list.
No description provided.