
Commit 22cebd1

qinxuyezhcn000000 authored and committed
FEAT: [model] support GLM-4.5 series (xorbitsai#3882)
1 parent 7bb2f3f commit 22cebd1

14 files changed: +328 −39 lines

doc/source/models/builtin/llm/codegeex4.rst

Lines changed: 4 additions & 4 deletions
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** none
 - **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/codegeex4-all-9b
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/codegeex4-all-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b>`__
+- **Model ID:** zai-org/codegeex4-all-9b
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/codegeex4-all-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -37,8 +37,8 @@ Model Spec 2 (ggufv2, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** IQ2_M, IQ3_M, Q4_K_M, Q5_K_M, Q6_K_L, Q8_0
 - **Engines**: vLLM, llama.cpp
-- **Model ID:** THUDM/codegeex4-all-9b-GGUF
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/codegeex4-all-9b-GGUF>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b-GGUF>`__
+- **Model ID:** zai-org/codegeex4-all-9b-GGUF
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/codegeex4-all-9b-GGUF>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b-GGUF>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

doc/source/models/builtin/llm/cogagent.rst

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** none
 - **Engines**: Transformers
-- **Model ID:** THUDM/cogagent-9b-20241220
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/cogagent-9b-20241220>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/cogagent-9b-20241220>`__
+- **Model ID:** zai-org/cogagent-9b-20241220
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/cogagent-9b-20241220>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/cogagent-9b-20241220>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

doc/source/models/builtin/llm/deepseek-v3-0324.rst

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,7 @@ chosen quantization method from the options listed above::

     xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format awq --quantization ${quantization}

+
 Model Spec 3 (mlx, 671 Billion)
 ++++++++++++++++++++++++++++++++++++++++

@@ -59,3 +60,4 @@ Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

     xinference launch --model-engine ${engine} --model-name deepseek-v3-0324 --size-in-billions 671 --model-format mlx --quantization ${quantization}
+

doc/source/models/builtin/llm/glm-4.1v-thinking.rst

Lines changed: 6 additions & 6 deletions
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** none
 - **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/GLM-4.1V-9B-Thinking
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.1V-9B-Thinking>`__
+- **Model ID:** zai-org/GLM-4.1V-9B-Thinking
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.1V-9B-Thinking>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.1V-9B-Thinking>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -37,8 +37,8 @@ Model Spec 2 (awq, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** Int4
 - **Engines**: vLLM, Transformers
-- **Model ID:** dengcao/GLM-4.1V-9B-Thinking-AWQ
-- **Model Hubs**: `Hugging Face <https://huggingface.co/dengcao/GLM-4.1V-9B-Thinking-AWQ>`__, `ModelScope <https://modelscope.cn/models/dengcao/GLM-4.1V-9B-Thinking-AWQ>`__
+- **Model ID:** QuantTrio/GLM-4.1V-9B-Thinking-AWQ
+- **Model Hubs**: `Hugging Face <https://huggingface.co/QuantTrio/GLM-4.1V-9B-Thinking-AWQ>`__, `ModelScope <https://modelscope.cn/models/tclf90/GLM-4.1V-9B-Thinking-AWQ>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -53,8 +53,8 @@ Model Spec 3 (gptq, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** Int4-Int8Mix
 - **Engines**: vLLM, Transformers
-- **Model ID:** dengcao/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix
-- **Model Hubs**: `Hugging Face <https://huggingface.co/dengcao/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix>`__, `ModelScope <https://modelscope.cn/models/dengcao/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix>`__
+- **Model ID:** QuantTrio/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix
+- **Model Hubs**: `Hugging Face <https://huggingface.co/QuantTrio/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix>`__, `ModelScope <https://modelscope.cn/models/tclf90/GLM-4.1V-9B-Thinking-GPTQ-Int4-Int8Mix>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

doc/source/models/builtin/llm/glm-4.5.rst
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+.. _models_llm_glm-4.5:
+
+========================================
+glm-4.5
+========================================
+
+- **Context Length:** 65536
+- **Model Name:** glm-4.5
+- **Languages:** en, zh
+- **Abilities:** chat, reasoning
+- **Description:** The GLM-4.5 series models are foundation models designed for intelligent agents.
+
+Specifications
+^^^^^^^^^^^^^^
+
+
+Model Spec 1 (pytorch, 355 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 355
+- **Quantizations:** none
+- **Engines**: Transformers
+- **Model ID:** zai-org/GLM-4.5
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.5>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.5>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format pytorch --quantization ${quantization}
+
+
+Model Spec 2 (fp8, 355 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** fp8
+- **Model Size (in billions):** 355
+- **Quantizations:** FP8
+- **Engines**:
+- **Model ID:** zai-org/GLM-4.5-FP8
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.5-FP8>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.5-FP8>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format fp8 --quantization ${quantization}
+
+
+Model Spec 3 (mlx, 355 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** mlx
+- **Model Size (in billions):** 355
+- **Quantizations:** 4bit
+- **Engines**: MLX
+- **Model ID:** mlx-community/GLM-4.5-{quantization}
+- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/GLM-4.5-{quantization}>`__, `ModelScope <https://modelscope.cn/models/mlx-community/GLM-4.5-{quantization}>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 355 --model-format mlx --quantization ${quantization}
+
+
+Model Spec 4 (pytorch, 106 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 106
+- **Quantizations:** none
+- **Engines**: Transformers
+- **Model ID:** zai-org/GLM-4.5-Air
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.5-Air>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.5-Air>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format pytorch --quantization ${quantization}
+
+
+Model Spec 5 (fp8, 106 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** fp8
+- **Model Size (in billions):** 106
+- **Quantizations:** FP8
+- **Engines**:
+- **Model ID:** zai-org/GLM-4.5-Air-FP8
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4.5-Air-FP8>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4.5-Air-FP8>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format fp8 --quantization ${quantization}
+
+
+Model Spec 6 (mlx, 106 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** mlx
+- **Model Size (in billions):** 106
+- **Quantizations:** 2bit, 3bit, 4bit, 5bit, 8bit
+- **Engines**: MLX
+- **Model ID:** mlx-community/GLM-4.5-Air-{quantization}
+- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/GLM-4.5-Air-{quantization}>`__, `ModelScope <https://modelscope.cn/models/mlx-community/GLM-4.5-Air-{quantization}>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name glm-4.5 --size-in-billions 106 --model-format mlx --quantization ${quantization}
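The MLX specs above use a templated model ID, ``mlx-community/GLM-4.5-Air-{quantization}``, where the chosen quantization is substituted into the repository name. A minimal sketch of that substitution and of building the documented launch command; the helper functions are illustrative only, not Xinference APIs, and only the model IDs and quantization lists are taken from the spec tables above:

```python
# MLX specs for glm-4.5, as listed in the tables above.
MLX_SPECS = {
    355: {"model_id": "mlx-community/GLM-4.5-{quantization}",
          "quantizations": ["4bit"]},
    106: {"model_id": "mlx-community/GLM-4.5-Air-{quantization}",
          "quantizations": ["2bit", "3bit", "4bit", "5bit", "8bit"]},
}

def resolve_model_id(size_in_billions: int, quantization: str) -> str:
    """Substitute the chosen quantization into the templated model ID."""
    spec = MLX_SPECS[size_in_billions]
    if quantization not in spec["quantizations"]:
        raise ValueError(f"unsupported quantization: {quantization}")
    return spec["model_id"].format(quantization=quantization)

def launch_command(size_in_billions: int, quantization: str) -> str:
    """Build the launch command shown in the docs for the MLX format."""
    return (f"xinference launch --model-engine MLX --model-name glm-4.5 "
            f"--size-in-billions {size_in_billions} --model-format mlx "
            f"--quantization {quantization}")

print(resolve_model_id(106, "4bit"))  # mlx-community/GLM-4.5-Air-4bit
```

Note that the 355B variant ships only a 4bit MLX conversion, while the 106B Air variant offers 2bit through 8bit.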

doc/source/models/builtin/llm/glm-4v.rst

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** none
 - **Engines**: Transformers
-- **Model ID:** THUDM/glm-4v-9b
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-4v-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4v-9b>`__
+- **Model ID:** zai-org/glm-4v-9b
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-4v-9b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4v-9b>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

doc/source/models/builtin/llm/glm-edge-chat.rst

Lines changed: 12 additions & 12 deletions
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 1_5 Billion)
 - **Model Size (in billions):** 1_5
 - **Quantizations:** none
 - **Engines**: Transformers
-- **Model ID:** THUDM/glm-edge-1.5b-chat
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat>`__
+- **Model ID:** zai-org/glm-edge-1.5b-chat
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-1.5b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -37,8 +37,8 @@ Model Spec 2 (pytorch, 4 Billion)
 - **Model Size (in billions):** 4
 - **Quantizations:** none
 - **Engines**: Transformers
-- **Model ID:** THUDM/glm-edge-4b-chat
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat>`__
+- **Model ID:** zai-org/glm-edge-4b-chat
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-4b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -53,8 +53,8 @@ Model Spec 3 (ggufv2, 1_5 Billion)
 - **Model Size (in billions):** 1_5
 - **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0
 - **Engines**: llama.cpp
-- **Model ID:** THUDM/glm-edge-1.5b-chat-gguf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__
+- **Model ID:** zai-org/glm-edge-1.5b-chat-gguf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -69,8 +69,8 @@ Model Spec 4 (ggufv2, 1_5 Billion)
 - **Model Size (in billions):** 1_5
 - **Quantizations:** F16
 - **Engines**: llama.cpp
-- **Model ID:** THUDM/glm-edge-1.5b-chat-gguf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__
+- **Model ID:** zai-org/glm-edge-1.5b-chat-gguf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -85,8 +85,8 @@ Model Spec 5 (ggufv2, 4 Billion)
 - **Model Size (in billions):** 4
 - **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0
 - **Engines**: llama.cpp
-- **Model ID:** THUDM/glm-edge-4b-chat-gguf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__
+- **Model ID:** zai-org/glm-edge-4b-chat-gguf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -101,8 +101,8 @@ Model Spec 6 (ggufv2, 4 Billion)
 - **Model Size (in billions):** 4
 - **Quantizations:** F16
 - **Engines**: llama.cpp
-- **Model ID:** THUDM/glm-edge-4b-chat-gguf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__
+- **Model ID:** zai-org/glm-edge-4b-chat-gguf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

doc/source/models/builtin/llm/glm4-0414.rst

Lines changed: 4 additions & 4 deletions
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** none
 - **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/GLM-4-9B-0414
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/GLM-4-9B-0414>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4-9B-0414>`__
+- **Model ID:** zai-org/GLM-4-9B-0414
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4-9B-0414>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4-9B-0414>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

@@ -37,8 +37,8 @@ Model Spec 2 (pytorch, 32 Billion)
 - **Model Size (in billions):** 32
 - **Quantizations:** none
 - **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/GLM-4-32B-0414
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/GLM-4-32B-0414>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4-32B-0414>`__
+- **Model ID:** zai-org/GLM-4-32B-0414
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/GLM-4-32B-0414>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/GLM-4-32B-0414>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

doc/source/models/builtin/llm/glm4-chat-1m.rst

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** none
 - **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/glm-4-9b-chat-1m-hf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf>`__
+- **Model ID:** zai-org/glm-4-9b-chat-1m-hf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-4-9b-chat-1m-hf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m-hf>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::

doc/source/models/builtin/llm/glm4-chat.rst

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ Model Spec 1 (pytorch, 9 Billion)
 - **Model Size (in billions):** 9
 - **Quantizations:** none
 - **Engines**: vLLM, Transformers
-- **Model ID:** THUDM/glm-4-9b-chat-hf
-- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-4-9b-chat-hf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf>`__
+- **Model ID:** zai-org/glm-4-9b-chat-hf
+- **Model Hubs**: `Hugging Face <https://huggingface.co/zai-org/glm-4-9b-chat-hf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-hf>`__

 Execute the following command to launch the model, remember to replace ``${quantization}`` with your
 chosen quantization method from the options listed above::
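Beyond adding glm-4.5, the commit applies one mechanical rename across these files: the Hugging Face organization for official GLM repositories changes from THUDM to zai-org, while the ModelScope organization stays ZhipuAI, and the GLM-4.1V-Thinking quants move from dengcao to QuantTrio on Hugging Face. A minimal sketch of that rewrite, with the mapping read off the diffs above (the function itself is illustrative, not part of this commit):

```python
# Hugging Face organization renames applied by this commit, per the diffs above.
HF_ORG_RENAMES = {
    "THUDM": "zai-org",      # official GLM repositories
    "dengcao": "QuantTrio",  # GLM-4.1V-9B-Thinking AWQ/GPTQ quants
}

def rename_model_id(model_id: str) -> str:
    """Rewrite the org prefix of a Hugging Face model ID; leave others as-is."""
    org, _, name = model_id.partition("/")
    return f"{HF_ORG_RENAMES.get(org, org)}/{name}"

print(rename_model_id("THUDM/glm-4-9b-chat-hf"))  # zai-org/glm-4-9b-chat-hf
```

ModelScope IDs are untouched except for the dengcao quants, which move to tclf90 there rather than QuantTrio.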
