3. Model configuration
Coze Studio is an AI app development platform based on LLMs. Before running the Coze Studio open-source version for the first time, you need to clone the project to your local machine and configure the required models. Once the project is running, you can also add new model services or remove unneeded ones at any time.
The model services supported by Coze Studio are as follows:
- Volcengine Ark | BytePlus ModelArk
- OpenAI
- DeepSeek
- Claude
- Ollama
- Qwen
- Gemini
In the open-source version of Coze Studio, all model configurations live in the backend/conf/model directory, which contains multiple YAML files, each corresponding to one accessible model.
To help developers get set up quickly, Coze Studio provides template files in the backend/conf/model/template directory covering common model types, such as Volcengine Ark and OpenAI. Find the template for the corresponding vendor, copy it to the backend/conf/model directory, and set the parameters according to the comments in the template.
Before filling out a model configuration file, make sure you understand the following important notes:
- Ensure that each model file's ID is unique, and do not modify the ID after the configuration goes live. The ID is the identifier used to reference the model throughout the system, and agents and workflows call models by model ID, so changing the ID of a model that is already live may break existing agents or cause model calls to fail.
- Before deleting a model, ensure that it is no longer receiving online traffic.
Before deploying and launching the open-source version of Coze Studio for the first time, you must configure a model service in the project; otherwise, you won't be able to select a model when creating agents or workflows. When setting up the Ark model service for the first time, simply follow the quick start guide to complete the configuration. To add more models or switch to another model service, refer to the following steps.
1. Copy the template file.
   1. In the backend/conf/model/template directory, find the template YAML file for the model you want to add. For example, the configuration file for OpenAI models is model_template_openai.yaml.
   2. Copy the template file to the backend/conf/model directory.
2. Modify the model configuration.
   1. Go to the backend/conf/model directory and open the file copied in Step 1.
   2. Modify the id, meta.conn_config.api_key, and meta.conn_config.model fields and save the file (a minimal sketch of these fields follows the steps below).
- id: The model ID in Coze Studio, defined by the developer, must be a non-zero integer and globally unique. Agents or workflows call models based on model IDs. For models that have already been launched, do not modify their IDs; otherwise, it may result in model call failures.
- meta.conn_config.api_key: The API Key for the model service. In this example, it is the Ark API Key. For more information, see Get Volcengine Ark API Key or Get BytePlus ModelArk API Key.
- meta.conn_config.model: The model name for the model service. In this example, it is the Model ID or Endpoint ID of Ark. For more information, see Get Volcengine Ark Model ID / Get Volcengine Ark Endpoint ID or Get BytePlus ModelArk Model ID / Get BytePlus ModelArk Endpoint ID.
For users in China, you may use Volcengine Ark; for users outside China, you may use BytePlus ModelArk instead.
Other parameters can keep their default settings, or you can modify them as needed. Taking the Ark models as an example, refer to the Volcengine Ark Model List for configuration details.
3. After modifying the configuration file, execute the following command to restart the service and apply the configuration.
docker compose --profile "*" restart coze-server
4. After the service starts successfully, open the agent editing page and select the configured model from the model dropdown list.
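For orientation, the fragment below sketches only the three fields modified in Step 2, using the Ark template as an example. The file name and all values are placeholders; keep the rest of the copied template unchanged.
# backend/conf/model/model_ark_doubao.yaml (hypothetical file name, copied from model_template_ark.yaml)
id: 2101                         # developer-defined: a non-zero integer, globally unique
meta:
  conn_config:
    api_key: "your-ark-api-key"  # placeholder: your Volcengine Ark / BytePlus ModelArk API Key
    model: "doubao-xxxx"         # placeholder: the Ark Model ID or Endpoint ID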
Coze Studio supports various common model services and model protocols. You can configure the corresponding template file, protocol, and base_url according to the type of model service. For example, for third-party platforms such as Volcengine Ark, refer to the [Third-Party Model Service] section; for official model services such as OpenAI, refer to the [Official Model Service] section. All model service template files are located in /backend/conf/model/template. Copy them to /backend/conf/model and then modify them; the changes take effect after the service restarts.
In addition, the base_url values here are also used for the [Embedding Configuration] in [Basic Component Configuration].
Platform | Base Template File Name | Protocol | Base_url | Special Instructions |
---|---|---|---|---|
Volcengine Ark | model_template_ark.yaml | ark | Volcengine Ark: https://ark.cn-beijing.volces.com/api/v3/ BytePlus ModelArk (overseas): https://ark.ap-southeast.bytepluses.com/api/v3/ | None |
Alibaba Bailian | model_template_openai.yaml or model_template_qwen.yaml | openai or qwen | https://dashscope.aliyuncs.com/compatible-mode/v1 | The qwen3 series does not support thinking in non-streaming calls. If used, set enable_thinking: false in conn_config. Coze Studio will adapt this capability in future versions. |
SiliconFlow | model_template_openai.yaml | openai | https://api.siliconflow.cn/v1 | None |
Other third-party API relays | model_template_openai.yaml | openai | The address provided in the API documentation; note that the path usually ends with /v1 and has no /chat/completions suffix | If the platform only relays or proxies model services and the model is not an OpenAI model, configure the protocol according to the [Official Model Service] section. |
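As an illustration, a SiliconFlow model configured through the OpenAI-compatible template might look like the sketch below. The id, name, api_key, and model values are placeholders, and all omitted fields should be taken from model_template_openai.yaml.
id: 3001                                 # placeholder: must be a unique non-zero integer
name: DeepSeek-V3 (SiliconFlow)          # hypothetical display name
meta:
  protocol: openai
  conn_config:
    base_url: https://api.siliconflow.cn/v1
    api_key: "your-siliconflow-api-key"  # placeholder
    model: "deepseek-ai/DeepSeek-V3"     # hypothetical model name from the platform's model list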
Framework | Base Template File Name | Protocol | Base_url | Special Instructions |
---|---|---|---|---|
Ollama | model_template_ollama.yaml | ollama | http://${ip}:11434 | 1. When the container network mode is bridge, localhost inside the coze-server container is not the host's localhost; change it to the IP of the machine running Ollama, or to http://host.docker.internal:11434. 2. Check the api_key: if no API key is set, leave this parameter empty. 3. Confirm that the firewall on the Ollama host allows port 11434. 4. Confirm that Ollama is configured to accept external connections. |
vLLM | model_template_openai.yaml | openai | http://${ip}:8000/v1 (the port is specified at startup) | None |
Xinference | model_template_openai.yaml | openai | http://${ip}:9997/v1 (the port is specified at startup) | None |
SGLang | model_template_openai.yaml | openai | http://${ip}:35140/v1 (the port is specified at startup) | None |
LM Studio | model_template_openai.yaml | openai | http://${ip}:${port}/v1 | None |
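To make the Ollama networking note concrete, the fragment below shows a conn_config that points at the Docker host instead of localhost. The model name is a placeholder and must match a model actually served by your Ollama instance.
meta:
  protocol: ollama
  conn_config:
    base_url: http://host.docker.internal:11434  # or http://<ollama-host-ip>:11434
    api_key: ""                                  # leave empty when Ollama has no API key set
    model: "gemma3"                              # placeholder: a model from `ollama list`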
Model | Base Template File Name | Protocol | Base_url | Special Instructions |
---|---|---|---|---|
Doubao | model_template_ark.yaml | ark | https://ark.cn-beijing.volces.com/api/v3/ | None |
OpenAI | model_template_openai.yaml | openai | https://api.openai.com/v1 | Check the by_azure field: if the model service is provided by Microsoft Azure, set this parameter to true. |
DeepSeek | model_template_deepseek.yaml | deepseek | https://api.deepseek.com/ | None |
Qwen | model_template_qwen.yaml | qwen | https://dashscope.aliyuncs.com/compatible-mode/v1 | The qwen3 series does not support thinking in non-streaming calls. If used, you need to set enable_thinking: false in conn_config. Coze Studio will adapt this capability in future versions. |
Gemini | model_template_gemini.yaml | gemini | https://generativelanguage.googleapis.com/ | None |
Claude | model_template_claude.yaml | claude | https://api.anthropic.com/v1/ | None |
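When an OpenAI model is actually served by Microsoft Azure, the by_azure flag and API version come into play, as noted in the table above. A minimal sketch of the relevant fragment, where the base_url, api_key, model, and api_version values are placeholders to be taken from your Azure deployment:
meta:
  protocol: openai
  conn_config:
    base_url: https://your-resource.openai.azure.com/  # placeholder Azure endpoint
    api_key: "your-azure-api-key"                      # placeholder
    model: "gpt-4o"                                    # placeholder deployment/model name
    openai:
      by_azure: true            # required for Azure-hosted OpenAI services
      api_version: "2024-10-21" # placeholder API version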
The model meta information file describes the foundational capabilities and connection details of the model.
- The basic model meta information template can be found at backend/conf/model/template/model_template_basic.yaml.
- The meta information templates for each model can be found under backend/conf/model/template, corresponding to the respective model names.
Below is the complete list of model configuration fields, which includes descriptions of each field:
Field name | Required | Example | Parameter description |
---|---|---|---|
id | Required | 0 | Model ID; must be a non-zero integer and globally unique |
name | Required | test_model | Model platform display name |
icon_uri | Optional | test_icon_uri | LLM showcase image uri |
icon_url | Optional | test_icon_url | LLM showcase image url |
description | Optional | - | Default LLM description |
description.zh | Optional | This is the model description information | Chinese version model description, used for platform display |
description.en | Optional | This is model description | English version model description, used for platform display |
default_parameters | Optional | - | Model parameter list. For the list elements, refer to the template file |
default_parameters.name | Required | temperature | Model parameter name, enumerated values: temperature, top_p, top_k, max_tokens, response_format, frequency_penalty, presence_penalty |
default_parameters.label | Required | - | Model parameter platform display name |
default_parameters.label.zh | Required | Generate randomness | Model parameter platform display name - Chinese |
default_parameters.label.en | Required | Temperature | Model parameter platform display name - English |
default_parameters.desc | Required | - | Model parameter platform display description |
default_parameters.desc.zh | Required | temperature: Increasing the temperature will make the model's output more diverse and innovative | Model parameter platform display description - Chinese |
default_parameters.desc.en | Required | Temperature: When you increase this value, the model outputs more diverse and innovative content | Model parameter platform display description - English |
default_parameters.type | Required | int | Field value type, enumerated values: int, float, boolean, string |
default_parameters.min | Optional | '0' | (For numeric types) Minimum field value |
default_parameters.max | Optional | '1' | (For numeric types) Maximum field value |
default_parameters.default_val | Required | - | Default values for the Precise / Balanced / Creative / Custom modes |
default_parameters.default_val.default_val | Required | '1.0' | Default value in custom mode |
default_parameters.default_val.creative | Optional | '1.0' | Default value in creative mode |
default_parameters.default_val.balance | Optional | '0.8' | Default value in balanced mode |
default_parameters.default_val.precise | Optional | '0.3' | Default value in precise mode |
default_parameters.precision | Optional | 2 | Precision (when type is float) |
default_parameters.style | Required | - | Display type |
default_parameters.style.widget | Required | slider | Display style, enumerated values: slider (slider), radio_buttons (buttons) |
default_parameters.style.label | Required | - | Classification |
default_parameters.style.label.zh | Required | Generation diversity | Classification label name - Chinese |
default_parameters.style.label.en | Required | Generation diversity | Classification label name - English |
meta | Required | - | Model metadata |
meta.name | Required | test_model_name | Model name, used for record keeping, not displayed |
meta.protocol | Required | test_protocol | Model connection protocol |
meta.capability | Required | - | Model foundational capabilities |
meta.capability.function_call | Optional | true | Whether the model supports function calling |
meta.capability.input_modal | Optional | ["text", "image", "audio", "video"] | Supported input modalities |
meta.capability.input_tokens | Optional | 1024 | Input token limit |
meta.capability.output_modal | Optional | ["text", "image", "audio", "video"] | Supported output modalities |
meta.capability.output_tokens | Optional | 1024 | Output token limit |
meta.capability.max_tokens | Optional | 2048 | Maximum token count |
meta.capability.json_mode | Optional | true | Whether JSON mode is supported |
meta.capability.prefix_caching | Optional | false | Whether prefix caching is supported |
meta.capability.reasoning | Optional | false | Whether reasoning is supported |
meta.conn_config | Required | - | Model connection parameters |
meta.conn_config.base_url | Required | https://localhost:1234/chat/completion | Model service base URL |
meta.conn_config.api_key | Required | qweasdzxc | API token |
meta.conn_config.timeout | Optional | 100 | Timeout duration (nanoseconds) |
meta.conn_config.model | Required | model_name | Model name. |
meta.conn_config.temperature | Optional | 0.7 | Default temperature |
meta.conn_config.frequency_penalty | Optional | 0 | Default frequency_penalty |
meta.conn_config.presence_penalty | Optional | 0 | Default presence_penalty |
meta.conn_config.max_tokens | Optional | 2048 | Default max_tokens |
meta.conn_config.top_p | Optional | 0 | Default top_p |
meta.conn_config.top_k | Optional | 0 | Default top_k |
meta.conn_config.enable_thinking | Optional | false | Whether to enable the thinking process |
meta.conn_config.stop | Optional | ["bye"] | Stop word list |
meta.conn_config.openai | Optional | - | OpenAI-specific configuration |
meta.conn_config.openai.by_azure | Optional | true | Whether to use Azure |
meta.conn_config.openai.api_version | Optional | 2024-10-21 | API version |
meta.conn_config.openai.response_format.type | Optional | text | Response format type |
meta.conn_config.claude | Optional | - | Claude-specific configuration |
meta.conn_config.claude.by_bedrock | Optional | true | Whether Bedrock is used |
meta.conn_config.claude.access_key | Optional | bedrock_ak | Bedrock access key |
meta.conn_config.claude.secret_access_key | Optional | bedrock_secret_ak | Bedrock secret access key |
meta.conn_config.claude.session_token | Optional | bedrock_session_token | Bedrock session token |
meta.conn_config.claude.region | Optional | bedrock_region | Bedrock region |
meta.conn_config.ark | Optional | - | Ark-specific configuration |
meta.conn_config.ark.region | Optional | region | Region |
meta.conn_config.ark.access_key | Optional | ak | Access key (AK) |
meta.conn_config.ark.secret_key | Optional | sk | Secret key (SK) |
meta.conn_config.ark.retry_times | Optional | 123 | Number of retries |
meta.conn_config.ark.custom_header | Optional | {"key": "val"} | Custom request headers |
meta.conn_config.deepseek | Optional | - | Deepseek-specific configuration |
meta.conn_config.deepseek.response_format_type | Optional | text | Response format type |
meta.conn_config.qwen | Optional | - | Qwen-specific configuration |
meta.conn_config.qwen.response_format | Optional | - | Return format |
meta.conn_config.gemini | Optional | - | Gemini-specific configuration |
meta.conn_config.gemini.backend | Optional | 0 | Gemini backend: 0 = default, 1 = Gemini API, 2 = Vertex AI |
meta.conn_config.gemini.project | Optional | test_project | GCP Project ID for Vertex AI. Required when backend=2 |
meta.conn_config.gemini.location | Optional | test_loc | GCP location/region for Vertex AI. Required when backend=2 |
meta.conn_config.gemini.api_version | Optional | v1beta | API version |
meta.conn_config.headers | Optional | - | HTTP headers |
meta.conn_config.timeout_ms | Optional | - | HTTP timeout (milliseconds) |
meta.conn_config.include_thoughts | Optional | true | Whether the response includes thinking content |
meta.conn_config.thinking_budget | Optional | 123 | Thinking token budget |
meta.status | Optional | 1 | Model status: 0 = default when not configured (treated as 1); 1 = available and can be added to new agents; 5 = pending offline, usable but cannot be newly added; 10 = offline, neither usable nor can be newly added |
The complete configuration examples and field descriptions for each model can be referenced at backend/conf/model/template/model_template_basic.yaml. You can also refer to the following minimal configuration. Most field settings are generally similar, with major differences in protocol and conn_config.
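# Example: Doubao (protocol: ark)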
id: 2002
name: Doubao Model
icon_uri: doubao_v2.png
icon_url: ""
description:
zh: 豆包模型简介
en: doubao model description
default_parameters:
- name: temperature
label:
zh: 生成随机性
en: Temperature
desc:
zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
type: float
min: "0"
max: "1"
default_val:
balance: "0.8"
creative: "1"
default_val: "1.0"
precise: "0.3"
precision: 1
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: max_tokens
label:
zh: 最大回复长度
en: Response max length
desc:
zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
type: int
min: "1"
max: "4096"
default_val:
default_val: "4096"
options: []
style:
widget: slider
label:
zh: 输入及输出设置
en: Input and output settings
- name: top_p
label:
zh: Top P
en: Top P
desc:
zh: '- **Top p 为累计概率**: 模型在生成输出时会从概率最高的词汇开始选择,直到这些词汇的总概率累积达到Top p 值。这样可以限制模型只选择这些高概率的词汇,从而控制输出内容的多样性。建议不要与“生成随机性”同时调整。'
en: '**Top P**:\n\n- An alternative to sampling with temperature, where only tokens within the top p probability mass are considered. For example, 0.1 means only the top 10% probability mass tokens are considered.\n- We recommend altering this or temperature, but not both.'
type: float
min: "0"
max: "1"
default_val:
default_val: "0.7"
precision: 2
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: response_format
label:
zh: 输出格式
en: Response format
desc:
zh: '- **文本**: 使用普通文本格式回复\n- **Markdown**: 将引导模型使用Markdown格式输出回复\n- **JSON**: 将引导模型使用JSON格式输出'
en: '**Response Format**:\n\n- **Text**: Replies in plain text format\n- **Markdown**: Uses Markdown format for replies\n- **JSON**: Uses JSON format for replies'
type: int
min: ""
max: ""
default_val:
default_val: "0"
options:
- label: Text
value: "0"
- label: Markdown
value: "1"
- label: JSON
value: "2"
style:
widget: radio_buttons
label:
zh: 输入及输出设置
en: Input and output settings
meta:
name: Doubao
protocol: ark
capability:
function_call: true
input_modal:
- text
- image
input_tokens: 128000
json_mode: false
max_tokens: 128000
output_modal:
- text
output_tokens: 16384
prefix_caching: false
reasoning: false
prefill_response: false
conn_config:
base_url: ""
api_key: ""
timeout: 0s
model: ""
temperature: 0.1
frequency_penalty: 0
presence_penalty: 0
max_tokens: 4096
top_p: 0.7
top_k: 0
stop: []
openai: null
claude: null
ark:
region: ""
access_key: ""
secret_key: ""
retry_times: null
custom_header: {}
deepseek: null
qwen: null
gemini: null
custom: {}
status: 0
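# Example: Claude-3.5-Sonnet (protocol: claude)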
id: 2006
name: Claude-3.5-Sonnet
icon_uri: claude_v2.png
icon_url: ""
description:
zh: claude 模型简介
en: claude model description
default_parameters:
- name: temperature
label:
zh: 生成随机性
en: Temperature
desc:
zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
type: float
min: "0"
max: "1"
default_val:
balance: "0.8"
creative: "1"
default_val: "1.0"
precise: "0.3"
precision: 1
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: max_tokens
label:
zh: 最大回复长度
en: Response max length
desc:
zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
type: int
min: "1"
max: "4096"
default_val:
default_val: "4096"
options: []
style:
widget: slider
label:
zh: 输入及输出设置
en: Input and output settings
meta:
name: Claude-3.5-Sonnet
protocol: claude
capability:
function_call: true
input_modal:
- text
- image
input_tokens: 128000
json_mode: false
max_tokens: 128000
output_modal:
- text
output_tokens: 16384
prefix_caching: false
reasoning: false
prefill_response: false
conn_config:
base_url: ""
api_key: ""
timeout: 0s
model: ""
temperature: 0.7
frequency_penalty: 0
presence_penalty: 0
max_tokens: 4096
top_p: 1
top_k: 0
stop: []
openai: null
claude:
by_bedrock: false
access_key: ""
secret_access_key: ""
session_token: ""
region: ""
ark: null
deepseek: null
qwen: null
gemini: null
custom: {}
status: 0
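# Example: DeepSeek-V3 (protocol: deepseek)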
id: 2004
name: DeepSeek-V3
icon_uri: deepseek_v2.png
icon_url: ""
description:
zh: deepseek 模型简介
en: deepseek model description
default_parameters:
- name: temperature
label:
zh: 生成随机性
en: Temperature
desc:
zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
type: float
min: "0"
max: "1"
default_val:
balance: "0.8"
creative: "1"
default_val: "1.0"
precise: "0.3"
precision: 1
options: []
style:
widget: slider
label:
zh: 生成随机性
en: Generation diversity
- name: max_tokens
label:
zh: 最大回复长度
en: Response max length
desc:
zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
type: int
min: "1"
max: "4096"
default_val:
default_val: "4096"
options: []
style:
widget: slider
label:
zh: 输入及输出设置
en: Input and output settings
- name: response_format
label:
zh: 输出格式
en: Response format
desc:
zh: '- **文本**: 使用普通文本格式回复\n- **JSON**: 将引导模型使用JSON格式输出'
en: '**Response Format**:\n\n- **Text**: Replies in plain text format\n- **JSON**: Uses JSON format for replies'
type: int
min: ""
max: ""
default_val:
default_val: "0"
options:
- label: Text
value: "0"
- label: JSON Object
value: "1"
style:
widget: radio_buttons
label:
zh: 输入及输出设置
en: Input and output settings
meta:
name: DeepSeek-V3
protocol: deepseek
capability:
function_call: false
input_modal:
- text
input_tokens: 128000
json_mode: false
max_tokens: 128000
output_modal:
- text
output_tokens: 16384
prefix_caching: false
reasoning: false
prefill_response: false
conn_config:
base_url: ""
api_key: ""
timeout: 0s
model: ""
temperature: 0.7
frequency_penalty: 0
presence_penalty: 0
max_tokens: 4096
top_p: 1
top_k: 0
stop: []
openai: null
claude: null
ark: null
deepseek:
response_format_type: text
qwen: null
gemini: null
custom: {}
status: 0
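# Example: Gemma-3 via Ollama (protocol: ollama)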
id: 2003
name: Gemma-3
icon_uri: ollama.png
icon_url: ""
description:
zh: ollama 模型简介
en: ollama model description
default_parameters:
- name: temperature
label:
zh: 生成随机性
en: Temperature
desc:
zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
type: float
min: "0"
max: "1"
default_val:
balance: "0.8"
creative: "1"
default_val: "1.0"
precise: "0.3"
precision: 1
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: max_tokens
label:
zh: 最大回复长度
en: Response max length
desc:
zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
type: int
min: "1"
max: "4096"
default_val:
default_val: "4096"
options: []
style:
widget: slider
label:
zh: 输入及输出设置
en: Input and output settings
meta:
name: Gemma-3
protocol: ollama
capability:
function_call: true
input_modal:
- text
input_tokens: 128000
json_mode: false
max_tokens: 128000
output_modal:
- text
output_tokens: 16384
prefix_caching: false
reasoning: false
prefill_response: false
conn_config:
base_url: ""
api_key: ""
timeout: 0s
model: ""
temperature: 0.6
frequency_penalty: 0
presence_penalty: 0
max_tokens: 4096
top_p: 0.95
top_k: 20
stop: []
openai: null
claude: null
ark: null
deepseek: null
qwen: null
gemini: null
custom: {}
status: 0
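# Example: GPT-4o (protocol: openai)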
id: 2001
name: GPT-4o
icon_uri: openai_v2.png
icon_url: ""
description:
zh: gpt 模型简介
en: Multi-modal, 320ms, 88.7% MMLU, excels in education, customer support, health, and entertainment.
default_parameters:
- name: temperature
label:
zh: 生成随机性
en: Temperature
desc:
zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
type: float
min: "0"
max: "1"
default_val:
balance: "0.8"
creative: "1"
default_val: "1.0"
precise: "0.3"
precision: 1
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: max_tokens
label:
zh: 最大回复长度
en: Response max length
desc:
zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
type: int
min: "1"
max: "4096"
default_val:
default_val: "4096"
options: []
style:
widget: slider
label:
zh: 输入及输出设置
en: Input and output settings
- name: top_p
label:
zh: Top P
en: Top P
desc:
zh: '- **Top p 为累计概率**: 模型在生成输出时会从概率最高的词汇开始选择,直到这些词汇的总概率累积达到Top p 值。这样可以限制模型只选择这些高概率的词汇,从而控制输出内容的多样性。建议不要与“生成随机性”同时调整。'
en: '**Top P**:\n\n- An alternative to sampling with temperature, where only tokens within the top p probability mass are considered. For example, 0.1 means only the top 10% probability mass tokens are considered.\n- We recommend altering this or temperature, but not both.'
type: float
min: "0"
max: "1"
default_val:
default_val: "0.7"
precision: 2
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: frequency_penalty
label:
zh: 重复语句惩罚
en: Frequency penalty
desc:
zh: '- **frequency penalty**: 当该值为正时,会阻止模型频繁使用相同的词汇和短语,从而增加输出内容的多样性。'
en: '**Frequency Penalty**: When positive, it discourages the model from repeating the same words and phrases, thereby increasing the diversity of the output.'
type: float
min: "-2"
max: "2"
default_val:
default_val: "0"
precision: 2
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: presence_penalty
label:
zh: 重复主题惩罚
en: Presence penalty
desc:
zh: '- **presence penalty**: 当该值为正时,会阻止模型频繁讨论相同的主题,从而增加输出内容的多样性'
en: '**Presence Penalty**: When positive, it prevents the model from discussing the same topics repeatedly, thereby increasing the diversity of the output.'
type: float
min: "-2"
max: "2"
default_val:
default_val: "0"
precision: 2
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: response_format
label:
zh: 输出格式
en: Response format
desc:
zh: '- **文本**: 使用普通文本格式回复\n- **Markdown**: 将引导模型使用Markdown格式输出回复\n- **JSON**: 将引导模型使用JSON格式输出'
en: '**Response Format**:\n\n- **Text**: Replies in plain text format\n- **Markdown**: Uses Markdown format for replies\n- **JSON**: Uses JSON format for replies'
type: int
min: ""
max: ""
default_val:
default_val: "0"
options:
- label: Text
value: "0"
- label: Markdown
value: "1"
- label: JSON
value: "2"
style:
widget: radio_buttons
label:
zh: 输入及输出设置
en: Input and output settings
meta:
name: GPT-4o
protocol: openai
capability:
function_call: true
input_modal:
- text
- image
input_tokens: 128000
json_mode: false
max_tokens: 128000
output_modal:
- text
output_tokens: 16384
prefix_caching: false
reasoning: false
prefill_response: false
conn_config:
base_url: ""
api_key: ""
timeout: 0s
model: ""
temperature: 0.7
frequency_penalty: 0
presence_penalty: 0
max_tokens: 4096
top_p: 1
top_k: 0
stop: []
openai:
by_azure: true
api_version: ""
response_format:
type: text
jsonschema: null
claude: null
ark: null
deepseek: null
qwen: null
gemini: null
custom: {}
status: 0
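# Example: Qwen3-32B (protocol: qwen)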
id: 2005
name: Qwen3-32B
icon_uri: qwen_v2.png
icon_url: ""
description:
zh: 通义千问模型
en: qwen model description
default_parameters:
- name: temperature
label:
zh: 生成随机性
en: Temperature
desc:
zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
type: float
min: "0"
max: "1"
default_val:
balance: "0.8"
creative: "1"
default_val: "1.0"
precise: "0.3"
precision: 1
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: max_tokens
label:
zh: 最大回复长度
en: Response max length
desc:
zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
type: int
min: "1"
max: "4096"
default_val:
default_val: "4096"
options: []
style:
widget: slider
label:
zh: 输入及输出设置
en: Input and output settings
- name: top_p
label:
zh: Top P
en: Top P
desc:
zh: '- **Top p 为累计概率**: 模型在生成输出时会从概率最高的词汇开始选择,直到这些词汇的总概率累积达到Top p 值。这样可以限制模型只选择这些高概率的词汇,从而控制输出内容的多样性。建议不要与“生成随机性”同时调整。'
en: '**Top P**:\n\n- An alternative to sampling with temperature, where only tokens within the top p probability mass are considered. For example, 0.1 means only the top 10% probability mass tokens are considered.\n- We recommend altering this or temperature, but not both.'
type: float
min: "0"
max: "1"
default_val:
default_val: "0.95"
precision: 2
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
meta:
name: Qwen3-32B
protocol: qwen
capability:
function_call: true
input_modal:
- text
input_tokens: 128000
json_mode: false
max_tokens: 128000
output_modal:
- text
output_tokens: 16384
prefix_caching: false
reasoning: false
prefill_response: false
conn_config:
base_url: ""
api_key: ""
timeout: 0s
model: ""
temperature: 0.7
frequency_penalty: 0
presence_penalty: 0
max_tokens: 4096
top_p: 1
top_k: 0
stop: []
openai: null
claude: null
ark: null
deepseek: null
qwen:
response_format:
type: text
jsonschema: null
gemini: null
custom: {}
status: 0
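# Example: Gemini-2.5-Flash (protocol: gemini)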
id: 2007
name: Gemini-2.5-Flash
icon_uri: gemini_v2.png
icon_url: ""
description:
zh: gemini 模型简介
en: gemini model description
default_parameters:
- name: temperature
label:
zh: 生成随机性
en: Temperature
desc:
zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
type: float
min: "0"
max: "1"
default_val:
balance: "0.8"
creative: "1"
default_val: "1.0"
precise: "0.3"
precision: 1
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: max_tokens
label:
zh: 最大回复长度
en: Response max length
desc:
zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
type: int
min: "1"
max: "4096"
default_val:
default_val: "4096"
options: []
style:
widget: slider
label:
zh: 输入及输出设置
en: Input and output settings
- name: top_p
label:
zh: Top P
en: Top P
desc:
zh: '- **Top p 为累计概率**: 模型在生成输出时会从概率最高的词汇开始选择,直到这些词汇的总概率累积达到Top p 值。这样可以限制模型只选择这些高概率的词汇,从而控制输出内容的多样性。建议不要与“生成随机性”同时调整。'
en: '**Top P**:\n\n- An alternative to sampling with temperature, where only tokens within the top p probability mass are considered. For example, 0.1 means only the top 10% probability mass tokens are considered.\n- We recommend altering this or temperature, but not both.'
type: float
min: "0"
max: "1"
default_val:
default_val: "0.7"
precision: 2
options: []
style:
widget: slider
label:
zh: 生成多样性
en: Generation diversity
- name: response_format
label:
zh: 输出格式
en: Response format
desc:
zh: '- **文本**: 使用普通文本格式回复\n- **JSON**: 将引导模型使用JSON格式输出'
en: '**Response Format**:\n\n- **Text**: Replies in plain text format\n- **JSON**: Uses JSON format for replies'
type: int
min: ""
max: ""
default_val:
default_val: "0"
options:
- label: Text
value: "0"
- label: JSON
value: "2"
style:
widget: radio_buttons
label:
zh: 输入及输出设置
en: Input and output settings
meta:
name: Gemini-2.5-Flash
protocol: gemini
capability:
function_call: true
input_modal:
- text
- image
- audio
- video
input_tokens: 1048576
json_mode: true
max_tokens: 1114112
output_modal:
- text
output_tokens: 65536
prefix_caching: true
reasoning: true
prefill_response: true
conn_config:
base_url: ""
api_key: ""
timeout: 0s
model: gemini-2.5-flash
temperature: 0.7
frequency_penalty: 0
presence_penalty: 0
max_tokens: 4096
top_p: 1
top_k: 0
stop: []
openai: null
claude: null
ark: null
deepseek: null
qwen: null
gemini:
backend: 0
project: ""
location: ""
api_version: ""
headers:
key_1:
- val_1
- val_2
timeout_ms: 0
include_thoughts: true
thinking_budget: null
custom: {}
status: 0