
3. Model configuration


Coze Studio is an AI app development platform built on large language models (LLMs). Before running the open-source version of Coze Studio for the first time, you need to clone the project to your local machine and configure the required models. Once the project is running, you can add new model services or remove unneeded ones at any time.

Model list

The model services supported by Coze Studio are as follows:

  • Volcengine Ark | BytePlus ModelArk
  • OpenAI
  • DeepSeek
  • Claude
  • Ollama
  • Qwen
  • Gemini

Model configuration instructions

In the open-source version of Coze Studio, model configurations live in the backend/conf/model directory, which contains multiple YAML files, each corresponding to one accessible model. To help developers get set up quickly, Coze Studio ships template files in the backend/conf/model/template directory covering common model types such as Volcengine Ark and OpenAI. Find the template for your vendor, copy it to backend/conf/model, and set the parameters following the comments in the template.
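For reference, the layout looks like this (only some of the template files shown):

```
backend/conf/model/                # live model configs: one YAML file per model
└── template/                      # templates to copy and adapt
    ├── model_template_basic.yaml
    ├── model_template_ark.yaml
    ├── model_template_openai.yaml
    ├── model_template_deepseek.yaml
    ├── model_template_qwen.yaml
    └── ...
```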

Important notes

Before filling out a model configuration file, make sure you understand the following notes:

  • Ensure that the ID in each model file is unique, and do not change it after the configuration goes live. The ID is how model configurations are referenced throughout the system; changing it can break existing agents.
  • Before deleting a model, ensure that it no longer receives online traffic.
  • Agents and workflows call models by model ID. Do not change the ID of a model that is already live; otherwise, model calls may fail.

Configure models for Coze Studio

Before deploying and launching the open-source version of Coze Studio for the first time, you must configure a model service in the project; otherwise, you will not be able to select a model when creating agents or workflows. If you are deploying and setting up the Ark model service for the first time, simply follow the quick start guide to complete the configuration. To add more models or switch to another model service, follow the steps below.

Step 1: Modify the model configuration file

  1. Copy the template file.
    1. In the backend/conf/model/template directory, find the template YAML file for the model you want to add; for example, the template for OpenAI models is model_template_openai.yaml.
    2. Copy the template file to the backend/conf/model directory.
  2. Modify the model configuration.
    1. Go to the backend/conf/model directory and open the file copied in Step 1.

    2. Modify the fields in the file: id, meta.conn_config.api_key, meta.conn_config.model, and save the file.

      Users in China can use Volcengine Ark; users outside China can use BytePlus ModelArk instead.

      Other parameters can keep their default values, or you can adjust them as needed. For Ark models, refer to the Volcengine Ark model list for configuration details. A minimal sketch of the edited fields follows this list.
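For example, in a config copied from an OpenAI-compatible template, the three fields to edit look roughly like this (the id and the key below are placeholder values, not working credentials):

```yaml
id: 3001                        # hypothetical; must be unique across backend/conf/model
meta:
    conn_config:
        api_key: sk-xxxx        # placeholder; use the key issued by your provider
        model: gpt-4o           # the model name expected by the provider's API
```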

Step 2: Restart the service

After modifying the configuration file, run the following command to restart the service and apply the new configuration.

```bash
docker compose --profile "*" restart coze-server
```

After the service starts successfully, open the agent editing page and select the configured model from the model dropdown list.


Configuration Reference

Coze Studio supports a range of common model services and model protocols. Configure the corresponding template file, protocol, and base_url according to the type of model service. For third-party model services such as Volcengine Ark, refer to the [Third-Party Model Service] section; for official model services such as OpenAI, refer to the [Official Model Service] section. All model service template files are located in backend/conf/model/template; copy them to backend/conf/model and modify them there. Changes take effect after the service is restarted.

In addition, the same base_url values apply to the [Embedding Configuration] section of [Basic Component Configuration].

Third-Party Model Service

| Platform | Base Template File | Protocol | Base_url | Special Instructions |
| --- | --- | --- | --- | --- |
| Volcengine Ark | model_template_ark.yaml | ark | Volcengine Ark: https://ark.cn-beijing.volces.com/api/v3/<br>BytePlus ModelArk (overseas): https://ark.ap-southeast.bytepluses.com/api/v3/ | None |
| Alibaba Bailian | model_template_openai.yaml or model_template_qwen.yaml | openai or qwen | https://dashscope.aliyuncs.com/compatible-mode/v1 | The qwen3 series does not support thinking in non-streaming calls; if you use it, set enable_thinking: false in conn_config (see the sketch below). Coze Studio will adapt this capability in a future version. |
| SiliconFlow | model_template_openai.yaml | openai | https://api.siliconflow.cn/v1 | None |
| Other third-party API relays | model_template_openai.yaml | openai | The address given in the provider's API documentation; the path usually ends in /v1 and must not include the /chat/completions suffix | If the platform only relays or proxies model services and the underlying model is not an OpenAI model, configure the protocol according to the [Official Model Service] section |
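As an illustration of the qwen3 note above, the relevant fragment of a config based on model_template_qwen.yaml might look like this (a sketch; other required fields omitted):

```yaml
meta:
    protocol: qwen
    conn_config:
        base_url: https://dashscope.aliyuncs.com/compatible-mode/v1
        enable_thinking: false   # required for qwen3 in non-streaming calls
```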

Open-Source Framework

| Framework | Base Template File | Protocol | Base_url | Special Instructions |
| --- | --- | --- | --- | --- |
| Ollama | model_template_ollama.yaml | ollama | http://${ip}:11434 | 1. When the container network mode is bridge, localhost inside the coze-server container is not the host's localhost; use the IP of the machine running Ollama, or http://host.docker.internal:11434 (see the sketch below).<br>2. Check the api_key: if Ollama has no API key set, leave this parameter empty.<br>3. Confirm that the firewall on the Ollama host allows port 11434.<br>4. Confirm that Ollama is configured to accept external connections. |
| vllm | model_template_openai.yaml | openai | http://${ip}:8000/v1 (port is specified at startup) | None |
| xinference | model_template_openai.yaml | openai | http://${ip}:9997/v1 (port is specified at startup) | None |
| sglang | model_template_openai.yaml | openai | http://${ip}:35140/v1 (port is specified at startup) | None |
| LMStudio | model_template_openai.yaml | openai | http://${ip}:${port}/v1 | None |
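To illustrate the Ollama networking note, here is a sketch of the connection fragment when coze-server runs as a bridge-network container and Ollama runs on the Docker host:

```yaml
meta:
    protocol: ollama
    conn_config:
        # not "localhost": inside the coze-server container, localhost refers
        # to the container itself, not to the host where Ollama is listening
        base_url: http://host.docker.internal:11434
        api_key: ""              # leave empty if Ollama has no API key configured
```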

Official Model Service

| Model | Base Template File | Protocol | Base_url | Special Instructions |
| --- | --- | --- | --- | --- |
| Doubao | model_template_ark.yaml | ark | https://ark.cn-beijing.volces.com/api/v3/ | None |
| OpenAI | model_template_openai.yaml | openai | https://api.openai.com/v1 | Check the by_azure field: if the model service is provided by Microsoft Azure, set it to true (see the sketch below) |
| Deepseek | model_template_deepseek.yaml | deepseek | https://api.deepseek.com/ | None |
| Qwen | model_template_qwen.yaml | qwen | https://dashscope.aliyuncs.com/compatible-mode/v1 | The qwen3 series does not support thinking in non-streaming calls; if you use it, set enable_thinking: false in conn_config. Coze Studio will adapt this capability in a future version |
| Gemini | model_template_gemini.yaml | gemini | https://generativelanguage.googleapis.com/ | None |
| Claude | model_template_claude.yaml | claude | https://api.anthropic.com/v1/ | None |
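For an OpenAI model served through Microsoft Azure, the by_azure switch applies; a sketch with a placeholder endpoint (your actual Azure resource URL will differ):

```yaml
meta:
    protocol: openai
    conn_config:
        base_url: https://my-resource.openai.azure.com   # placeholder endpoint
        openai:
            by_azure: true            # mark the service as Azure-hosted
            api_version: 2024-10-21   # example API version from the table above
```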

Field Description

Model information

The model meta information file describes the foundational capabilities and connection details of the model.

Below is the complete list of model configuration fields, which includes descriptions of each field:

| Field name | Required | Example | Description |
| --- | --- | --- | --- |
| id | Required | 0 | Model ID |
| name | Required | test_model | Model display name on the platform |
| icon_uri | Optional | test_icon_uri | URI of the model display icon |
| icon_url | Optional | test_icon_url | URL of the model display icon |
| description | Optional | - | Default model description |
| description.zh | Optional | This is the model description information | Chinese model description, shown on the platform |
| description.en | Optional | This is model description | English model description, shown on the platform |
| default_parameters | Optional | - | List of model parameters; for the full set of list elements, refer to the template files |
| default_parameters.name | Required | temperature | Parameter name. Enum: temperature, top_p, top_k, max_tokens, response_format, frequency_penalty, presence_penalty |
| default_parameters.label | Required | - | Parameter display name on the platform |
| default_parameters.label.zh | Required | Generate randomness | Parameter display name (Chinese) |
| default_parameters.label.en | Required | Temperature | Parameter display name (English) |
| default_parameters.desc | Required | - | Parameter description shown on the platform |
| default_parameters.desc.zh | Required | temperature: Increasing the temperature makes the model's output more diverse and innovative | Parameter description (Chinese) |
| default_parameters.desc.en | Required | Temperature: When you increase this value, the model outputs more diverse and innovative content | Parameter description (English) |
| default_parameters.type | Required | int | Value type. Enum: int, float, boolean, string |
| default_parameters.min | Optional | '0' | Minimum value (numeric types) |
| default_parameters.max | Optional | '1' | Maximum value (numeric types) |
| default_parameters.default_val | Required | - | Default values for the Precise / Balanced / Creative / Custom modes |
| default_parameters.default_val.default_val | Required | '1.0' | Default value in custom mode |
| default_parameters.default_val.creative | Optional | '1.0' | Default value in creative mode |
| default_parameters.default_val.balance | Optional | '0.8' | Default value in balanced mode |
| default_parameters.default_val.precise | Optional | '0.3' | Default value in precise mode |
| default_parameters.precision | Optional | 2 | Precision (when type is float) |
| default_parameters.style | Required | - | Display style |
| default_parameters.style.widget | Required | slider | Widget type. Enum: slider (slider), radio_buttons (radio buttons) |
| default_parameters.style.label | Required | - | Parameter category |
| default_parameters.style.label.zh | Required | Generation diversity | Category label (Chinese) |
| default_parameters.style.label.en | Required | Generation diversity | Category label (English) |
| meta | Required | - | Model metadata |
| meta.name | Required | test_model_name | Model name, kept for record keeping; not displayed |
| meta.protocol | Required | test_protocol | Model connection protocol |
| meta.capability | Required | - | Model capabilities |
| meta.capability.function_call | Optional | true | Whether the model supports function calling |
| meta.capability.input_modal | Optional | ["text", "image", "audio", "video"] | Supported input modalities |
| meta.capability.input_tokens | Optional | 1024 | Input token limit |
| meta.capability.output_modal | Optional | ["text", "image", "audio", "video"] | Supported output modalities |
| meta.capability.output_tokens | Optional | 1024 | Output token limit |
| meta.capability.max_tokens | Optional | 2048 | Maximum total token count |
| meta.capability.json_mode | Optional | true | Whether JSON mode is supported |
| meta.capability.prefix_caching | Optional | false | Whether prefix caching is supported |
| meta.capability.reasoning | Optional | false | Whether reasoning is supported |
| meta.conn_config | Required | - | Model connection parameters |
| meta.conn_config.base_url | Required | https://localhost:1234/chat/completion | Base URL of the model service |
| meta.conn_config.api_key | Required | qweasdzxc | API key |
| meta.conn_config.timeout | Optional | 100 | Timeout (nanoseconds) |
| meta.conn_config.model | Required | model_name | Model name |
| meta.conn_config.temperature | Optional | 0.7 | Default temperature |
| meta.conn_config.frequency_penalty | Optional | 0 | Default frequency_penalty |
| meta.conn_config.presence_penalty | Optional | 0 | Default presence_penalty |
| meta.conn_config.max_tokens | Optional | 2048 | Default max_tokens |
| meta.conn_config.top_p | Optional | 0 | Default top_p |
| meta.conn_config.top_k | Optional | 0 | Default top_k |
| meta.conn_config.enable_thinking | Optional | false | Whether to enable the thinking process |
| meta.conn_config.stop | Optional | ["bye"] | Stop word list |
| meta.conn_config.openai | Optional | - | OpenAI-specific configuration |
| meta.conn_config.openai.by_azure | Optional | true | Whether the service is provided through Azure |
| meta.conn_config.openai.api_version | Optional | 2024-10-21 | API version |
| meta.conn_config.openai.response_format.type | Optional | text | Response format type |
| meta.conn_config.claude | Optional | - | Claude-specific configuration |
| meta.conn_config.claude.by_bedrock | Optional | true | Whether the service is provided through Bedrock |
| meta.conn_config.claude.access_key | Optional | bedrock_ak | Bedrock access key |
| meta.conn_config.claude.secret_access_key | Optional | bedrock_secret_ak | Bedrock secret access key |
| meta.conn_config.claude.session_token | Optional | bedrock_session_token | Bedrock session token |
| meta.conn_config.claude.region | Optional | bedrock_region | Bedrock region |
| meta.conn_config.ark | Optional | - | Ark-specific configuration |
| meta.conn_config.ark.region | Optional | region | Region |
| meta.conn_config.ark.access_key | Optional | ak | Access key |
| meta.conn_config.ark.secret_key | Optional | sk | Secret key |
| meta.conn_config.ark.retry_times | Optional | 123 | Retry count |
| meta.conn_config.ark.custom_header | Optional | {"key": "val"} | Custom request headers |
| meta.conn_config.deepseek | Optional | - | Deepseek-specific configuration |
| meta.conn_config.deepseek.response_format_type | Optional | text | Response format type |
| meta.conn_config.qwen | Optional | - | Qwen-specific configuration |
| meta.conn_config.qwen.response_format | Optional | - | Response format |
| meta.conn_config.gemini | Optional | - | Gemini-specific configuration |
| meta.conn_config.gemini.backend | Optional | 0 | Gemini backend. 0: default; 1: GeminiAPI; 2: VertexAI |
| meta.conn_config.gemini.project | Optional | test_project | GCP project ID for Vertex AI; required when backend=2 |
| meta.conn_config.gemini.location | Optional | test_loc | GCP location/region for Vertex AI; required when backend=2 |
| meta.conn_config.gemini.api_version | Optional | v1beta | API version |
| meta.conn_config.headers | Optional | - | HTTP headers |
| meta.conn_config.timeout_ms | Optional | - | HTTP timeout (milliseconds) |
| meta.conn_config.include_thoughts | Optional | true | Whether the response includes the thinking content |
| meta.conn_config.thinking_budget | Optional | 123 | Token budget for thinking |
| meta.status | Optional | 1 | Model status. 0: default when not set, equivalent to 1; 1: available, can be used and selected for new agents; 5: pending offline, usable but cannot be selected for new agents; 10: offline, neither usable nor selectable |
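As an illustration of the status values, a model being retired can first be marked as pending offline so that existing agents keep working while the model disappears from the selection list:

```yaml
meta:
    status: 5   # pending offline: still usable by existing agents,
                # but no longer selectable for new agents or workflows
```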

Example

Complete configuration examples and field descriptions for each model can be found in backend/conf/model/template/model_template_basic.yaml. You can also start from the minimal configurations below. Most fields are similar across models; the main differences are in protocol and conn_config.

Volcengine Ark

```yaml
id: 2002
name: Doubao Model
icon_uri: doubao_v2.png
icon_url: ""
description:
    zh: 豆包模型简介
    en: doubao model description
default_parameters:
    - name: temperature
      label:
        zh: 生成随机性
        en: Temperature
      desc:
        zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
        en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
      type: float
      min: "0"
      max: "1"
      default_val:
        balance: "0.8"
        creative: "1"
        default_val: "1.0"
        precise: "0.3"
      precision: 1
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: max_tokens
      label:
        zh: 最大回复长度
        en: Response max length
      desc:
        zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
        en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
      type: int
      min: "1"
      max: "4096"
      default_val:
        default_val: "4096"
      options: []
      style:
        widget: slider
        label:
            zh: 输入及输出设置
            en: Input and output settings
    - name: top_p
      label:
        zh: Top P
        en: Top P
      desc:
        zh: '- **Top p 为累计概率**: 模型在生成输出时会从概率最高的词汇开始选择,直到这些词汇的总概率累积达到Top p 值。这样可以限制模型只选择这些高概率的词汇,从而控制输出内容的多样性。建议不要与“生成随机性”同时调整。'
        en: '**Top P**:\n\n- An alternative to sampling with temperature, where only tokens within the top p probability mass are considered. For example, 0.1 means only the top 10% probability mass tokens are considered.\n- We recommend altering this or temperature, but not both.'
      type: float
      min: "0"
      max: "1"
      default_val:
        default_val: "0.7"
      precision: 2
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: response_format
      label:
        zh: 输出格式
        en: Response format
      desc:
        zh: '- **文本**: 使用普通文本格式回复\n- **Markdown**: 将引导模型使用Markdown格式输出回复\n- **JSON**: 将引导模型使用JSON格式输出'
        en: '**Response Format**:\n\n- **Text**: Replies in plain text format\n- **Markdown**: Uses Markdown format for replies\n- **JSON**: Uses JSON format for replies'
      type: int
      min: ""
      max: ""
      default_val:
        default_val: "0"
      options:
        - label: Text
          value: "0"
        - label: Markdown
          value: "1"
        - label: JSON
          value: "2"
      style:
        widget: radio_buttons
        label:
            zh: 输入及输出设置
            en: Input and output settings
meta:
    name: Doubao
    protocol: ark
    capability:
        function_call: true
        input_modal:
            - text
            - image
        input_tokens: 128000
        json_mode: false
        max_tokens: 128000
        output_modal:
            - text
        output_tokens: 16384
        prefix_caching: false
        reasoning: false
        prefill_response: false
    conn_config:
        base_url: ""
        api_key: ""
        timeout: 0s
        model: ""
        temperature: 0.1
        frequency_penalty: 0
        presence_penalty: 0
        max_tokens: 4096
        top_p: 0.7
        top_k: 0
        stop: []
        openai: null
        claude: null
        ark:
            region: ""
            access_key: ""
            secret_key: ""
            retry_times: null
            custom_header: {}
        deepseek: null
        qwen: null
        gemini: null
        custom: {}
    status: 0
```

Claude

```yaml
id: 2006
name: Claude-3.5-Sonnet
icon_uri: claude_v2.png
icon_url: ""
description:
    zh: claude 模型简介
    en: claude model description
default_parameters:
    - name: temperature
      label:
        zh: 生成随机性
        en: Temperature
      desc:
        zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
        en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
      type: float
      min: "0"
      max: "1"
      default_val:
        balance: "0.8"
        creative: "1"
        default_val: "1.0"
        precise: "0.3"
      precision: 1
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: max_tokens
      label:
        zh: 最大回复长度
        en: Response max length
      desc:
        zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
        en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
      type: int
      min: "1"
      max: "4096"
      default_val:
        default_val: "4096"
      options: []
      style:
        widget: slider
        label:
            zh: 输入及输出设置
            en: Input and output settings
meta:
    name: Claude-3.5-Sonnet
    protocol: claude
    capability:
        function_call: true
        input_modal:
            - text
            - image
        input_tokens: 128000
        json_mode: false
        max_tokens: 128000
        output_modal:
            - text
        output_tokens: 16384
        prefix_caching: false
        reasoning: false
        prefill_response: false
    conn_config:
        base_url: ""
        api_key: ""
        timeout: 0s
        model: ""
        temperature: 0.7
        frequency_penalty: 0
        presence_penalty: 0
        max_tokens: 4096
        top_p: 1
        top_k: 0
        stop: []
        openai: null
        claude:
            by_bedrock: false
            access_key: ""
            secret_access_key: ""
            session_token: ""
            region: ""
        ark: null
        deepseek: null
        qwen: null
        gemini: null
        custom: {}
    status: 0
```

Deepseek

```yaml
id: 2004
name: DeepSeek-V3
icon_uri: deepseek_v2.png
icon_url: ""
description:
    zh: deepseek 模型简介
    en: deepseek model description
default_parameters:
    - name: temperature
      label:
        zh: 生成随机性
        en: Temperature
      desc:
        zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
        en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
      type: float
      min: "0"
      max: "1"
      default_val:
        balance: "0.8"
        creative: "1"
        default_val: "1.0"
        precise: "0.3"
      precision: 1
      options: []
      style:
        widget: slider
        label:
            zh: 生成随机性
            en: Generation diversity
    - name: max_tokens
      label:
        zh: 最大回复长度
        en: Response max length
      desc:
        zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
        en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
      type: int
      min: "1"
      max: "4096"
      default_val:
        default_val: "4096"
      options: []
      style:
        widget: slider
        label:
            zh: 输入及输出设置
            en: Input and output settings
    - name: response_format
      label:
        zh: 输出格式
        en: Response format
      desc:
        zh: '- **文本**: 使用普通文本格式回复\n- **JSON**: 将引导模型使用JSON格式输出'
        en: '**Response Format**:\n\n- **Text**: Replies in plain text format\n- **Markdown**: Uses Markdown format for replies\n- **JSON**: Uses JSON format for replies'
      type: int
      min: ""
      max: ""
      default_val:
        default_val: "0"
      options:
        - label: Text
          value: "0"
        - label: JSON Object
          value: "1"
      style:
        widget: radio_buttons
        label:
            zh: 输入及输出设置
            en: Input and output settings
meta:
    name: DeepSeek-V3
    protocol: deepseek
    capability:
        function_call: false
        input_modal:
            - text
        input_tokens: 128000
        json_mode: false
        max_tokens: 128000
        output_modal:
            - text
        output_tokens: 16384
        prefix_caching: false
        reasoning: false
        prefill_response: false
    conn_config:
        base_url: ""
        api_key: ""
        timeout: 0s
        model: ""
        temperature: 0.7
        frequency_penalty: 0
        presence_penalty: 0
        max_tokens: 4096
        top_p: 1
        top_k: 0
        stop: []
        openai: null
        claude: null
        ark: null
        deepseek:
            response_format_type: text
        qwen: null
        gemini: null
        custom: {}
    status: 0
```

Ollama

```yaml
id: 2003
name: Gemma-3
icon_uri: ollama.png
icon_url: ""
description:
    zh: ollama 模型简介
    en: ollama model description
default_parameters:
    - name: temperature
      label:
        zh: 生成随机性
        en: Temperature
      desc:
        zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
        en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
      type: float
      min: "0"
      max: "1"
      default_val:
        balance: "0.8"
        creative: "1"
        default_val: "1.0"
        precise: "0.3"
      precision: 1
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: max_tokens
      label:
        zh: 最大回复长度
        en: Response max length
      desc:
        zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
        en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
      type: int
      min: "1"
      max: "4096"
      default_val:
        default_val: "4096"
      options: []
      style:
        widget: slider
        label:
            zh: 输入及输出设置
            en: Input and output settings
meta:
    name: Gemma-3
    protocol: ollama
    capability:
        function_call: true
        input_modal:
            - text
        input_tokens: 128000
        json_mode: false
        max_tokens: 128000
        output_modal:
            - text
        output_tokens: 16384
        prefix_caching: false
        reasoning: false
        prefill_response: false
    conn_config:
        base_url: ""
        api_key: ""
        timeout: 0s
        model: ""
        temperature: 0.6
        frequency_penalty: 0
        presence_penalty: 0
        max_tokens: 4096
        top_p: 0.95
        top_k: 20
        stop: []
        openai: null
        claude: null
        ark: null
        deepseek: null
        qwen: null
        gemini: null
        custom: {}
    status: 0
```

OpenAI

```yaml
id: 2001
name: GPT-4o
icon_uri: openai_v2.png
icon_url: ""
description:
    zh: gpt 模型简介
    en: Multi-modal, 320ms, 88.7% MMLU, excels in education, customer support, health, and entertainment.
default_parameters:
    - name: temperature
      label:
        zh: 生成随机性
        en: Temperature
      desc:
        zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
        en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
      type: float
      min: "0"
      max: "1"
      default_val:
        balance: "0.8"
        creative: "1"
        default_val: "1.0"
        precise: "0.3"
      precision: 1
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: max_tokens
      label:
        zh: 最大回复长度
        en: Response max length
      desc:
        zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
        en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
      type: int
      min: "1"
      max: "4096"
      default_val:
        default_val: "4096"
      options: []
      style:
        widget: slider
        label:
            zh: 输入及输出设置
            en: Input and output settings
    - name: top_p
      label:
        zh: Top P
        en: Top P
      desc:
        zh: '- **Top p 为累计概率**: 模型在生成输出时会从概率最高的词汇开始选择,直到这些词汇的总概率累积达到Top p 值。这样可以限制模型只选择这些高概率的词汇,从而控制输出内容的多样性。建议不要与“生成随机性”同时调整。'
        en: '**Top P**:\n\n- An alternative to sampling with temperature, where only tokens within the top p probability mass are considered. For example, 0.1 means only the top 10% probability mass tokens are considered.\n- We recommend altering this or temperature, but not both.'
      type: float
      min: "0"
      max: "1"
      default_val:
        default_val: "0.7"
      precision: 2
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: frequency_penalty
      label:
        zh: 重复语句惩罚
        en: Frequency penalty
      desc:
        zh: '- **frequency penalty**: 当该值为正时,会阻止模型频繁使用相同的词汇和短语,从而增加输出内容的多样性。'
        en: '**Frequency Penalty**: When positive, it discourages the model from repeating the same words and phrases, thereby increasing the diversity of the output.'
      type: float
      min: "-2"
      max: "2"
      default_val:
        default_val: "0"
      precision: 2
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: presence_penalty
      label:
        zh: 重复主题惩罚
        en: Presence penalty
      desc:
        zh: '- **presence penalty**: 当该值为正时,会阻止模型频繁讨论相同的主题,从而增加输出内容的多样性'
        en: '**Presence Penalty**: When positive, it prevents the model from discussing the same topics repeatedly, thereby increasing the diversity of the output.'
      type: float
      min: "-2"
      max: "2"
      default_val:
        default_val: "0"
      precision: 2
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: response_format
      label:
        zh: 输出格式
        en: Response format
      desc:
        zh: '- **文本**: 使用普通文本格式回复\n- **Markdown**: 将引导模型使用Markdown格式输出回复\n- **JSON**: 将引导模型使用JSON格式输出'
        en: '**Response Format**:\n\n- **Text**: Replies in plain text format\n- **Markdown**: Uses Markdown format for replies\n- **JSON**: Uses JSON format for replies'
      type: int
      min: ""
      max: ""
      default_val:
        default_val: "0"
      options:
        - label: Text
          value: "0"
        - label: Markdown
          value: "1"
        - label: JSON
          value: "2"
      style:
        widget: radio_buttons
        label:
            zh: 输入及输出设置
            en: Input and output settings
meta:
    name: GPT-4o
    protocol: openai
    capability:
        function_call: true
        input_modal:
            - text
            - image
        input_tokens: 128000
        json_mode: false
        max_tokens: 128000
        output_modal:
            - text
        output_tokens: 16384
        prefix_caching: false
        reasoning: false
        prefill_response: false
    conn_config:
        base_url: ""
        api_key: ""
        timeout: 0s
        model: ""
        temperature: 0.7
        frequency_penalty: 0
        presence_penalty: 0
        max_tokens: 4096
        top_p: 1
        top_k: 0
        stop: []
        openai:
            by_azure: true
            api_version: ""
            response_format:
                type: text
                jsonschema: null
        claude: null
        ark: null
        deepseek: null
        qwen: null
        gemini: null
        custom: {}
    status: 0
```

Qwen

```yaml
id: 2005
name: Qwen3-32B
icon_uri: qwen_v2.png
icon_url: ""
description:
    zh: 通义千问模型
    en: qwen model description
default_parameters:
    - name: temperature
      label:
        zh: 生成随机性
        en: Temperature
      desc:
        zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
        en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
      type: float
      min: "0"
      max: "1"
      default_val:
        balance: "0.8"
        creative: "1"
        default_val: "1.0"
        precise: "0.3"
      precision: 1
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: max_tokens
      label:
        zh: 最大回复长度
        en: Response max length
      desc:
        zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
        en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
      type: int
      min: "1"
      max: "4096"
      default_val:
        default_val: "4096"
      options: []
      style:
        widget: slider
        label:
            zh: 输入及输出设置
            en: Input and output settings
    - name: top_p
      label:
        zh: Top P
        en: Top P
      desc:
        zh: '- **Top p 为累计概率**: 模型在生成输出时会从概率最高的词汇开始选择,直到这些词汇的总概率累积达到Top p 值。这样可以限制模型只选择这些高概率的词汇,从而控制输出内容的多样性。建议不要与“生成随机性”同时调整。'
        en: '**Top P**:\n\n- An alternative to sampling with temperature, where only tokens within the top p probability mass are considered. For example, 0.1 means only the top 10% probability mass tokens are considered.\n- We recommend altering this or temperature, but not both.'
      type: float
      min: "0"
      max: "1"
      default_val:
        default_val: "0.95"
      precision: 2
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
meta:
    name: Qwen3-32B
    protocol: qwen
    capability:
        function_call: true
        input_modal:
            - text
        input_tokens: 128000
        json_mode: false
        max_tokens: 128000
        output_modal:
            - text
        output_tokens: 16384
        prefix_caching: false
        reasoning: false
        prefill_response: false
    conn_config:
        base_url: ""
        api_key: ""
        timeout: 0s
        model: ""
        temperature: 0.7
        frequency_penalty: 0
        presence_penalty: 0
        max_tokens: 4096
        top_p: 1
        top_k: 0
        stop: []
        openai: null
        claude: null
        ark: null
        deepseek: null
        qwen:
            response_format:
                type: text
                jsonschema: null
        gemini: null
        custom: {}
    status: 0
```

Gemini

```yaml
id: 2007
name: Gemini-2.5-Flash
icon_uri: gemini_v2.png
icon_url: ""
description:
    zh: gemini 模型简介
    en: gemini model description
default_parameters:
    - name: temperature
      label:
        zh: 生成随机性
        en: Temperature
      desc:
        zh: '- **temperature**: 调高温度会使得模型的输出更多样性和创新性,反之,降低温度会使输出内容更加遵循指令要求但减少多样性。建议不要与“Top p”同时调整。'
        en: '**Temperature**:\n\n- When you increase this value, the model outputs more diverse and innovative content; when you decrease it, the model outputs less diverse content that strictly follows the given instructions.\n- It is recommended not to adjust this value with \"Top p\" at the same time.'
      type: float
      min: "0"
      max: "1"
      default_val:
        balance: "0.8"
        creative: "1"
        default_val: "1.0"
        precise: "0.3"
      precision: 1
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: max_tokens
      label:
        zh: 最大回复长度
        en: Response max length
      desc:
        zh: 控制模型输出的Tokens 长度上限。通常 100 Tokens 约等于 150 个中文汉字。
        en: You can specify the maximum length of the tokens output through this value. Typically, 100 tokens are approximately equal to 150 Chinese characters.
      type: int
      min: "1"
      max: "4096"
      default_val:
        default_val: "4096"
      options: []
      style:
        widget: slider
        label:
            zh: 输入及输出设置
            en: Input and output settings
    - name: top_p
      label:
        zh: Top P
        en: Top P
      desc:
        zh: '- **Top p 为累计概率**: 模型在生成输出时会从概率最高的词汇开始选择,直到这些词汇的总概率累积达到Top p 值。这样可以限制模型只选择这些高概率的词汇,从而控制输出内容的多样性。建议不要与“生成随机性”同时调整。'
        en: '**Top P**:\n\n- An alternative to sampling with temperature, where only tokens within the top p probability mass are considered. For example, 0.1 means only the top 10% probability mass tokens are considered.\n- We recommend altering this or temperature, but not both.'
      type: float
      min: "0"
      max: "1"
      default_val:
        default_val: "0.7"
      precision: 2
      options: []
      style:
        widget: slider
        label:
            zh: 生成多样性
            en: Generation diversity
    - name: response_format
      label:
        zh: 输出格式
        en: Response format
      desc:
        zh: '- **文本**: 使用普通文本格式回复\n- **JSON**: 将引导模型使用JSON格式输出'
        en: '**Response Format**:\n\n- **JSON**: Uses JSON format for replies'
      type: int
      min: ""
      max: ""
      default_val:
        default_val: "0"
      options:
        - label: Text
          value: "0"
        - label: JSON
          value: "2"
      style:
        widget: radio_buttons
        label:
            zh: 输入及输出设置
            en: Input and output settings
meta:
    name: Gemini-2.5-Flash
    protocol: gemini
    capability:
        function_call: true
        input_modal:
            - text
            - image
            - audio
            - video
        input_tokens: 1048576
        json_mode: true
        max_tokens: 1114112
        output_modal:
            - text
        output_tokens: 65536
        prefix_caching: true
        reasoning: true
        prefill_response: true
    conn_config:
        base_url: ""
        api_key: ""
        timeout: 0s
        model: gemini-2.5-flash
        temperature: 0.7
        frequency_penalty: 0
        presence_penalty: 0
        max_tokens: 4096
        top_p: 1
        top_k: 0
        stop: []
        openai: null
        claude: null
        ark: null
        deepseek: null
        qwen: null
        gemini:
            backend: 0
            project: ""
            location: ""
            api_version: ""
            headers:
                key_1:
                    - val_1
                    - val_2
            timeout_ms: 0
            include_thoughts: true
            thinking_budget: null
        custom: {}
    status: 0
```