11 changes: 11 additions & 0 deletions src/main/presenter/configPresenter/modelDefaultSettings.ts
@@ -1645,6 +1645,17 @@ export const defaultModelsSettings: DefaultModelSetting[] = [
functionCall: false,
reasoning: false
},
{
id: 'glm-4.5v',
name: 'GLM-4.5V',
temperature: 0.7,
maxTokens: 8192,
contextLength: 65536,
match: ['glm-4.5v'],
vision: true,
functionCall: true,
reasoning: true
},
Comment on lines +1648 to +1658
💡 Verification agent

🧩 Analysis chain

Confirm GLM-4.5V capabilities (function calling, context window) and naming

Please double-check with Zhipu’s latest docs that:

  • Official model ID is exactly glm-4.5v
  • It supports native function/tool calling
  • Context window should be 65,536 tokens (rather than the 128K used by the text-only 4.5 variants)

If any of these differ, update this block and provider-specific settings accordingly.

Optionally, verify codebase references to ensure no other places need updates:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -nC2 -i '\bglm[- ]?4\.5v\b|GLM-4\.5V'

Length of output: 2336


🌐 Web query:

What is the official model identifier for Zhipu's GLM-4.5V, its context window size, and does it support native function (tool) calling compatible with OpenAI "tools"?

💡 Result:

The official model identifier for Zhipu's GLM-4.5V is glm-4.5v[7]. Its context window size is up to 64K tokens for multimodal (i.e., both text and visual) inputs[2][6][9]. Current documentation and developer guides do not confirm official support for direct native function/tool calling in the way that OpenAI's "tools" (previously called "functions") works, nor do they state explicit compatibility with the OpenAI "tools" API schema.

Details and Supporting Information

  • Official model identifier:

    • As shown in Zhipu's Python SDK API example, the invocation looks like this:
      response = client.chat.completions.create(model="glm-4.5v", ...)
      This indicates the canonical model name for API access is glm-4.5v[7].
  • Context window size:

    • Zhipu GLM-4.5V supports a 64K token multimodal context window (i.e., it can process up to 64,000 tokens of combined text, image, or video per request)[2][6][9]. This is confirmed in technical summaries and product announcements.
  • Function (Tool) Calling Support:

    • There is no explicit mention in the available official documentation, technical reports, or SDK examples of a native implementation for function calling (i.e., OpenAI "tools" schema compatibility for structured API endpoints as actions, also called "functions")[7][3].
    • While GLM-4.5V is designed for strong multimodal and agent scenarios and can be integrated into GUI agent prompts and desktop tools, the APIs provided show only classic chat/inference patterns (prompt completion and image/video inputs)[2][3][7]. No evidence is given of a dedicated function calling interface or adherence to the OpenAI "tools" JSON schema for function calling.
    • Available developer demos show classic multimodal inference, visual grounding, and GUI agent tasks using prompt engineering, not a schema-augmented or structured function call format as standardized by OpenAI's "tools"[3].

Summary Table

| Aspect | GLM-4.5V Specification |
| --- | --- |
| Official Model Name | glm-4.5v |
| Context Window | 64,000 tokens (64K), multimodal |
| OpenAI "tools" Support | Not officially documented as supported |

If your use case requires structured function/tool calling as defined by OpenAI's "tools" schema (functions with input/output schemas handled natively by the model API), there is no direct evidence of support for this in GLM-4.5V as of August 2025. Integration would likely require custom prompt handling and downstream orchestration code on your part.
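For reference, the probe below shows roughly what OpenAI-"tools"-style function calling means in practice. It is a minimal TypeScript sketch, not this project's code: it assumes Zhipu's OpenAI-compatible endpoint at open.bigmodel.cn and uses the official openai Node SDK. If the model ignores the tools array or the request errors out, marking functionCall: true is not justified.

```typescript
import OpenAI from 'openai'

// Assumption: Zhipu exposes an OpenAI-compatible endpoint at this base URL.
const client = new OpenAI({
  apiKey: process.env.ZHIPU_API_KEY,
  baseURL: 'https://open.bigmodel.cn/api/paas/v4'
})

async function probeToolCalling(): Promise<void> {
  const response = await client.chat.completions.create({
    model: 'glm-4.5v',
    messages: [{ role: 'user', content: 'What is the weather in Beijing right now?' }],
    // OpenAI "tools" schema: a tools-capable model should reply with tool_calls
    // instead of (or before) plain text.
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather',
          description: 'Get the current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city']
          }
        }
      }
    ]
  })
  const message = response.choices[0]?.message
  console.log(message?.tool_calls ?? message?.content)
}

probeToolCalling().catch(console.error)
```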


Action: Set GLM-4.5V functionCall to false (ID & context verified)

Short summary: Verified the official model ID is glm-4.5v and the multimodal context window is 64K tokens (65536). There is no official documentation showing native function/tool-calling support compatible with OpenAI "tools", so the code should not mark this model as supporting function calls.

Files/locations to update:

  • src/main/presenter/configPresenter/modelDefaultSettings.ts (entry at ~lines 1649–1656): change functionCall: true to functionCall: false
  • src/main/presenter/configPresenter/providerModelSettings.ts (entry at ~lines 311–318): change functionCall: true to functionCall: false
  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts (model entry around line ~143): verify the entry and update it if it marks function calling as supported

Suggested change (replace the single line in each block):

-  functionCall: true,
+  functionCall: false, // no official native function/tool-calling documented for glm-4.5v

Keep vision: true and contextLength: 65536 as-is.
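Applied to this file, the glm-4.5v block would then read as below; this is simply the entry from the diff above with the single flag flipped.

```typescript
{
  id: 'glm-4.5v',
  name: 'GLM-4.5V',
  temperature: 0.7,
  maxTokens: 8192,
  contextLength: 65536,
  match: ['glm-4.5v'],
  vision: true,
  functionCall: false, // no official native function/tool-calling documented for glm-4.5v
  reasoning: true
},
```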

🤖 Prompt for AI Agents
In src/main/presenter/configPresenter/modelDefaultSettings.ts around lines
1648–1658, change the glm-4.5v model entry to set functionCall to false (it
currently is true) because there's no official native function/tool-calling
support; likewise update
src/main/presenter/configPresenter/providerModelSettings.ts around lines 311–318
to set functionCall: false for the glm-4.5v entry, and inspect
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts near line
~143 to verify any glm-4.5v model entry does not mark functionCall true and
change it to false if present; keep vision: true and contextLength: 65536
unchanged and add a short inline comment like "// no official native
function/tool-calling documented for glm-4.5v" where you flip the flag.


// Fireworks AI model configuration
{
11 changes: 11 additions & 0 deletions src/main/presenter/configPresenter/providerModelSettings.ts
@@ -307,6 +307,17 @@ export const providerModelSettings: Record<string, { models: ProviderModelSettin
zhipu: {
models: [
// GLM 4.5 series models
{
id: 'glm-4.5v',
name: 'GLM-4.5V',
temperature: 0.7,
maxTokens: 8192,
contextLength: 65536,
match: ['glm-4.5v'],
vision: true,
functionCall: true,
reasoning: true
},
{
id: 'glm-4.5',
name: 'GLM-4.5',
src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts
@@ -139,6 +139,15 @@ export class ZhipuProvider extends OpenAICompatibleProvider {
maxTokens: 32000
},
// Multimodal models
{
id: 'glm-4.5v',
name: 'GLM-4.5V',
group: 'zhipu',
providerId: this.provider.id,
isCustom: false,
contextLength: 65536,
maxTokens: 8192
},
Comment on lines +142 to +150
💡 Verification agent

🧩 Analysis chain

Sanity check: vision handling path works for GLM-4.5V

Since this model is multimodal, ensure the OpenAI-compatible code path properly formats image inputs for Zhipu (as it already does for glm-4v*). No code changes required here if the existing vision path is reused; just verify end-to-end.
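As a reference point for that verification, the OpenAI-compatible request the inherited path is expected to produce for GLM-4.5V looks roughly like the sketch below. This is an illustrative TypeScript snippet, not the provider's code; the Zhipu base URL is assumed, and the image is passed as a data URL in the same image_url content-part shape already used for glm-4v*.

```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.ZHIPU_API_KEY,
  // Assumption: Zhipu's OpenAI-compatible endpoint.
  baseURL: 'https://open.bigmodel.cn/api/paas/v4'
})

async function describeImage(imageBase64: string): Promise<string | null> {
  const response = await client.chat.completions.create({
    model: 'glm-4.5v',
    max_tokens: 8192,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Describe this image.' },
          // The vision flag on the model entry is what lets the upper layer
          // emit this image_url content part at all.
          { type: 'image_url', image_url: { url: `data:image/png;base64,${imageBase64}` } }
        ]
      }
    ]
  })
  return response.choices[0]?.message.content ?? null
}
```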


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Check for existing vision handling in Zhipu/OpenAI-compatible paths
rg -n -C3 -i 'glm-4v|image|vision' src/main/presenter/llmProviderPresenter/providers

Length of output: 121472


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts"
echo "=== File: $file ==="
nl -ba "$file" | sed -n '1,420p'

Length of output: 449


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts"
if [[ ! -f "$file" ]]; then
  echo "FILE_NOT_FOUND: $file"
  exit 0
fi
echo "=== Showing $file (first 400 lines) ==="
sed -n '1,400p' "$file"
echo "=== Searching for inheritance and vision/image handling within the file ==="
rg -n -C2 "class |extends|handleImg|image_url|image:|image_url|image" "$file" || true

Length of output: 6476


Action: mark GLM-4.5V as vision-enabled in zhipuProvider

OpenAICompatibleProvider already implements the image/vision handling path and ZhipuProvider inherits it, but the GLM-4.5V model entry in src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts is not flagged with vision: true, and the upper layer uses that flag to decide whether to insert image_url parts. Add the vision flag so the multimodal path is used end-to-end (and do the same for other glm-4v* variants if they are multimodal).

Files to update:

  • src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts — add vision: true to the GLM-4.5V model object (and optionally glm-4v*, glm-4v-plus-0111, glm-4v-flash if they support vision).

Suggested diff:

@@
       {
         id: 'glm-4.5v',
         name: 'GLM-4.5V',
         group: 'zhipu',
         providerId: this.provider.id,
         isCustom: false,
+        vision: true,
         contextLength: 65536,
         maxTokens: 8192
       },
🤖 Prompt for AI Agents
In src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts around
lines 142 to 150, the GLM-4.5V model entry is missing the vision flag so the
multimodal image path is not used; add vision: true to that model object (and
optionally add vision: true to other glm-4v* entries such as glm-4v,
glm-4v-plus-0111, glm-4v-flash if those models support vision) so the upper
layers will include image_url and route requests through the provider's vision
handling.

{
id: 'glm-4v-plus-0111',
name: 'GLM-4V-Plus-0111',