feat: add GLM-4.5V model support #739
```diff
@@ -139,6 +139,15 @@ export class ZhipuProvider extends OpenAICompatibleProvider {
         maxTokens: 32000
       },
       // 多模态模型 (multimodal models)
+      {
+        id: 'glm-4.5v',
+        name: 'GLM-4.5V',
+        group: 'zhipu',
+        providerId: this.provider.id,
+        isCustom: false,
+        contextLength: 65536,
+        maxTokens: 8192
+      },
```
**Comment on lines +142 to +150** (Contributor)
💡 Verification agent · 🧩 Analysis chain

**Sanity check: vision handling path works for GLM-4.5V**

Since this model is multimodal, ensure the OpenAI-compatible code path properly formats image inputs for Zhipu (as it already does for the existing `glm-4v` models).

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# Check for existing vision handling in Zhipu/OpenAI-compatible paths
rg -n -C3 -i 'glm-4v|image|vision' src/main/presenter/llmProviderPresenter/providers
```

Length of output: 121472

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
file="src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts"
echo "=== File: $file ==="
nl -ba "$file" | sed -n '1,420p'
```

Length of output: 449

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
file="src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts"
if [[ ! -f "$file" ]]; then
  echo "FILE_NOT_FOUND: $file"
  exit 0
fi
echo "=== Showing $file (first 400 lines) ==="
sed -n '1,400p' "$file"
echo "=== Searching for inheritance and vision/image handling within the file ==="
rg -n -C2 "class |extends|handleImg|image_url|image:|image" "$file" || true
```

Length of output: 6476

**Action: mark GLM-4.5V as vision-enabled in zhipuProvider**

`OpenAICompatibleProvider` already implements the image/vision handling path and `ZhipuProvider` inherits it, but the GLM-4.5V model entry in `src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts` is not flagged as `vision: true`; the upper layer uses that flag to decide whether to insert `image_url` content parts. Add the vision flag so the multimodal path is used end-to-end (and do the same for other `glm-4v*` variants if they are multimodal). File to update: `src/main/presenter/llmProviderPresenter/providers/zhipuProvider.ts`. A sketch of the message shape involved follows.
Suggested diff:
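The original suggestion block was not preserved here; a minimal sketch consistent with the action above, assuming the entry ends with `maxTokens: 8192` as in the diff:

```diff
         contextLength: 65536,
-        maxTokens: 8192
+        maxTokens: 8192,
+        vision: true
       },
```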
```diff
       {
         id: 'glm-4v-plus-0111',
         name: 'GLM-4V-Plus-0111',
```
💡 Verification agent · 🧩 Analysis chain
**Confirm GLM-4.5V capabilities (function calling, context window) and naming**

Please double-check with Zhipu's latest docs that:

- the official model identifier is exactly `glm-4.5v`
- the context window (65536) and max output tokens (8192) match the published limits
- function/tool calling is actually supported

If any of these differ, update this block and provider-specific settings accordingly.

Optionally, verify codebase references to ensure no other places need updates:
🏁 Script executed:
Length of output: 2336
🌐 Web query:
💡 Result:
The official model identifier for Zhipu's GLM-4.5V is `glm-4.5v` [7]. Its context window size is up to 64K tokens for multimodal (i.e., both text and visual) inputs [2][6][9]. Current documentation and developer guides do not confirm official support for direct native function/tool calling in the way that OpenAI's "tools" (previously called "functions") works, nor do they state explicit compatibility with the OpenAI "tools" API schema.
If your use case requires structured function/tool calling as defined by OpenAI's "tools" schema (functions with input/output schemas handled natively by the model API), there is no direct evidence of support for this in GLM-4.5V as of August 2025. Integration would likely require custom prompt handling and downstream orchestration code on your part.
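As a sketch of what such downstream orchestration could look like, a caller might gate OpenAI-style `tools` on the model's capability flag. The `ModelEntry` shape and `buildRequestOptions` helper below are illustrative assumptions, not code from this repository:

```typescript
// Illustrative sketch, not repository code. Assumes a model entry shape
// with optional capability flags like the entries in the diff above.
interface ModelEntry {
  id: string
  contextLength: number
  maxTokens: number
  vision?: boolean
  functionCall?: boolean
}

// Attach OpenAI-style `tools` only when the model advertises native tool
// calling; otherwise the caller falls back to prompt-based orchestration.
function buildRequestOptions(model: ModelEntry, tools: object[]): Record<string, unknown> {
  return {
    model: model.id,
    max_tokens: model.maxTokens,
    ...(model.functionCall ? { tools } : {})
  }
}
```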
**Action: Set GLM-4.5V `functionCall` to false (ID & context verified)**

Short summary: Verified the official model ID is `glm-4.5v` and the multimodal context window is 64K tokens (65536). There is no official documentation showing native function/tool-calling support compatible with OpenAI "tools", so the code should not mark this model as supporting function calls.

Files/locations to update: the GLM-4.5V model blocks that currently contain `functionCall: true`.

Suggested change (replace the single line in each block):

```diff
-        functionCall: true,
+        functionCall: false, // no official native function/tool-calling documented for glm-4.5v
```
Keep `vision: true` and `contextLength: 65536` as-is.
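Putting both review suggestions together, the GLM-4.5V entry would end up looking roughly like this. This is a sketch assuming the entry shape shown in the diff; the `vision` and `functionCall` fields come from the two review actions above:

```typescript
// Sketch of the entry after applying both suggestions (illustrative):
{
  id: 'glm-4.5v',
  name: 'GLM-4.5V',
  group: 'zhipu',
  providerId: this.provider.id,
  isCustom: false,
  contextLength: 65536, // 64K multimodal context per Zhipu docs
  maxTokens: 8192,
  vision: true,         // enables the inherited image_url path
  functionCall: false   // no documented native tool calling for glm-4.5v
},
```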