
Conversation

@Oct4Pie (Contributor) commented Aug 6, 2025

The 2 commits add the evaluation results for the GLM 4.5 model in both thinking and non-thinking modes.

The tests used the diff editing format. There were many tool-output formatting errors in thinking mode (due to poor structured-output adherence), but this should not affect the test results, since those requests were retried; the reported costs, however, are not accurate.

model settings used for non-thinking mode:

- name: openrouter/z-ai/glm-4.5
  extra_params:
    extra_body:
      reasoning:
        enabled: false
      provider:
        ignore:
          - novita
    max_tokens: 96000

For thinking mode, reasoning.enabled was set to true.
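For reference, a sketch of the thinking-mode settings, assuming everything else is kept identical to the non-thinking block above:

- name: openrouter/z-ai/glm-4.5
  extra_params:
    extra_body:
      reasoning:
        enabled: true
      provider:
        ignore:
          - novita
    max_tokens: 96000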

Just a note: the novita provider was excluded from the OpenRouter providers for reproducibility, as it did not explicitly state the quantization of the model; all other providers served the fp8 variant.

The benchmark exercise folders are attached for reference:

2025-08-03-11-33-59--glm-4.5-thinking-polyglot.zip
2025-08-03-13-07-25--glm-4.5-polyglot.zip

@kneelesh48

Please use the official API, as providers on OpenRouter often quantize the model.

@nuireprog

It works with the OpenAI-compatible API endpoint:

model: openai/glm-4.6

The request is redirected to the z.ai endpoint:

openai-api-base: "https://api.z.ai/api/coding/paas/v4"

Replace this with your real API key obtained from https://z.ai/:

openai-api-key: "YOURAPIKEY"
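Putting those three settings together, a minimal sketch assuming they live in an aider config file such as .aider.conf.yml (YOURAPIKEY is a placeholder):

model: openai/glm-4.6
openai-api-base: "https://api.z.ai/api/coding/paas/v4"
openai-api-key: "YOURAPIKEY"  # placeholder: use your real key from https://z.ai/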

@Kreijstal

What are the results for GLM-4.6?
