
Conversation

@david6666666
Contributor

@david6666666 david6666666 commented Aug 19, 2025

Purpose

Add an option to run the GLM-4.5V vision encoder in a data-parallel manner while the main model runs with tensor parallelism (TP). This can be enabled with the flag --mm-encoder-tp-mode "data".

FIX #23877
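
For offline inference, the same option should be reachable through the Python API. The following is a minimal sketch under the assumption that the LLM constructor forwards mm_encoder_tp_mode to the multimodal config the same way the CLI flag does:

from vllm import LLM, SamplingParams

# Minimal sketch (assumption: LLM() accepts mm_encoder_tp_mode like the CLI flag).
llm = LLM(
    model="zai-org/GLM-4.5V",
    tensor_parallel_size=4,
    mm_encoder_tp_mode="data",  # run the vision encoder data-parallel
)
outputs = llm.chat(
    [{"role": "user", "content": "What is the result of 111 * 5?"}],
    SamplingParams(temperature=0.7, max_tokens=500),
)
print(outputs[0].outputs[0].text)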

Test Plan

TP4 (baseline):

vllm serve zai-org/GLM-4.5V \
     --tensor-parallel-size 4 \
     --tool-call-parser glm45 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice \
     --allowed-local-media-path / \
     --media-io-kwargs '{"video": {"num_frames": -1}}'

DP4 (vision encoder in data-parallel mode):

vllm serve zai-org/GLM-4.5V \
     --tensor-parallel-size 4 \
     --tool-call-parser glm45 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice \
     --allowed-local-media-path / \
     --media-io-kwargs '{"video": {"num_frames": -1}}' \
     --mm-encoder-tp-mode "data"
1. Run the GLM-4.5V accuracy test with mistral-evals (https://github.com/ywang96/mistral-evals/tree/modify):
python -m eval.run eval_vllm \
        --model_name zai-org/GLM-4.5V \
        --url http://0.0.0.0:8000 \
        --output_dir /glm4_5v \
        --eval_name "mmmu"
cd glm4_5v
python3 parse_result.py
TP4:

==================================
Total questions: 900
Correctly answered: 671
Accuracy: 74.56%
==================================

DP4:

==================================
Total questions: 900
Correctly answered: 674
Accuracy: 74.89%
==================================
2. Run the serving benchmark on GLM-4.5V:
python3 benchmarks/benchmark_serving.py  \
--backend openai-chat   \
--model zai-org/GLM-4.5V   \
--endpoint /v1/chat/completions   \
--dataset-name hf   \
--dataset-path lmarena-ai/VisionArena-Chat   \
--hf-split train   \
--num-prompts 1000 \
--max-concurrency 64

Test Result

TP4:

============ Serving Benchmark Result ============
Successful requests:                     1000      
Maximum request concurrency:             64        
Benchmark duration (s):                  113.64    
Total input tokens:                      90524     
Total generated tokens:                  127011    
Request throughput (req/s):              8.80      
Output token throughput (tok/s):         1117.61   
Total Token throughput (tok/s):          1914.17   
---------------Time to First Token----------------
Mean TTFT (ms):                          1318.74   
Median TTFT (ms):                        1289.14   
P99 TTFT (ms):                           3283.91   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          46.46     
Median TPOT (ms):                        45.21     
P99 TPOT (ms):                           63.08     
---------------Inter-token Latency----------------
Mean ITL (ms):                           47.39     
Median ITL (ms):                         26.94     
P99 ITL (ms):                            602.48    
==================================================

DP4:

============ Serving Benchmark Result ============
Successful requests:                     1000      
Maximum request concurrency:             64        
Benchmark duration (s):                  104.64    
Total input tokens:                      90524     
Total generated tokens:                  127136    
Request throughput (req/s):              9.56      
Output token throughput (tok/s):         1215.00   
Total Token throughput (tok/s):          2080.12   
---------------Time to First Token----------------
Mean TTFT (ms):                          1140.35   
Median TTFT (ms):                        984.99    
P99 TTFT (ms):                           5460.47   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          43.27     
Median TPOT (ms):                        43.30     
P99 TPOT (ms):                           56.11     
---------------Inter-token Latency----------------
Mean ITL (ms):                           43.88     
Median ITL (ms):                         25.77     
P99 ITL (ms):                            486.12    
==================================================

Single request:
Text:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4.5V",
    "messages": [
      {
        "role": "user",
        "content": "What is the result of 111 * 5?"
      }
    ],
    "max_tokens": 500,
    "temperature": 0.7
  }'

{"id":"chatcmpl-5ecb8dc8dbab4c0f9cbf78b59bb921fe","object":"chat.completion","created":1756366966,"model":"zai-org/GLM-4.5V","choices":[{"index":0,"message":{"role":"assistant","content":"\nTo calculate \\(111 \\times 5\\), you can break it down using the distributive property:  \n\\[\n111 \\times 5 = (100 + 10 + 1) \\times 5 = 100 \\times 5 + 10 \\times 5 + 1 \\times 5 = 500 + 50 + 5 = 555.\n\\]  \nAlternatively, multiplying digit by digit:  \n- \\(1 \\times 5 = 5\\) (units place),  \n- \\(1 \\times 5 = 5\\) (tens place),  \n- \\(1 \\times 5 = 5\\) (hundreds place","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":"I need to calculate 111 multiplied by 5. Let me think about how to do this step by step. First, I know that multiplying by 5 is the same as multiplying by 10 and then dividing by 2, but maybe a simpler way is to just do the multiplication directly.\n\nSo, 111 times 5. I can break it down: 100 times 5 is 500, 10 times 5 is 50, and 1 times 5 is 5. Then add those together: 500 + 50 is 550, plus 5 is 555. That seems right.\n\nAlternatively, I can think of it as 111 * 5. Let's do the multiplication digit by digit. Starting from the right: 1 * 5 is 5. Then the next digit is 1, so 1 * 5 is 5, and the last digit is 1, so 1 * 5 is 5. So putting it together, it's 555. Yeah, that matches what I got before.\n\nI could also use the distributive property: 111 * 5 = (100 + 10 + 1) * 5 = 100*5 + 10*5 + 1*5 = 500 + 50 + 5 = 555. Same result.\n\nI think that's correct. Let me just verify with another method. If I add 111 five times: 111 + 111 is 222, plus another 111 is 333, plus another 111 is 444, plus the last 111 is 555. Yep, that works too.\n\nSo all methods lead to 555. I'm confident that's the answer."},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":17,"total_tokens":517,"completion_tokens":500,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Image:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4.5V",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What do you see in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

{"id":"chatcmpl-1eb19d594b184209b0b287992ee35dd1","object":"chat.completion","created":1756366983,"model":"zai-org/GLM-4.5V","choices":[{"index":0,"message":{"role":"assistant","content":"\nIn the image, there is a duck swimming on a body of water. The duck has a vibrant green head, a bright yellow beak, and its body features a mix of brown, white, and gray feathers. The water is a deep blue with gentle ripples, and the duck’s reflection is visible on the surface. The overall scene captures the duck in a natural aquatic environment.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":"Got it, let's see. The image shows a duck swimming in water. First, I need to describe the duck's features. The duck has a green head, a yellow beak, and its body is a mix of brown, white, and maybe some other colors. The water is blue with ripples, and there's a reflection of the duck in the water. So I should mention the duck's appearance, the water, and the reflection. Let me structure that."},"logprobs":null,"finish_reason":"stop","stop_reason":151336,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":6092,"total_tokens":6271,"completion_tokens":179,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
Video:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4.5V",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What activities are happening in this video?"
          },
          {
            "type": "video_url",
            "video_url": {
              "url": "https://content.pexels.com/videos/free-videos.mp4"
            }
          }
        ]
      }
    ],
    "max_tokens": 500
  }'

{"id":"chatcmpl-1bf2c80ca6234144912915fc28d528cb","object":"chat.completion","created":1756367006,"model":"zai-org/GLM-4.5V","choices":[{"index":0,"message":{"role":"assistant","content":"The video shows several distinct activities across different scenes:\n\n1.  **Ocean Waves Crashing on Rocks:** The video begins with aerial footage of turquoise ocean waves crashing against dark, jagged rocks along a coastline.\n2.  **Driving on a Winding Road:** It then shows an aerial view of cars driving on a winding road that cuts through lush, green, terraced fields, likely tea plantations.\n3.  **Using a Smartphone:** A close-up shot shows a person holding and using a red smartphone.\n4.  **Driving Through a Desert:** The video includes aerial footage of vehicles driving on a paved road through a vast, sandy desert landscape.\n5.  **Playing Basketball:** The final scenes feature a person in a basketball jersey on an outdoor court, holding a basketball and appearing to be in the middle of a game or practice.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":151336,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":29728,"total_tokens":29900,"completion_tokens":172,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@DarkLight1337
Member

DarkLight1337 commented Aug 19, 2025

We are in the process of merging #22742 which is quite similar in idea to this, except that DP is applied on the whole vision encoder. Perhaps you could adapt your PR to use the helper functions from that PR.

@david6666666
Contributor Author

We are in the process of merging #22742 which is quite similar in idea to this, except that DP is applied on the whole vision encoder. Perhaps you could adapt your PR to use the helper functions from that PR.

Thank you. I will adapt my PR later.

@david6666666 david6666666 force-pushed the dp_vit_glm45v branch 3 times, most recently from e06086f to fbc3bf3 Compare August 20, 2025 02:19
@david6666666 david6666666 marked this pull request as ready for review August 20, 2025 02:26

@DarkLight1337
Member

The code looks good, I'll merge the PR in once you have posted the benchmark results. (Please ping me if you have done this!)

@david6666666 david6666666 force-pushed the dp_vit_glm45v branch 3 times, most recently from 4fab348 to daf1936 Compare August 20, 2025 07:21
@david6666666 david6666666 force-pushed the dp_vit_glm45v branch 2 times, most recently from b76be08 to b87b83b Compare August 21, 2025 08:00
@david6666666
Contributor Author

We are in the process of merging #22742 which is quite similar in idea to this, except that DP is applied on the whole vision encoder. Perhaps you could adapt your PR to use the helper functions from that PR.

run_dp_sharded_mrope_vision_model does not match GLM-4.5V; the ViT's grid_thw type needs some time for adaptation and verification.

@david6666666 david6666666 requested a review from ywang96 as a code owner August 22, 2025 07:56
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Aug 22, 2025
@david6666666 david6666666 changed the title [Model] Support dp on ViT on GLM-4.5V [WIP][Model] Support dp on ViT on GLM-4.5V Aug 22, 2025
@david6666666 david6666666 requested a review from hmellor as a code owner August 22, 2025 08:12
@mergify mergify bot added the documentation Improvements or additions to documentation label Aug 22, 2025
@david6666666 david6666666 changed the title [WIP][Model] Support dp on ViT on GLM-4.5V [Model] Support dp on ViT on GLM-4.5V Aug 28, 2025
@david6666666 david6666666 changed the title [Model] Support dp on ViT on GLM-4.5V [WIP][Model] Support dp on ViT on GLM-4.5V Aug 28, 2025
Member

I think it might actually be faster to use Python built-ins here instead of torch.Tensor, because grid_thw_list is pretty small. But you should profile this.

Contributor Author

The Glm4vVisionTransformer forward signature is:

    def forward(
        self,
        x: torch.Tensor,
        grid_thw: torch.Tensor,
    ) -> torch.Tensor:

so I process it directly as a torch.Tensor without converting it.

Member

Can you try calling .tolist() before passing it into this method and see if it improves the performance?
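
For reference, here is a small self-contained sketch of the two approaches being compared; the grid_thw values and merge_size are made up for illustration:

import math
import torch

# Made-up example: grid_thw holds (t, h, w) for two items.
grid_thw = torch.tensor([[1, 34, 52], [1, 24, 36]])
merge_size = 2

# Tensor path: reduce on the tensor, convert to a Python list at the end.
sizes_tensor = (grid_thw.prod(-1) // merge_size // merge_size).tolist()

# Built-ins path: call .tolist() once up front, then use plain Python.
grid_thw_list = grid_thw.tolist()
sizes_builtin = [math.prod(thw) // (merge_size * merge_size) for thw in grid_thw_list]

assert sizes_tensor == sizes_builtin  # [442, 216] in this example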

Contributor Author

ok, I will try.

Contributor Author

Can you try calling .tolist() before passing it into this method and see if it improves the performance?

done

@david6666666 david6666666 changed the title [WIP][Model] Support dp on ViT on GLM-4.5V [Model] Support dp on ViT on GLM-4.5V Aug 29, 2025
@david6666666 david6666666 changed the title [Model] Support dp on ViT on GLM-4.5V [WIP][Model] Support dp on ViT on GLM-4.5V Sep 2, 2025
Comment on lines 1491 to 1494:

    # Split concatenated embeddings for each video item.
    merge_size = self.visual.spatial_merge_size
    sizes = grid_thw.prod(-1) // merge_size // merge_size
    return video_embeds.split(sizes.tolist())
Member

This part can be factored out to be more similar to the original code (since the branch with use_data_parallel returns early anyway).
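
One possible shape for that refactor, sketched as a hypothetical helper method on the model class (the name and exact placement are illustrative, not the PR's actual code):

def _split_by_grid(self, embeds: torch.Tensor,
                   grid_thw: torch.Tensor) -> tuple[torch.Tensor, ...]:
    # Hypothetical helper: split the concatenated embeddings into one chunk
    # per item, using the merged-patch count derived from grid_thw.
    merge_size = self.visual.spatial_merge_size
    sizes = grid_thw.prod(-1) // merge_size // merge_size
    return embeds.split(sizes.tolist())

Both the image and video paths could call such a helper, so the data-parallel branch can still return early without duplicating the split logic.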

@david6666666 david6666666 changed the title [WIP][Model] Support dp on ViT on GLM-4.5V [Model] Support dp on ViT on GLM-4.5V Sep 2, 2025
Member

@DarkLight1337 DarkLight1337 left a comment

LGTM, do you have further changes to make?

@david6666666
Contributor Author

LGTM, do you have further changes to make?

No, that's it.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 2, 2025 08:44
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 2, 2025
@DarkLight1337 DarkLight1337 merged commit 2f0bab3 into vllm-project:main Sep 2, 2025
50 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025