[TPU] Enable gemma3-27b with TP>1 on multi-chips. #17335

Merged
merged 10 commits into vllm-project:main from xiowei/gemma3-27b-multi-chip
May 5, 2025

Conversation

@vanbasten23 vanbasten23 (Collaborator) commented Apr 28, 2025

This PR enables gemma3-27b with TP>1 on multiple TPU chips. Without the change, it fails with the following error:

Call stack:

Traceback (most recent call last):
  File "/home/xiowei/vllm/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
    output = func(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/v1/worker/tpu_worker.py", line 160, in determine_available_memory
    self.model_runner.profile_run(self.model_runner.max_num_tokens)
  File "/home/xiowei/vllm/vllm/v1/worker/tpu_model_runner.py", line 1166, in profile_run
    dummy_encoder_outputs = self.model.get_multimodal_embeddings(
  File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 588, in get_multimodal_embeddings
    return self._process_image_input(image_input)
  File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 569, in _process_image_input
    image_features = self._image_pixels_to_features(
  File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 557, in _image_pixels_to_features
    image_features = vision_tower(pixel_values.to(dtype=target_dtype))
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 477, in forward
    return self.vision_model(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 419, in forward
    hidden_states = self.embeddings(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 135, in forward
    embeddings = embeddings + self.position_embedding(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
    masked_input, input_mask = get_masked_input_and_mask(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 671, in _fn
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 768, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 753, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1357, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1246, in codegen_and_compile
    compiled_module = graph.compile_to_module()
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2201, in compile_to_module
    return self._compile_to_module()
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2209, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2140, in codegen
    self.init_wrapper_code()
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1898, in init_wrapper_code
    self.device_ops = get_device_op_overrides(self.device_type)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/codegen/common.py", line 490, in get_device_op_overrides
    return device_op_overrides_dict[device]
torch._inductor.exc.InductorError: KeyError: 'xla'

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
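
The KeyError: 'xla' comes from the last frames above: get_masked_input_and_mask in vocab_parallel_embedding.py is wrapped in torch.compile, and the default Inductor backend has no device-op overrides for the XLA device that TPU tensors live on. A minimal sketch of one way around that, gating compilation on the platform (the helper body is simplified; the real function also handles added-vocab ranges, and this is an illustration, not necessarily the exact change in this PR):

import torch

from vllm.platforms import current_platform  # same helper the new test uses

def _get_masked_input_and_mask(input_, vocab_start, vocab_end):
    # Tokens outside this TP rank's vocabulary shard are remapped to 0 and
    # flagged in the returned mask so the caller can zero their embeddings.
    vocab_mask = (input_ >= vocab_start) & (input_ < vocab_end)
    masked_input = torch.where(vocab_mask, input_ - vocab_start,
                               torch.zeros_like(input_))
    return masked_input, ~vocab_mask

# Inductor has no codegen for the 'xla' device, so only wrap the helper in
# torch.compile off-TPU; on TPU the eager version is traced by XLA anyway.
if current_platform.is_tpu():
    get_masked_input_and_mask = _get_masked_input_and_mask
else:
    get_masked_input_and_mask = torch.compile(_get_masked_input_and_mask,
                                              dynamic=True)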

Test plan: pytest -s -vv tests/v1/tpu/test_basic.py -k test_gemma3_27b_with_text_input_and_tp 2>&1 | tee ~/out.txt
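
For reference, a minimal sketch of the kind of multi-chip run this enables, using vLLM's offline API (the sampling settings and the tensor_parallel_size value here are placeholders, not taken from the test):

from vllm import LLM, SamplingParams

# google/gemma-3-27b-it is multimodal (is_multimodal_model=True), and
# tensor_parallel_size > 1 shards it across TPU chips.
llm = LLM(model="google/gemma-3-27b-it",
          tensor_parallel_size=4,  # placeholder: any TP > 1 exercises this path
          max_model_len=1024)
outputs = llm.generate(["The capital of France is"],
                       SamplingParams(temperature=0.0, max_tokens=16))
print(outputs[0].outputs[0].text)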


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 and tpu (Related to Google TPUs) labels Apr 28, 2025
@vanbasten23 vanbasten23 (Collaborator, Author):

cc: @bvrockwell @yarongmu-google

@vanbasten23 vanbasten23 requested review from yaochengji and mgoin April 29, 2025 00:09
@vanbasten23 vanbasten23 marked this pull request as ready for review April 29, 2025 00:10

@pytest.mark.skipif(not current_platform.is_tpu(),
                    reason="This is a basic test for TPU only")
def test_gemma3_with_mm_on_multichip(
Collaborator:
The test name indicates there's MM (multimodal input); could you point out where it is?

Collaborator (Author):

gemma-3-27b-it, unlike gemma-3-1b-it, is a multimodal model (is_multimodal_model=True). Even when the input is text-only, we "always use embeddings (rather than token ids) as input to the multimodal model, even when the input is text", per the comment in the code. That's where the mm comes from.
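
For context, a self-contained toy sketch of that pattern (a hypothetical class, not vLLM's actual runner code): a multimodal model always routes inputs through an embedding step, even when there are no images.

import torch
import torch.nn as nn

class ToyMultimodalLM(nn.Module):
    """Toy illustration: inputs always become embeddings before the LM."""
    def __init__(self, vocab=128, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lm = nn.Linear(dim, vocab)

    def get_input_embeddings(self, input_ids, mm_embeds=None):
        embeds = self.embed(input_ids)
        if mm_embeds is not None:
            # Real models scatter image embeddings into placeholder token
            # slots; here we just prepend them to keep the sketch short.
            embeds = torch.cat([mm_embeds, embeds], dim=1)
        return embeds

    def forward(self, inputs_embeds):
        return self.lm(inputs_embeds)

model = ToyMultimodalLM()
ids = torch.randint(0, 128, (1, 4))
# Text-only input still goes through the embedding path, never raw ids.
logits = model(model.get_input_embeddings(ids))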

I can see your concern. Let me make the test name clearer.

Collaborator (Author):

Updated.

@yaochengji yaochengji (Collaborator) left a comment:

LGTM, thanks Xiongfei for supporting such an important model!

@vanbasten23 vanbasten23 force-pushed the xiowei/gemma3-27b-multi-chip branch from 81fbdc9 to 282517c Compare May 1, 2025 00:14
@vanbasten23 vanbasten23 (Collaborator, Author):

Somehow I still can't see my TPU CI running (is it because all the tests run in sequence, and a CI job ahead of the TPU CI is stuck and blocking it?), nor can I start the TPU CI myself (the "Run TPU V1 Tests" button is grayed out).

@vanbasten23 vanbasten23 force-pushed the xiowei/gemma3-27b-multi-chip branch from c638a28 to 7bce06f Compare May 1, 2025 16:59
@mgoin mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 1, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> (7 commits)
@vanbasten23 vanbasten23 force-pushed the xiowei/gemma3-27b-multi-chip branch from 5bdec58 to 22c0481 Compare May 2, 2025 16:49
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> (2 commits)
@vanbasten23 vanbasten23 (Collaborator, Author):

The failing CI looks like a timeout; I don't see how this PR could cause that.

@mgoin mgoin (Member) commented May 2, 2025

I retried the failing tests, but I think we can merge, ignoring those timeouts.

@vanbasten23 vanbasten23 (Collaborator, Author):

Thanks @mgoin. I also did some checking on my A100 VM. For the two failing tests:

  • VLLM_USE_V1=1 pytest -s -vv tests/mq_llm_engine/test_error_handling.py::test_mp_crash_detection: it fails on the main branch (4c33d67) too, so the failure is not introduced by this PR
  • VLLM_USE_V1=1 pytest -s -vv tests/v1/engine/test_engine_core_client.py -k test_startup_failure: it succeeds on my branch xiowei/gemma3-27b-multi-chip

Could you help merge the PR? Thanks!

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
@NickLucche NickLucche (Contributor) left a comment:

Looks cleaner, thanks!

@mgoin mgoin (Member) commented May 5, 2025

Nice improvement and TPU V1 test is green!

@simon-mo simon-mo merged commit 9765940 into vllm-project:main May 5, 2025
46 of 49 checks passed
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), tpu (Related to Google TPUs), v1