[TPU] Enable gemma3-27b with TP>1 on multi-chips. #17335

Merged
merged 10 commits into vllm-project:main from xiowei/gemma3-27b-multi-chip
May 5, 2025

Conversation

@vanbasten23 vanbasten23 (Collaborator) commented Apr 28, 2025

This PR enables gemma3-27b with TP>1 on multiple TPU chips. Without the change, it fails with the following error:

Call stack:

Traceback (most recent call last):
  File "/home/xiowei/vllm/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
    output = func(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/v1/worker/tpu_worker.py", line 160, in determine_available_memory
    self.model_runner.profile_run(self.model_runner.max_num_tokens)
  File "/home/xiowei/vllm/vllm/v1/worker/tpu_model_runner.py", line 1166, in profile_run
    dummy_encoder_outputs = self.model.get_multimodal_embeddings(
  File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 588, in get_multimodal_embeddings
    return self._process_image_input(image_input)
  File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 569, in _process_image_input
    image_features = self._image_pixels_to_features(
  File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 557, in _image_pixels_to_features
    image_features = vision_tower(pixel_values.to(dtype=target_dtype))
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 477, in forward
    return self.vision_model(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 419, in forward
    hidden_states = self.embeddings(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 135, in forward
    embeddings = embeddings + self.position_embedding(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xiowei/vllm/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
    masked_input, input_mask = get_masked_input_and_mask(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 671, in _fn
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 768, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 753, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1357, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1246, in codegen_and_compile
    compiled_module = graph.compile_to_module()
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2201, in compile_to_module
    return self._compile_to_module()
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2209, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2140, in codegen
    self.init_wrapper_code()
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1898, in init_wrapper_code
    self.device_ops = get_device_op_overrides(self.device_type)
  File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/codegen/common.py", line 490, in get_device_op_overrides
    return device_op_overrides_dict[device]
torch._inductor.exc.InductorError: KeyError: 'xla'

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
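
The KeyError: 'xla' comes from the last frames above: get_masked_input_and_mask in vocab_parallel_embedding.py is wrapped in torch.compile, and the default Inductor backend has no device-op overrides for the XLA device that TPU tensors live on. A minimal sketch of one way around that, gating compilation on the platform (the helper body is simplified; the real function also handles added-vocab ranges, and this is an illustration, not necessarily the exact change in this PR):

import torch

from vllm.platforms import current_platform  # same helper the new test uses

def _get_masked_input_and_mask(input_, vocab_start, vocab_end):
    # Tokens outside this TP rank's vocabulary shard are remapped to 0 and
    # flagged in the returned mask so the caller can zero their embeddings.
    vocab_mask = (input_ >= vocab_start) & (input_ < vocab_end)
    masked_input = torch.where(vocab_mask, input_ - vocab_start,
                               torch.zeros_like(input_))
    return masked_input, ~vocab_mask

# Inductor has no codegen for the 'xla' device, so only wrap the helper in
# torch.compile off-TPU; on TPU the eager version is traced by XLA anyway.
if current_platform.is_tpu():
    get_masked_input_and_mask = _get_masked_input_and_mask
else:
    get_masked_input_and_mask = torch.compile(_get_masked_input_and_mask,
                                              dynamic=True)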

Test plan: pytest -s -vv tests/v1/tpu/test_basic.py -k test_gemma3_27b_with_text_input_and_tp 2>&1 | tee ~/out.txt
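
For reference, a minimal sketch of the kind of multi-chip run this enables, using vLLM's offline API (the sampling settings and the tensor_parallel_size value here are placeholders, not taken from the test):

from vllm import LLM, SamplingParams

# google/gemma-3-27b-it is multimodal (is_multimodal_model=True), and
# tensor_parallel_size > 1 shards it across TPU chips.
llm = LLM(model="google/gemma-3-27b-it",
          tensor_parallel_size=4,  # placeholder: any TP > 1 exercises this path
          max_model_len=1024)
outputs = llm.generate(["The capital of France is"],
                       SamplingParams(temperature=0.0, max_tokens=16))
print(outputs[0].outputs[0].text)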


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 and tpu (Related to Google TPUs) labels Apr 28, 2025
@vanbasten23 vanbasten23 (Collaborator, Author):

cc: @bvrockwell @yarongmu-google

@vanbasten23 vanbasten23 requested review from yaochengji and mgoin April 29, 2025 00:09
@vanbasten23 vanbasten23 marked this pull request as ready for review April 29, 2025 00:10

@pytest.mark.skipif(not current_platform.is_tpu(),
                    reason="This is a basic test for TPU only")
def test_gemma3_with_mm_on_multichip(
Collaborator:
The test name indicates there's MM (multimodal input); could you point out where it is?

Collaborator (Author):

gemma-3-27b-it, unlike gemma-3-1b-it, is a multimodal model (is_multimodal_model=True). Even when the input is text-only, we "always use embeddings (rather than token ids) as input to the multimodal model, even when the input is text", per the comment in the code. That's where the mm comes from.
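
For context, a self-contained toy sketch of that pattern (a hypothetical class, not vLLM's actual runner code): a multimodal model always routes inputs through an embedding step, even when there are no images.

import torch
import torch.nn as nn

class ToyMultimodalLM(nn.Module):
    """Toy illustration: inputs always become embeddings before the LM."""
    def __init__(self, vocab=128, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lm = nn.Linear(dim, vocab)

    def get_input_embeddings(self, input_ids, mm_embeds=None):
        embeds = self.embed(input_ids)
        if mm_embeds is not None:
            # Real models scatter image embeddings into placeholder token
            # slots; here we just prepend them to keep the sketch short.
            embeds = torch.cat([mm_embeds, embeds], dim=1)
        return embeds

    def forward(self, inputs_embeds):
        return self.lm(inputs_embeds)

model = ToyMultimodalLM()
ids = torch.randint(0, 128, (1, 4))
# Text-only input still goes through the embedding path, never raw ids.
logits = model(model.get_input_embeddings(ids))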

I can see your concern. Let me make the test name clearer.

Collaborator (Author):

Updated.

@yaochengji yaochengji (Collaborator) left a comment:

LGTM, thanks Xiongfei for supporting such an important model!

@vanbasten23 vanbasten23 force-pushed the xiowei/gemma3-27b-multi-chip branch from 81fbdc9 to 282517c Compare May 1, 2025 00:14
@vanbasten23 vanbasten23 (Collaborator, Author):

Somehow I still can't see my TPU CI running (is it because all the tests run in sequence, and a CI job ahead of the TPU CI is stuck and blocking it?), nor can I start the TPU CI myself (the "Run TPU V1 Tests" button is grayed out).

@vanbasten23 vanbasten23 force-pushed the xiowei/gemma3-27b-multi-chip branch from c638a28 to 7bce06f Compare May 1, 2025 16:59
@mgoin mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 1, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> (7 commits)
@vanbasten23 vanbasten23 force-pushed the xiowei/gemma3-27b-multi-chip branch from 5bdec58 to 22c0481 Compare May 2, 2025 16:49
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> (2 commits)
@vanbasten23 vanbasten23 (Collaborator, Author):

The failing CI looks like a timeout; I don't see how this PR could cause that.

@mgoin mgoin (Member) commented May 2, 2025

I retried the failing tests, but I think we can merge, ignoring those timeouts.

@vanbasten23 vanbasten23 (Collaborator, Author):

Thanks @mgoin. I also did some checking on my A100 VM. For the two failing tests:

  • VLLM_USE_V1=1 pytest -s -vv tests/mq_llm_engine/test_error_handling.py::test_mp_crash_detection: it fails on the main branch (4c33d67) too, so the failure is not introduced by this PR
  • VLLM_USE_V1=1 pytest -s -vv tests/v1/engine/test_engine_core_client.py -k test_startup_failure: it succeeds on my branch xiowei/gemma3-27b-multi-chip

Could you help merge the PR? Thanks!

Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
@NickLucche NickLucche (Contributor) left a comment:

Looks cleaner, thanks!

@mgoin mgoin (Member) commented May 5, 2025

Nice improvement and TPU V1 test is green!

@simon-mo simon-mo merged commit 9765940 into vllm-project:main May 5, 2025
46 of 49 checks passed
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), tpu (Related to Google TPUs), v1