[Bugfix] Fix benchmark_fused_collective crash on CustomOp init #34665
mayank-ketkar-sf wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request fixes a crash in the fused collective benchmark by wrapping the VllmFusedAllreduce instantiation in a VllmConfig context. The fix is correct in addressing the crash. However, I've identified a critical issue with the benchmark's logic that this change highlights. The CustomOp implementations are bound at initialization, but the benchmark reuses the same object across different configurations, leading to incorrect measurements. I've left a detailed comment explaining the issue and suggesting a path to correct the benchmark's logic.
…dispatch

VllmFusedAllreduce creates RMSNorm and QuantFP8 (both CustomOp subclasses) in __init__. CustomOp.__init__ calls dispatch_forward(), which requires get_current_vllm_config(). Without a config context, the benchmark crashes with: AssertionError: Current vLLM config is not set.

Additionally, CustomOp binds the forward method at init time based on the active config (native vs custom kernel). Creating the object once and reusing it across different config contexts meant the native-vs-custom comparison was incorrect.

Fix: instantiate VllmFusedAllreduce inside each set_current_vllm_config block so the forward dispatch matches the intended configuration.

Signed-off-by: Mayank Ketkar <mketkar@zoox.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
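To make the crash mechanism concrete, here is a deliberately simplified, self-contained stand-in for the "ambient config via context manager" pattern described above. It is not vLLM's actual code; the names (set_current_config, ToyCustomOp) are invented for illustration only.

```python
# Toy illustration of the crash mechanism (not vLLM's implementation): a
# module-level "current config" plus a context manager that sets and restores it.
from contextlib import contextmanager

_current_config = None

@contextmanager
def set_current_config(cfg):
    global _current_config
    prev, _current_config = _current_config, cfg
    try:
        yield
    finally:
        _current_config = prev

def get_current_config():
    # Mirrors the assertion the benchmark hit when no config context was active.
    assert _current_config is not None, "Current config is not set"
    return _current_config

class ToyCustomOp:
    def __init__(self):
        # Reads the ambient config at construction time, so constructing this
        # outside a set_current_config(...) block raises the assertion above.
        self.use_custom = get_current_config()["use_custom"]

# ToyCustomOp()                        # AssertionError: Current config is not set
with set_current_config({"use_custom": True}):
    op = ToyCustomOp()                 # fine: a config is active while __init__ runs
```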
Ran into this crash while benchmarking allreduce fusion on H200. Traced it to VllmFusedAllreduce being instantiated outside a VllmConfig context, and also noticed that the CustomOp forward dispatch was only bound once regardless of the config switch. Updated the benchmark to re-instantiate per config block.
Summary
Fix two bugs in benchmarks/kernels/benchmark_fused_collective.py:

1. Crash: VllmFusedAllreduce creates RMSNorm and QuantFP8 (CustomOp subclasses), which call dispatch_forward() → get_current_vllm_config() during __init__. Without an active VllmConfig context, the benchmark crashes immediately.

2. Incorrect dispatch: CustomOp binds its forward method (native PyTorch vs CUDA custom kernel) at init time. The original code created VllmFusedAllreduce once and reused it across different custom_ops configurations, meaning the native-vs-custom comparison was measuring the same code path (illustrated by the sketch below).
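To make the second bug concrete, here is a deliberately simplified stand-in for the dispatch-at-init behavior (plain Python, not vLLM's actual CustomOp): once the forward implementation is chosen in __init__, later "config" changes have no effect on an existing instance.

```python
# Simplified illustration of dispatch-at-init (not vLLM's actual CustomOp code).
class TinyOp:
    def __init__(self, use_custom_kernel: bool):
        # The implementation is chosen once, here; changing the "config" later
        # does not rebind it on an existing instance.
        self._forward = self._forward_custom if use_custom_kernel else self._forward_native

    def _forward_native(self, xs):
        return [x * 2.0 for x in xs]      # stands in for the native PyTorch path

    def _forward_custom(self, xs):
        return [x + x for x in xs]        # stands in for the custom CUDA kernel path

    def forward(self, xs):
        return self._forward(xs)

op = TinyOp(use_custom_kernel=True)       # the original benchmark did this once...
for use_custom in (True, False):          # ...then "switched" configs, but `op`
    op.forward([1.0, 2.0, 3.0])           # still runs the path bound at __init__

for use_custom in (True, False):          # the fix: bind per configuration
    TinyOp(use_custom_kernel=use_custom).forward([1.0, 2.0, 3.0])
```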
Fix

Instantiate VllmFusedAllreduce inside each set_current_vllm_config() block so the forward dispatch matches the intended configuration.
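A minimal sketch of that pattern, for orientation. VllmFusedAllreduce is defined in the benchmark script itself, and its constructor arguments and the exact custom_ops entries toggled here are assumptions for illustration; the config plumbing (VllmConfig, CompilationConfig, set_current_vllm_config) comes from vllm.config.

```python
# Sketch of the corrected structure, assumed to live inside
# benchmarks/kernels/benchmark_fused_collective.py where VllmFusedAllreduce is
# defined. Constructor arguments and the custom_ops entries below are assumptions.
import torch
from vllm.config import CompilationConfig, VllmConfig, set_current_vllm_config

for use_custom in (True, False):
    # "+name" opts a CustomOp into its custom kernel, "-name" forces the native path.
    ops = ["+rms_norm", "+quant_fp8"] if use_custom else ["-rms_norm", "-quant_fp8"]
    config = VllmConfig(compilation_config=CompilationConfig(custom_ops=ops))
    with set_current_vllm_config(config):
        # Re-instantiate inside each config block: CustomOp.__init__ reads the
        # active config here and binds the matching forward implementation.
        fused = VllmFusedAllreduce(hidden_dim=4096, dtype=torch.bfloat16)  # args assumed
        # ... benchmark `fused` while this config is still active ...
```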
Steps to Reproduce (crash)

# On any multi-GPU setup with vLLM installed:
torchrun --nproc_per_node=2 benchmarks/kernels/benchmark_fused_collective.py \
    --num-tokens 128 512 --hidden-dim 4096 --quant-modes none

Before fix: crashes with AssertionError: Current vLLM config is not set
After fix: the benchmark runs, correctly dispatching the native or custom kernel per config
Test plan
- torchrun --nproc_per_node=2 benchmarks/kernels/benchmark_fused_collective.py --num-tokens 128 512 --hidden-dim 4096 --quant-modes none completes without a crash
- The _native_rms_norm vs _custom_rms_norm variants now dispatch to their intended code paths
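For anyone reproducing the test plan, one way to confirm the dispatch actually changes per config is to inspect which forward implementation a CustomOp subclass bound, using RMSNorm (one of the ops VllmFusedAllreduce builds). Note that _forward_method is an internal attribute and may differ across vLLM versions; treat this as a sanity-check sketch, not a stable API.

```python
# Sanity-check sketch: relies on the internal _forward_method attribute set by
# CustomOp.__init__, which may change between vLLM versions.
from vllm.config import CompilationConfig, VllmConfig, set_current_vllm_config
from vllm.model_executor.layers.layernorm import RMSNorm

for custom_ops in (["+rms_norm"], ["-rms_norm"]):
    cfg = VllmConfig(compilation_config=CompilationConfig(custom_ops=custom_ops))
    with set_current_vllm_config(cfg):
        norm = RMSNorm(4096)
        # Expect a custom-kernel forward (e.g. forward_cuda) for "+rms_norm" and
        # forward_native for "-rms_norm" on a CUDA build.
        print(custom_ops, norm._forward_method.__name__)
```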