[Misc] Moved override for allreduce fusion thresholds from env var to config #23722

nvjullin · 2025-08-27T08:32:31Z

Purpose

Follow up on #23639.
Also cleaned up two competing/conflicting ways of tuning thresholds: number of tokens vs size.
Size is the relevant parameter (for perf), so we should only use that.

Test Plan

Test Result

Signed-off-by: Julien Lin <jullin@nvidia.com>

gemini-code-assist

Code Review

This pull request successfully refactors the configuration for allreduce fusion thresholds, moving them from an environment variable to a more structured configuration object. The cleanup of the logic for tuning thresholds is also a welcome improvement. I've found one potential performance issue in the calculation of max_token_num which appears to be overly conservative and could prevent the fused kernel from being used in some cases where it would be beneficial. Please see the detailed comment.
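To make the size-versus-token relationship concrete, here is a minimal sketch of how a size threshold could be turned into a token cap. The helper name, its parameters, and the formula (threshold in bytes divided by bytes per token) are assumptions for illustration and are not taken from vllm/compilation/collective_fusion.py; only the idea of capping at max_num_batched_tokens comes from this PR's commit list.

```python
MiB = 1024 * 1024


def max_tokens_for_size(max_size_mb: float, hidden_dim: int,
                        element_size: int,
                        max_num_batched_tokens: int) -> int:
    """Hypothetical helper: token count whose activations fit in max_size_mb."""
    # One token's activation occupies hidden_dim elements of element_size bytes.
    bytes_per_token = hidden_dim * element_size
    max_token_num = int(max_size_mb * MiB) // bytes_per_token
    # Never exceed the batch limit the scheduler would produce anyway.
    return min(max_token_num, max_num_batched_tokens)


# 1 MiB threshold, hidden size 4096, 2-byte elements, batch limit 8192 tokens
print(max_tokens_for_size(1, 4096, 2, 8192))  # 128
```

With this framing, a single size threshold yields different token counts for different hidden sizes and dtypes, which is one reason to tune on size rather than on token count.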

vllm/compilation/collective_fusion.py

ilmarkov · 2025-08-27T13:29:11Z

I am changing the constants and a bit of the logic in the other PR, but keeping the max size and cleaning up the other tuning methods makes sense to me.
LGTM.

ProExpertProg

I think it would be nice if we could restructure this so that the defaults are also reflected in the config. So maybe the config asks the pass for defaults but gives CLI values precedence.

hmellor · 2025-08-27T16:17:06Z

I agree it would be nice if the defaults could be the default of the actual config field rather than living with the implementation.

nvjullin · 2025-08-28T05:53:45Z

> I agree it would be nice if the defaults could be the default of the actual config field

If the default is {"2": 64, "4": 1, "6": 1, "8": 1}, then if the user wants to override 8 only, the user will have to pass {"2": 64, "4": 1, "6": 1, "8": 8}. This is quite bad UI.

> I think if we could restructure this such that the defaults are also reflected in config that would be nice.

Right now, the comment explains the defaults, so it is indeed reflected in the config. The issue is that the comment has to be in sync with the implementation. It's not ideal, but otherwise we'll have to write a new dict-like class to handle the aforementioned UI problem which I think is overkill for a very niche config option.

Another option is to have a default of {"2": 64, "4": 1, "6": 1, "8": 1} in config and fall back to the one in flashinfer_max_size when the config is empty. This is essentially the same as the current situation where we have a comment explaining the default: we still have to keep them in sync.

ilmarkov · 2025-08-28T09:00:23Z

> If the default is {"2": 64, "4": 1, "6": 1, "8": 1}, then if the user wants to override 8 only, the user will have to pass {"2": 64, "4": 1, "6": 1, "8": 8}

@nvjullin I'd suggest updating the default config with the user-provided dictionary. I believe a user usually needs to specify only one key:value pair at initialization to update the default config.

hmellor · 2025-09-01T14:01:37Z

> If the default is {"2": 64, "4": 1, "6": 1, "8": 1}, then if the user wants to override 8 only, the user will have to pass {"2": 64, "4": 1, "6": 1, "8": 8}. This is quite bad UI.

I don't agree. If I, as a user, pass a dict as an argument I would expect my dict to be used, not for it to be merged into another dict that I don't know about.

nvjullin · 2025-09-02T04:48:46Z

> If I, as a user, pass a dict as an argument I would expect my dict to be used, not for it to be merged into another dict that I don't know about.

you do know about it, it's documented
they're defaults, as in, if you don't specify something, we fall back on the documented values

The alternative is to not have defaults and error out. Suppose the user passes

--tensor-parallel 2 --fi_allreduce_fusion_max_size_mb {"4": 4}

Do you want the behavior to be

1. Error out: we can use a default dict argument and only that
2. Fallback on 0.5: we can use a default dict argument and still have to document the 0.5 used in max_sizes.get(world_size, MiB // 2) that isn't visible in the arguments
3. Fallback on 64: you can't just have a default dict argument. This would've been the default had I not passed --fi_allreduce_fusion_max_size_mb {"4": 4}, which is what I'd expect because I didn't say anything about world size 2.
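The three behaviors can be compared side by side in a short sketch. The function names are hypothetical, and keys are plain integers here for brevity (a JSON value passed on the command line would arrive with string keys); the default values and the 0.5 fallback are the ones quoted in this thread.

```python
# Defaults quoted in the discussion, in MB, keyed by world size.
DEFAULTS_MB = {2: 64, 4: 1, 6: 1, 8: 1}
FALLBACK_MB = 0.5  # the MiB // 2 fallback, expressed in MB


def option_error(user: dict[int, float], world_size: int) -> float:
    # Option 1: the user's dict is the whole truth; a missing key is an error.
    if world_size not in user:
        raise KeyError(f"no threshold configured for world size {world_size}")
    return user[world_size]


def option_fallback_half(user: dict[int, float], world_size: int) -> float:
    # Option 2: the user's dict replaces the defaults; missing keys get 0.5.
    return user.get(world_size, FALLBACK_MB)


def option_merge(user: dict[int, float], world_size: int) -> float:
    # Option 3: user values override the defaults key by key.
    merged = {**DEFAULTS_MB, **user}
    return merged.get(world_size, FALLBACK_MB)


user = {4: 4}  # the user only said something about world size 4
print(option_fallback_half(user, 2))  # 0.5
print(option_merge(user, 2))          # 64
```

Only the merge variant keeps the documented value for world size 2 when the user overrides a different key.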

hmellor · 2025-09-02T12:33:10Z

I think option 2? So that the default would be:

fi_allreduce_fusion_max_size_mb: dict[int, float] = field(default_factory=lambda: {2: 64 * MiB, 4: MiB})

And we can document that any unspecified world sizes will default to 0.5MiB?

@ProExpertProg what do you think?
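As a rough illustration of this option, the sketch below puts the default dict on the field itself and applies the 0.5 MiB fallback at lookup time. The class name and the max_size method are hypothetical; the values are copied from the comment as written, so they are in bytes here.

```python
from dataclasses import dataclass, field

MiB = 1024 * 1024


@dataclass
class DefaultDictConfigSketch:
    # dataclasses calls default_factory with no arguments, once per instance,
    # so every config object gets its own dict.
    fi_allreduce_fusion_max_size_mb: dict[int, float] = field(
        default_factory=lambda: {2: 64 * MiB, 4: MiB})

    def max_size(self, world_size: int) -> float:
        # Unspecified world sizes fall back to 0.5 MiB.
        return self.fi_allreduce_fusion_max_size_mb.get(world_size, MiB // 2)


default_cfg = DefaultDictConfigSketch()
custom_cfg = DefaultDictConfigSketch({4: 4 * MiB})  # replaces the default dict
print(default_cfg.max_size(2), custom_cfg.max_size(2))
```

Note that passing any dict replaces the default wholesale, so the custom config above loses the 64 MiB entry for world size 2 and falls back to 0.5 MiB.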

ilmarkov · 2025-09-02T14:20:06Z

@hmellor

The defaults must be different depending on device.
I believe we can just ignore (don't enable fusion) all world size cases except 2,4,8,16 as flashinfer only supports these.
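A minimal sketch of that suggestion, assuming a hypothetical lookup helper: world sizes outside the supported set return None, which the caller would treat as "do not enable fusion". The set of supported sizes is the one stated in the comment above, not checked against flashinfer itself.

```python
from typing import Optional

# World sizes stated as supported in the comment above.
SUPPORTED_WORLD_SIZES = (2, 4, 8, 16)


def fusion_max_size_mb(world_size: int,
                       sizes_mb: dict[int, float]) -> Optional[float]:
    """Hypothetical lookup: None means fusion stays disabled."""
    if world_size not in SUPPORTED_WORLD_SIZES:
        return None
    # A supported size with no configured threshold also yields None.
    return sizes_mb.get(world_size)


print(fusion_max_size_mb(6, {2: 64, 4: 1, 6: 1, 8: 1}))  # None
```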

hmellor · 2025-09-02T14:29:59Z

Thanks both for the additional context.

My question now is, do we need to expose this to the user at all? If it:

- depends on world size and device type
- is only updating rather than replacing the default

What is the use case for setting different values?

ilmarkov · 2025-09-02T14:41:52Z

> What is the use case for setting different values?

I think this config was originally intended to be used for simplicity of testing and benchmarking

ProExpertProg · 2025-09-02T15:45:25Z

@hmellor I think merging dict is not that unintuitive because we already do it in other places. And like Ilia mentioned this is a developer-user switch and not as much of a user-user switch.

I think the default should be an empty dict, and the config should respect the user-passed values first, with a fallback to the defaults, which depend on the device. We can document this behavior well in the field docstring. The defaults are set into the config in __post_init__ (either global or CompilationConfig/PassConfig) for easier visibility, and we can make it so that these sizes are always logged if we want visibility into what the defaults are.

A user will also only run with a single value of TP for a single config struct so the behavior for unspecified world sizes does not matter as much.
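The proposal can be sketched as follows. Everything named here is hypothetical (the class, the device field, the per-device table); the numbers for device_a echo the defaults quoted earlier in the thread and those for device_b are placeholders, since the real per-device values are not given in this discussion.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)

# Placeholder per-device defaults in MB, keyed by world size.
_DEFAULTS_MB_BY_DEVICE = {
    "device_a": {2: 64, 4: 1, 8: 1},
    "device_b": {2: 32, 4: 2, 8: 1},
}


@dataclass
class MergedPassConfigSketch:
    device: str = "device_a"
    # Empty by default; the user supplies only the keys they want to override.
    fi_allreduce_fusion_max_size_mb: dict[int, float] = field(
        default_factory=dict)

    def __post_init__(self) -> None:
        defaults = _DEFAULTS_MB_BY_DEVICE.get(self.device, {})
        # User-passed values take precedence over the device defaults.
        self.fi_allreduce_fusion_max_size_mb = {
            **defaults, **self.fi_allreduce_fusion_max_size_mb}
        logger.info("fi_allreduce_fusion_max_size_mb resolved to %s",
                    self.fi_allreduce_fusion_max_size_mb)


cfg = MergedPassConfigSketch(fi_allreduce_fusion_max_size_mb={8: 8})
print(cfg)  # the printed config now shows the merged thresholds
```

Because the merge happens at construction time, printing or logging the config shows the effective values rather than only the user's overrides.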

hmellor · 2025-09-02T15:56:54Z

Ok, that works for me! I just wanted to explore the possibility of bringing the default higher up. Let's leave it as it is.

nvjullin · 2025-09-05T09:28:22Z

I may be misunderstanding something, but is the current state of the PR fine? Or do we want to change it to something else?

ProExpertProg · 2025-09-05T20:16:25Z

@nvjullin instead of merging a config dict with the default dict when the value is used, I would still merge that during config init so that the defaults appear in the config when it's printed.

mergify · 2025-09-08T08:13:55Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @nvjullin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

moved env var to config

7c14004

Signed-off-by: Julien Lin <jullin@nvidia.com>

nvjullin requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, simon-mo, tlrmchlsmth, yewentao256, youkaichao and zou3519 as code owners August 27, 2025 08:32

gemini-code-assist bot reviewed Aug 27, 2025

vllm/compilation/collective_fusion.py

nvjullin added 2 commits August 27, 2025 17:25

Merge branch 'main' into ar-config

5db1b04

cap at max_num_batched_tokens

52f7e0c

Signed-off-by: Julien Lin <jullin@nvidia.com>

ProExpertProg reviewed Aug 27, 2025

ProExpertProg added the torch.compile label Aug 28, 2025

github-project-automation bot added this to torch.compile integration Aug 28, 2025

github-project-automation bot moved this to To triage in torch.compile integration Aug 28, 2025

nvjullin added 2 commits September 8, 2025 08:10

moved defaults to post_init

f471391

Signed-off-by: Julien Lin <jullin@nvidia.com>

Merge branch 'main' into ar-config

9347466

mergify bot added the needs-rebase label Sep 8, 2025

Merge branch 'main' into ar-config

5579b6a

Signed-off-by: Julien Lin <jullin@nvidia.com>

mergify bot removed the needs-rebase label Sep 8, 2025

hmellor reviewed Sep 8, 2025

vllm/config/compilation.py

Update vllm/config/compilation.py

5bb70de

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: nvjullin <jullin@nvidia.com>

ProExpertProg moved this from To triage to In progress in torch.compile integration Sep 19, 2025

ProExpertProg moved this from In progress to In review in torch.compile integration Sep 29, 2025
