Skip to content

Conversation

@nvjullin
Copy link
Contributor

@nvjullin nvjullin commented Aug 27, 2025

Purpose

Follow up on #23639.
Also cleaned up two competing/conflicting ways of tuning thresholds: number of tokens vs size.
Size is the relevant parameter (for perf), so we should only use that.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Julien Lin <jullin@nvidia.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully refactors the configuration for allreduce fusion thresholds, moving them from an environment variable to a more structured configuration object. The cleanup of the logic for tuning thresholds is also a welcome improvement. I've found one potential performance issue in the calculation of max_token_num which appears to be overly conservative and could prevent the fused kernel from being used in some cases where it would be beneficial. Please see the detailed comment.

@ilmarkov
Copy link
Contributor

I am changing the constants and a bit of logic in the other PR. But keeping the max size and cleaning the other tuning ways make sense to me.
LGTM.

Copy link
Collaborator

@ProExpertProg ProExpertProg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we could restructure this such that the defaults are also reflected in config that would be nice. So maybe config asks the pass for. defaults but uses CLI values with precedence.

@hmellor
Copy link
Member

hmellor commented Aug 27, 2025

I agree it would be nice if the the defaults could be the default of the actual config field rather than living with the implementation

@nvjullin
Copy link
Contributor Author

nvjullin commented Aug 28, 2025

I agree it would be nice if the the defaults could be the default of the actual config field

If the default is {"2": 64, "4": 1, "6": 1, "8": 1}, then if the user wants to override 8 only, the user will have to pass {"2": 64, "4": 1, "6": 1, "8": 8}. This is quite bad UI.

I think if we could restructure this such that the defaults are also reflected in config that would be nice.

Right now, the comment explains the defaults, so it is indeed reflected in the config. The issue is that the comment has to be in sync with the implementation. It's not ideal, but otherwise we'll have to write a new dict-like class to handle the aforementioned UI problem which I think is overkill for a very niche config option.

Another option is to have a default of {"2": 64, "4": 1, "6": 1, "8": 1} in config and fall back to the one in flashinfer_max_size when the config is empty. This is essentially the same as the current situation where we have a comment explaining the default: we still have to keep them in sync.

@ilmarkov
Copy link
Contributor

If the default is {"2": 64, "4": 1, "6": 1, "8": 1}, then if the user wants to override 8 only, the user will have to pass {"2": 64, >"4": 1, "6": 1, "8": 8}

@nvjullin I'd suggest to update the default config with user-provided dictionary. I believe user usually needs to specify one key:value pair at the initialization to update the default config.

@hmellor
Copy link
Member

hmellor commented Sep 1, 2025

If the default is {"2": 64, "4": 1, "6": 1, "8": 1}, then if the user wants to override 8 only, the user will have to pass {"2": 64, "4": 1, "6": 1, "8": 8}. This is quite bad UI.

I don't agree. If I, as a user, pass a dict as an argument I would expect my dict to be used, not for it to be merged into another dict that I don't know about.

@nvjullin
Copy link
Contributor Author

nvjullin commented Sep 2, 2025

If I, as a user, pass a dict as an argument I would expect my dict to be used, not for it to be merged into another dict that I don't know about.

  1. you do know about it, it's documented
  2. they're defaults, as in, if you don't specify something, we fall back on the documented values

The alternative is to not have defaults and error out. Suppose the user passes

--tensor-parallel 2 --fi_allreduce_fusion_max_size_mb {"4": 4}

Do you want the behavior to be

  1. Error out: we can use a default dict argument and only that
  2. Fallback on 0.5: we can use a default dict argument and still have to document the 0.5 used in max_sizes.get(world_size, MiB // 2) that isn't visible in the arguments
  3. Fallback on 64: you can't just have a default dict argument. This would've been the default had I not passed --fi_allreduce_fusion_max_size_mb {"4": 4}, which is what I'd expect because I didn't say anything about world size 2.

@hmellor
Copy link
Member

hmellor commented Sep 2, 2025

I think option 2? So that the default would be:

fi_allreduce_fusion_max_size_mb: dict[int, float] = field(default_factory=lambda _: {2: 64 * MiB, 4: MiB})

And we can document that any unspecified world sizes will default to 0.5MiB?

@ProExpertProg what do you think?

@ilmarkov
Copy link
Contributor

ilmarkov commented Sep 2, 2025

@hmellor

  1. The defaults must be different depending on device.
  2. I believe we can just ignore (don't enable fusion) all world size cases except 2,4,8,16 as flashinfer only supports these.

@hmellor
Copy link
Member

hmellor commented Sep 2, 2025

Thanks both for the additional context.

My question now is, do we need to expose this to the user at all? If it:

  • depends on world size and device type
  • is only updating rather than replacing the default

What is the use case for setting different values?

@ilmarkov
Copy link
Contributor

ilmarkov commented Sep 2, 2025

What is the use case for setting different values?

I think this config was originally intended to be used for simplicity of testing and benchmarking

@ProExpertProg
Copy link
Collaborator

@hmellor I think merging dict is not that unintuitive because we already do it in other places. And like Ilia mentioned this is a developer-user switch and not as much of a user-user switch.

I think the default should be an empty dict and it respects the user-passed values first with a fallback to the defaults, which are dependent on the device. We can document this behavior well in the field docstring. The defaults are set into config in __post_init__ (either global or CompilationConfig/PassConfig for easier visibility and we can make it so that these sizes are always logged if we want the visibility of what the defaults are.

A user will also only run with a single value of TP for a single config struct so the behavior for unspecified world sizes does not matter as much.

@hmellor
Copy link
Member

hmellor commented Sep 2, 2025

Ok, that works for me! I just wanted to explore the possibility of bringing the default higher up. Let's leave it as it is.

@nvjullin
Copy link
Contributor Author

nvjullin commented Sep 5, 2025

I may be misunderstanding something, but is the current state of the PR fine? Or do we want to change it to something else?

@ProExpertProg
Copy link
Collaborator

@nvjullin instead of merging a config dict with the default dict when the value is used, I would still merge that during config init so that the defaults appear in the config when it's printed.

@mergify
Copy link

mergify bot commented Sep 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @nvjullin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 8, 2025
Signed-off-by: Julien Lin <jullin@nvidia.com>
@mergify mergify bot removed the needs-rebase label Sep 8, 2025
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: nvjullin <jullin@nvidia.com>
@ProExpertProg ProExpertProg moved this from To triage to In progress in torch.compile integration Sep 19, 2025
@ProExpertProg ProExpertProg moved this from In progress to In review in torch.compile integration Sep 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

4 participants