-
-
Notifications
You must be signed in to change notification settings - Fork 11k
[Misc] Moved override for allreduce fusion thresholds from env var to config #23722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Julien Lin <jullin@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully refactors the configuration for allreduce fusion thresholds, moving them from an environment variable to a more structured configuration object. The cleanup of the logic for tuning thresholds is also a welcome improvement. I've found one potential performance issue in the calculation of max_token_num which appears to be overly conservative and could prevent the fused kernel from being used in some cases where it would be beneficial. Please see the detailed comment.
Signed-off-by: Julien Lin <jullin@nvidia.com>
|
I am changing the constants and a bit of logic in the other PR. But keeping the max size and cleaning the other tuning ways make sense to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we could restructure this such that the defaults are also reflected in config that would be nice. So maybe config asks the pass for. defaults but uses CLI values with precedence.
|
I agree it would be nice if the the defaults could be the default of the actual config field rather than living with the implementation |
If the default is
Right now, the comment explains the defaults, so it is indeed reflected in the config. The issue is that the comment has to be in sync with the implementation. It's not ideal, but otherwise we'll have to write a new Another option is to have a default of |
@nvjullin I'd suggest to update the default config with user-provided dictionary. I believe user usually needs to specify one key:value pair at the initialization to update the default config. |
I don't agree. If I, as a user, pass a |
The alternative is to not have defaults and error out. Suppose the user passes Do you want the behavior to be
|
|
I think option 2? So that the default would be: fi_allreduce_fusion_max_size_mb: dict[int, float] = field(default_factory=lambda _: {2: 64 * MiB, 4: MiB})And we can document that any unspecified world sizes will default to 0.5MiB? @ProExpertProg what do you think? |
|
|
Thanks both for the additional context. My question now is, do we need to expose this to the user at all? If it:
What is the use case for setting different values? |
I think this config was originally intended to be used for simplicity of testing and benchmarking |
|
@hmellor I think merging dict is not that unintuitive because we already do it in other places. And like Ilia mentioned this is a developer-user switch and not as much of a user-user switch. I think the default should be an empty dict and it respects the user-passed values first with a fallback to the defaults, which are dependent on the device. We can document this behavior well in the field docstring. The defaults are set into config in A user will also only run with a single value of TP for a single config struct so the behavior for unspecified world sizes does not matter as much. |
|
Ok, that works for me! I just wanted to explore the possibility of bringing the default higher up. Let's leave it as it is. |
|
I may be misunderstanding something, but is the current state of the PR fine? Or do we want to change it to something else? |
|
@nvjullin instead of merging a config dict with the default dict when the value is used, I would still merge that during config init so that the defaults appear in the config when it's printed. |
Signed-off-by: Julien Lin <jullin@nvidia.com>
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Julien Lin <jullin@nvidia.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: nvjullin <jullin@nvidia.com>
Purpose
Follow up on #23639.
Also cleaned up two competing/conflicting ways of tuning thresholds: number of tokens vs size.
Size is the relevant parameter (for perf), so we should only use that.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.