
flops_profiler: add option recompute_fwd_factor for the case of activation checkpoint #3362

Merged: 3 commits merged into microsoft:master from the flops_profiler branch on Jun 2, 2023

Conversation

guoyejun (Contributor) commented on Apr 24, 2023:


When activation checkpointing is enabled, most of the forward pass is re-computed, so the FLOPS calculation should be updated.

In this PR, we simply assume the entire forward pass is re-computed, as an estimate.

I didn't find a way to pass the option from the model script to the deepspeed engine, so the option is added directly to flops_profiler.
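
For context, a minimal sketch of how the option would be set from a DeepSpeed config dict (shown with the option's final name, recompute_fwd_factor, adopted later in this thread; the exact schema may differ across DeepSpeed versions):

```python
# Minimal sketch of a DeepSpeed config enabling the flops profiler with the
# new option (final name from this PR; schema may vary across versions).
ds_config = {
    "flops_profiler": {
        "enabled": True,
        "profile_step": 1,
        # 1.0: assume the full forward pass is re-executed during backward
        # (full activation checkpointing); 0.0: no recompute.
        "recompute_fwd_factor": 1.0,
    },
}
```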

@guoyejun force-pushed the flops_profiler branch 2 times, most recently from be40d39 to 7eb9d92 on April 25, 2023 04:41
guoyejun (Contributor, Author) commented:

With this PR, if recompute_fwd is true, the output changes from

```
bwd FLOPS per GPU = 2 * fwd flops per GPU / bwd latency: ... TFLOPS
fwd+bwd FLOPS per GPU = 3 * fwd flops per GPU / (fwd+bwd latency): ... TFLOPS
FLOPS per GPU = 3 * fwd flops per GPU / iter latency: ... TFLOPS
```

to

```
bwd FLOPS per GPU = 3 * fwd flops per GPU / bwd latency: ... TFLOPS
fwd+bwd FLOPS per GPU = 4 * fwd flops per GPU / (fwd+bwd latency): ... TFLOPS
FLOPS per GPU = 4 * fwd flops per GPU / iter latency: ... TFLOPS
```

The reported FLOPS are more accurate with this PR.
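
The multipliers follow from the usual rule of thumb that the backward pass costs about 2x the forward pass in FLOPs (the 2x ratio the profiler's formulas above already rely on); with full recompute, the backward phase also re-executes the forward. A back-of-envelope sketch:

```python
# Back-of-envelope check of the multipliers above, in units of "fwd flops".
# Standard assumption: the backward pass costs ~2x the forward pass.
def multipliers(recompute_fwd: bool):
    recompute = 1.0 if recompute_fwd else 0.0
    bwd_phase = 2.0 + recompute        # bwd work (+ recomputed fwd, if any)
    return bwd_phase, 1.0 + bwd_phase  # (bwd, fwd+bwd) multipliers

print(multipliers(False))  # (2.0, 3.0) -> the "from" output above
print(multipliers(True))   # (3.0, 4.0) -> the "to" output above
```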

guoyejun (Contributor, Author) commented:

Just had another idea: shall we change "recompute_fwd" from a bool to a float in [0.0, 1.0] (and rename it to recompute_fwd_factor), since not all GEMMs are recomputed in cases such as sequence parallelism in Megatron-LM?

For the current implementation in Megatron-DeepSpeed, we can set recompute_fwd_factor to 1.0.

Comments are welcome, thanks.
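
A sketch of how the multipliers would generalize with a float factor (hypothetical helper for illustration, not the profiler's actual code):

```python
# Hypothetical generalization: only `recompute_fwd_factor` of the forward
# FLOPs are re-executed during backward (e.g. partial recompute under
# sequence parallelism), instead of an all-or-nothing bool.
def multipliers(recompute_fwd_factor: float = 0.0):
    assert 0.0 <= recompute_fwd_factor <= 1.0
    bwd_phase = 2.0 + recompute_fwd_factor  # in units of fwd flops
    return bwd_phase, 1.0 + bwd_phase       # (bwd, fwd+bwd) multipliers
```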

guoyejun (Contributor, Author) commented:

Any comments? Thanks.

guoyejun (Contributor, Author) commented on May 6, 2023:

@jeffra @tjruwase @cli99 any comments? Thanks.

guoyejun (Contributor, Author) commented:

> Just had another idea: shall we change "recompute_fwd" from a bool to a float in [0.0, 1.0] (and rename it to recompute_fwd_factor), since not all GEMMs are recomputed in cases such as sequence parallelism in Megatron-LM?
>
> For the current implementation in Megatron-DeepSpeed, we can set recompute_fwd_factor to 1.0.
>
> Comments are welcome, thanks.

Plan to change to this idea if there are no other comments, thanks.

Commit pushed to the PR branch: …ation recompute

When activation checkpointing is enabled, most of the forward pass is re-computed, so the FLOPS calculation should be updated with recompute_fwd_factor=1.0.

I didn't find a way to pass the option from the model script to the deepspeed engine, so the option is added directly to flops_profiler.
yejunguo commented:

> Just had another idea: shall we change "recompute_fwd" from a bool to a float in [0.0, 1.0] (and rename it to recompute_fwd_factor), since not all GEMMs are recomputed in cases such as sequence parallelism in Megatron-LM?
>
> For the current implementation in Megatron-DeepSpeed, we can set recompute_fwd_factor to 1.0.
>
> Comments are welcome, thanks.
>
> Plan to change to this idea if there are no other comments, thanks.

recompute_fwd_factor is used in the latest commit; could you please take a review? Thanks. @jeffra @tjruwase @cli99

@guoyejun changed the title from "flops_profiler: add option recompute_fwd for the case of activation c…" to "flops_profiler: add option recompute_fwd_factor for the case of activation c…" on May 19, 2023
cli99 (Contributor) commented on May 22, 2023:

@guoyejun thanks, recompute_fwd_factor is good to have.

@tjruwase merged commit 460bec4 into microsoft:master on Jun 2, 2023
molly-smith pushed a commit that referenced this pull request on Jun 23, 2023: …ation recompute (#3362)

Co-authored-by: Cheng Li <pistasable@gmail.com>