-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flops_profiler: add option recompute_fwd_factor for the case of activation c… #3362
Conversation
be40d39
to
7eb9d92
Compare
with this PR, if the recompute_fwd is true, the output changes from bwd FLOPS per GPU = 2 * fwd flops per GPU / bwd latency: ... TFLOPS to bwd FLOPS per GPU = 3 * fwd flops per GPU / bwd latency: ... TFLOPS It is more accurate with this PR. |
just have another idea, shall we change "recompute_fwd" from bool to float among [0.0, 1.0] (and also rename it as recompute_fwd_factor), since not all gemms are recomputed in the cases such as seq parallel in megatron-lm. for the current implementation in megatron-deepspeed, we can set recompute_fwd_factor as 1.0. comment are welcome, thanks. |
any comment? thanks. |
plan to change to this idea if no other comment, thanks. |
…ation recompute When activation checkpointing is enabled, most of forward is re-computed, and so the FLOPS calculation should be updated with recompute_fwd_factor=1.0 I don't find a way to pass the option from model script to deepspeed engine, and so add option directly for flops_profiler.
recompute_fwd_factor is used in the latest commit, could you please take a review, thanks. @jeffra @tjruwase @cli99 |
@guoyejun thanks, recompute_fwd_factor is good to have. |
…ation recompute (#3362) When activation checkpointing is enabled, most of forward is re-computed, and so the FLOPS calculation should be updated with recompute_fwd_factor=1.0 I don't find a way to pass the option from model script to deepspeed engine, and so add option directly for flops_profiler. Co-authored-by: Cheng Li <pistasable@gmail.com>
…heckpoint
When activation checkpointing is enabled, most of forward is re-computed, and so the FLOPS calculation should be updated.
In this PR, we just suppose all fwd is re-computed as an estimation.
I don't find a way to pass the option from model script to deepspeed engine, and so add option directly for flops_profiler.