This repository was archived by the owner on Aug 7, 2024. It is now read-only.
Summary:
~~Add an option to combine the amax sync reduction~~ (Use combined reduction as the default behavior)
- Combine the reduction calls for each type of amax scaling factor, giving 3 all_reduce calls in total (one per type). These could be combined further into a single call.
- Verified that the other tests still pass, so the existing benchmark code does not need to change:
  - pytest test/test_base.py
  - ./test/test_fsdp.sh
- Tested the combined reduction on small Llama models with 8 FSDP groups. The time taken by sync_float8_amax_and_scale_history dropped from 29 ms [1] to 3 ms [2].
[1] Trace without combined reduction: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/trace.138932292910521.json.gz&bucket=acadia
[2] Trace with combined reduction: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/trace.202842416426594.json.gz&bucket=acadia
\* Trace[2] was updated after addressing the comments.
\*\* Need Meta internal access to open these traces.
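
The idea behind the combined reduction can be sketched with a small simulation. This is not the code from this change: `FakeProcessGroup`, `sync_per_layer`, and `sync_combined` are hypothetical names, plain Python lists stand in for per-rank tensors, and `all_reduce_max` stands in for a MAX all_reduce collective. It models a single amax type and assumes the savings come from issuing one collective for all layers instead of one per layer.

```python
# Simulation only: lists stand in for per-rank tensors, and
# FakeProcessGroup.all_reduce_max stands in for a MAX all_reduce.


class FakeProcessGroup:
    """Hypothetical stand-in for a process group that counts collectives."""

    def __init__(self, world_size):
        self.world_size = world_size
        self.num_collectives = 0

    def all_reduce_max(self, per_rank_buffers):
        """Element-wise max over ranks; every rank receives the result."""
        assert len(per_rank_buffers) == self.world_size
        self.num_collectives += 1
        reduced = [max(values) for values in zip(*per_rank_buffers)]
        return [list(reduced) for _ in range(self.world_size)]


def sync_per_layer(group, amaxes):
    """amaxes[rank][layer]: issue one collective per layer."""
    num_layers = len(amaxes[0])
    out = [[0.0] * num_layers for _ in amaxes]
    for layer in range(num_layers):
        # Each rank contributes a 1-element buffer for this layer.
        per_rank = [[rank_values[layer]] for rank_values in amaxes]
        reduced = group.all_reduce_max(per_rank)
        for rank, buf in enumerate(reduced):
            out[rank][layer] = buf[0]
    return out


def sync_combined(group, amaxes):
    """Pack every layer's amax into one buffer per rank: one collective."""
    packed = [list(rank_values) for rank_values in amaxes]
    return group.all_reduce_max(packed)


# Two ranks, three layers of one amax type (values are made up).
local_amaxes = [[1.0, 5.0, 2.0], [3.0, 4.0, 0.5]]

group_a = FakeProcessGroup(world_size=2)
result_a = sync_per_layer(group_a, local_amaxes)

group_b = FakeProcessGroup(world_size=2)
result_b = sync_combined(group_b, local_amaxes)
```

Both paths produce the same synchronized amax values on every rank; the difference is the number of collectives, which grows with the layer count in the per-layer version and stays at one in the combined version.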
Pull Request resolved: #163
Reviewed By: drisspg
Differential Revision: D52271595
Pulled By: y-sq
fbshipit-source-id: 65d27d32cb4d291dc6fbe62b7a916cf2e32e6482