Is your feature request related to a problem? Please describe.
DeepSpeedZeroOptimizer uses sp_process_group to partition parameter gradients. Is it possible to use the tp_parallel group instead? Otherwise, all parameter gradients are stored on every tp GPU.
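To make the memory concern concrete, here is a minimal sketch in plain Python of how per-GPU gradient memory depends on the size of the group the gradients are partitioned across. It does not call DeepSpeed; `partition_sizes` and `grad_bytes_per_rank` are hypothetical helpers, and the element count and 2-byte (fp16) gradient size are assumptions chosen only for illustration.

```python
def partition_sizes(num_elements: int, group_size: int) -> list[int]:
    """Split num_elements gradient elements as evenly as possible across a group.

    Hypothetical helper for illustration; not part of DeepSpeed.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    base, remainder = divmod(num_elements, group_size)
    # The first `remainder` ranks each take one extra element.
    return [base + (1 if rank < remainder else 0) for rank in range(group_size)]


def grad_bytes_per_rank(num_elements: int, group_size: int,
                        bytes_per_element: int = 2) -> int:
    """Worst-case gradient bytes held by any single rank in the group."""
    return max(partition_sizes(num_elements, group_size)) * bytes_per_element


if __name__ == "__main__":
    num_grad_elements = 1_000_000  # assumed number of gradient elements
    for group_size in (1, 4, 8):   # group_size=1 means no partitioning
        print(group_size, grad_bytes_per_rank(num_grad_elements, group_size))
```

With a group of size 1 every GPU keeps the full gradient buffer, while partitioning across a group of 4 or 8 ranks cuts the per-GPU share to roughly a quarter or an eighth. That is the saving this request is asking about for the tp group.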