partition_activations (bool) – Enables partition activation when used with ZeRO stage 3 and model parallelism. Still requires you to wrap your forward functions in deepspeed.checkpointing.checkpoint. See deepspeed tutorial.
📚 Documentation
Hello,
After running into issues with activation partitioning, I looked into it and found that DeepSpeed's activation partitioning has little to do with ZeRO stage 3; what actually matters is how model parallelism and the mpu object are set up.
Also, DeepSpeed explicitly states that pipeline parallelism, a model parallelism method it provides, cannot be combined with ZeRO stage 2 or stage 3 in the first place.
Additionally, the GitHub issue referenced in the official documentation uses ZeRO stage 3 and activation partitioning together, but nothing there depends on that pairing.
Therefore, I think the documentation should state the conditions for using activation partitioning more clearly, rather than simply saying it should be used with ZeRO stage 3 + model parallelism.
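To make the point about the mpu object concrete, here is a toy sketch in plain Python. It does not call DeepSpeed, and `ToyMPU` and `local_partition` are hypothetical names invented for illustration. It assumes that partitioning splits each saved activation evenly across the model-parallel world size reported by the mpu, and that the size falls back to 1 when no mpu is supplied; I have not confirmed that this matches DeepSpeed's behavior in every version.

```python
from dataclasses import dataclass


@dataclass
class ToyMPU:
    """Hypothetical stand-in for an mpu object (no real process groups)."""
    model_parallel_world_size: int
    model_parallel_rank: int = 0

    def get_model_parallel_world_size(self) -> int:
        return self.model_parallel_world_size

    def get_model_parallel_rank(self) -> int:
        return self.model_parallel_rank


def local_partition(activation, mpu=None):
    """Return the slice of a flat activation that this rank would keep.

    Assumption: without an mpu, the model-parallel size is treated as 1,
    so the "partition" is simply the whole activation.
    """
    mp_size = mpu.get_model_parallel_world_size() if mpu is not None else 1
    mp_rank = mpu.get_model_parallel_rank() if mpu is not None else 0
    part = len(activation) // mp_size
    start = mp_rank * part
    return activation[start:start + part]


activation = list(range(8))

# No mpu: nothing is actually partitioned, whatever the ZeRO stage is.
no_mpu = local_partition(activation)

# mpu reporting 4 model-parallel ranks: rank 1 keeps a quarter of the values.
with_mpu = local_partition(activation, ToyMPU(model_parallel_world_size=4, model_parallel_rank=1))
```

Under these assumptions the ZeRO stage never enters the calculation; only the model-parallel size reported by the mpu does, which is why I think the docs should mention the mpu requirement explicitly.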
cc @Borda @awaelchli