
Multiple zero stage 3 related fixes #3886

Merged
merged 28 commits
Jul 28, 2023
Changes from 1 commit
Commits
28 commits
484922a
Option to override module apply
tjruwase Jun 30, 2023
262b57d
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 5, 2023
0920c0c
Removing early partitioning in override
tjruwase Jul 5, 2023
854e76e
Merge branch 'olruwase/override_module_apply' of github.com:microsoft…
tjruwase Jul 5, 2023
2a04012
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 10, 2023
5f21888
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 13, 2023
c4fdf77
Unit tests
tjruwase Jul 13, 2023
cfe0f57
Merge branch 'olruwase/override_module_apply' of github.com:microsoft…
tjruwase Jul 13, 2023
8de71db
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 18, 2023
7c088b3
Cleanup
tjruwase Jul 19, 2023
297485f
Merge branch 'olruwase/override_module_apply' of github.com:microsoft…
tjruwase Jul 19, 2023
4fd1f3c
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 19, 2023
972c958
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 24, 2023
0f5a508
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 25, 2023
2343cfe
Adapt unit test to succeed
tjruwase Jul 25, 2023
7a0c339
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 25, 2023
fcb3bad
Handle missed params
tjruwase Jul 25, 2023
dd6f334
Merge branch 'olruwase/override_module_apply' of github.com:microsoft…
tjruwase Jul 25, 2023
afa4a03
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 25, 2023
e66a5c3
Add accelerate
tjruwase Jul 25, 2023
c6cc44d
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 25, 2023
3d93b25
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 25, 2023
8cb2418
Code cleanup
tjruwase Jul 28, 2023
dc8e81c
Add doc
tjruwase Jul 28, 2023
92d557f
Merge branch 'master' into olruwase/override_module_apply
tjruwase Jul 28, 2023
831dfb2
Add doc
tjruwase Jul 28, 2023
1fd80f0
Add doc
tjruwase Jul 28, 2023
4f19b59
Merge branch 'olruwase/override_module_apply' of github.com:microsoft…
tjruwase Jul 28, 2023
Add doc
tjruwase committed Jul 28, 2023
commit 831dfb29432d0f658a11efb9e748f790a78c0be6
23 changes: 12 additions & 11 deletions docs/code-docs/source/zero3.rst
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ for a complete list of options for configuration and performance tuning.
ZeRO-Infinity and ZeRO-Offload work best with our heavily optimized
:class:`deepspeed.ops.adam.DeepSpeedCPUAdam` optimizer. We recommend using
our `optimizer config <https://www.deepspeed.ai/docs/config-json/#optimizer-parameters>`_
to instruct :meth:`deepspeed.initialize` to build the optimizer for you.
to instruct :meth:`deepspeed.initialize` to build the optimizer for you.
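The recommendation above can be sketched as a DeepSpeed config fragment. This is a minimal illustration, not taken from this PR: the batch size and hyperparameter values are placeholders, and the ``offload_optimizer`` section is one way to trigger the CPU optimizer path.

```python
# Sketch of an optimizer config (values illustrative): with CPU
# offload enabled, deepspeed.initialize builds the optimizer
# (DeepSpeedCPUAdam) from this section instead of requiring a
# client-constructed torch optimizer.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4, "weight_decay": 0.01},
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
    },
}
```

The engine is then created with ``deepspeed.initialize(model=model, config=ds_config)``, with no ``optimizer`` argument passed by the client.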

ZeRO Configurations
===================
Expand Down Expand Up @@ -309,6 +309,17 @@ DeepSpeed can automatically detect the following external parameter scenarios:
.. autofunction:: deepspeed.zero.unregister_external_parameter


Overriding Module.apply
===============================
`Module.apply <https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.apply>`_ is a convenient mechanism for customizing model initialization.
With ZeRO stage 3, ``Module.apply`` implementations must account for parameter partitioning by ``zero.Init`` during model initialization. By default, ZeRO stage 3 handles
this automatically by overriding ``Module.apply`` to ensure that parameters are gathered before ``Module.apply`` accesses them. The benefit of this approach is development convenience, since
it spares users the burden of manual parameter coordination in ``Module.apply``. However, the downside is slow model initialization, since all the model parameters (e.g., billions) are gathered
even though the common usage of ``Module.apply`` is to customize only a few parameters. Developers can disable this default behavior by setting the ``override_module_apply`` configuration knob to ``False``,
for faster model initialization at the cost of manually handling partitioned parameters in their ``Module.apply`` implementations.
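The trade-off above can be sketched as follows. This is illustrative, not verbatim from the docs: it assumes ``override_module_apply`` sits under ``zero_optimization`` as elsewhere in this PR, and ``init_weights`` is a hypothetical callback showing one way to gather partitioned parameters manually via ``deepspeed.zero.GatheredParameters``.

```python
# Sketch: turn off the default Module.apply override for faster
# initialization, and coordinate partitioned parameters manually.
zero3_config = {
    "zero_optimization": {
        "stage": 3,
        "override_module_apply": False,
    },
}

def init_weights(module):
    # Hypothetical Module.apply callback. With the override disabled,
    # partitioned parameters must be gathered before their data is
    # touched; imports are deferred so the sketch stays self-contained.
    import torch.nn as nn
    import deepspeed

    if isinstance(module, nn.Linear):
        # Gather weight shards on all ranks, modify on rank 0,
        # re-partition when the context exits.
        with deepspeed.zero.GatheredParameters(module.weight, modifier_rank=0):
            nn.init.xavier_uniform_(module.weight)

# Usage (after constructing the model under zero.Init):
#   model.apply(init_weights)
```

Note that only the parameters the callback actually touches are gathered here, which is the source of the speedup over the default behavior of gathering everything.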


Memory-Centric Tiling
---------------------

Expand Down Expand Up @@ -389,13 +400,3 @@ The following code snippet illustrates this functionality.

# Free GPU memory consumed by model parameters
ds_engine.empty_partition_cache()


Overriding Module.apply
-----------------------
`Module.apply <https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=module+apply#torch.nn.Module.apply>`_ is a convenient mechanism for customizing model initialization.
With ZeRO stage 3, ``Module.apply`` implementations must account for parameter partitioning by ``zero.Init`` during model initialization. The default behavior of ZeRO stage 3 is to automatically
handle this issue by overriding ``Module.apply`` to ensure that parameters are gathered before access by ``Module.apply``. The benefit of this approach is development convenience, since
users are saved the burden of manual parameter coordination in ``Module.apply``. However, the downside is slow model initialization, since all the model parameters (i.e., billions) are gathered
even though the common usage of ``Module.apply`` is to customize only a few parameters. Developers can disable this default behavior by setting the ``override_module_apply`` configuration knob to ``False``,
for faster model initialization at the cost of manually handling partitioned parameters in their ``Module.apply`` implementations.