
Allow modification of zero partitioned parameters #4192

Merged
merged 10 commits into master from olruwase/ds_3830 on Sep 2, 2023

Conversation

tjruwase
Contributor

Utilities for flexible modification of partitioned fp32 parameters and optimizer states.

Fix #3830
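
To show how these utilities fit together, here is a minimal sketch. It assumes the new setters are exported from deepspeed.utils alongside the existing getters and take the parameter, the new full tensor, and (for optimizer state) the state key; model_engine, the clamping rule, and the Adam "exp_avg" key are illustrative, not part of this PR.

# Minimal sketch (assumptions: import path and argument order mirror the existing
# getters; the clamping rule and the "exp_avg" key are only for illustration).
import torch
from deepspeed.utils import (
    safe_get_full_fp32_param,
    safe_set_full_fp32_param,
    safe_get_full_optimizer_state,
    safe_set_full_optimizer_state,
)

def clamp_weights_and_reset_momentum(model_engine, limit=1.0):
    # model_engine is assumed to be the engine returned by deepspeed.initialize
    for _, param in model_engine.module.named_parameters():
        # Gather the full fp32 copy of the ZeRO-partitioned parameter, modify it,
        # and write the result back to the owning partitions.
        full_w = safe_get_full_fp32_param(param)
        if full_w is not None:
            safe_set_full_fp32_param(param, full_w.clamp(-limit, limit))

        # Same read-modify-write pattern for a per-parameter optimizer state tensor.
        exp_avg = safe_get_full_optimizer_state(param, "exp_avg")
        if exp_avg is not None:
            safe_set_full_optimizer_state(param, torch.zeros_like(exp_avg), "exp_avg")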

Contributor

@tohtana left a comment


This is a great feature! The code modifications and the document are also clear. I do have one observation, though it's not immediately pressing:

Currently we have three get_* functions (safe_get_full_fp32_param, safe_get_full_grad, and safe_get_full_optimizer_state). This PR introduces safe_set_full_fp32_param and safe_set_full_optimizer_state. Is there a specific reason we're omitting safe_set_full_grad?
Maintaining consistency in the APIs can help users understand the design better.
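
To make the asymmetry concrete, a short sketch; the import path is assumed to match the existing getters, and safe_set_full_grad is a hypothetical name that this PR does not add:

from deepspeed.utils import safe_get_full_grad

def read_but_not_write_grad(param):
    grad = safe_get_full_grad(param)   # supported: gather the parameter's full gradient
    # safe_set_full_grad(param, grad)  # hypothetical: no write-back counterpart exists
    return grad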

deepspeed/runtime/zero/stage3.py (outdated review thread, resolved)
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
@tjruwase
Contributor Author

@tohtana, thanks for this valid question. I am delaying support for safe_set_full_grad until there is an explicit request for it, because it is harder to implement and I have limited bandwidth to think through all the design issues :(. I will add a TODO for this. Thanks for the review.

@tjruwase added this pull request to the merge queue Aug 31, 2023
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 31, 2023
@mrwyattii added this pull request to the merge queue Sep 1, 2023
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 1, 2023
@mrwyattii added this pull request to the merge queue Sep 1, 2023
Merged via the queue into master with commit a23cda6 on Sep 2, 2023
@mayank31398
Contributor

mayank31398 commented Sep 3, 2023

@tjruwase
Can we get a safe_set_gradients method?
This is required for using Megatron's sequence parallelism instead of Ulysses.
I wanted to check whether this is possible somehow; it would be super useful.

@mayank31398
Contributor

Essentially, what's needed is:

grad = get_grads(layernorm.weight)       # read the layernorm weight's full gradient
dist.all_reduce(grad, group=tp_group)    # all-reduce it across the tensor-parallel group
safe_set_grads(grad, layernorm.weight)   # write the reduced gradient back (this setter does not exist yet)

If there is an alternative way to do it, that would also be helpful.
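
One possible alternative, sketched below, is to all-reduce the layernorm gradient across the tensor-parallel group as soon as autograd produces it, via a plain PyTorch tensor hook, so that the reduced value is what ZeRO later reduces and partitions. This is not an API from this PR; whether such a hook fires early enough under every ZeRO stage (stage 3 in particular) is an assumption that would need to be verified, and tp_group is a placeholder for the user's tensor-parallel process group.

import torch.distributed as dist

def register_tp_grad_allreduce(param, tp_group):
    # Hypothetical workaround: sum this parameter's gradient across the TP group
    # during the backward pass, before DeepSpeed's own gradient reduction sees it
    # (assumed ordering, not verified for ZeRO-3).
    def _allreduce(grad):
        dist.all_reduce(grad, group=tp_group)  # in-place sum across tensor-parallel ranks
        return grad
    return param.register_hook(_allreduce)

# usage (names illustrative): register_tp_grad_allreduce(layernorm.weight, tp_group)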

@loadams deleted the olruwase/ds_3830 branch February 28, 2024 18:14
Successfully merging this pull request may close these issues.

How to modify weights during training in a deepspeed stage 3 model