Unable to change checkpoint in on_save_checkpoint with DeepSpeed #18747
Comments
I have the same issue!
Unfortunately, deepspeed 0.10 is not supported yet. Our testing is currently pinned to https://github.com/Lightning-AI/lightning/blob/master/requirements/pytorch/strategies.txt#L6
I set up the environment with the compatible version:
And modified the code accordingly:
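A minimal sketch of what such an on_save_checkpoint change could look like (not the original snippet; the key name "extra_info" and the dropped parameter name are placeholders chosen for illustration):

```python
import lightning.pytorch as pl


class MyModel(pl.LightningModule):
    def on_save_checkpoint(self, checkpoint):
        # Add a custom key; with the ddp strategy this key ends up in the
        # saved checkpoint, but with deepspeed it is reportedly missing.
        checkpoint["extra_info"] = {"note": "added in on_save_checkpoint"}
        # Remove a key to shrink the checkpoint, e.g. drop a frozen weight
        # ("frozen_layer.weight" is a placeholder parameter name).
        checkpoint["state_dict"].pop("frozen_layer.weight", None)
```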
The output of the script:
The change to the parameter is not saved by DeepSpeed.
Is there any solution? Saving a frozen LLM in the checkpoint is too large and slow.
Any improvement?
Bug description
When using DeepSpeed, changes to the checkpoint (adding/removing keys) in on_save_checkpoint are not preserved. Switching the strategy to ddp, the changes are saved as expected.
What version are you seeing the problem on?
v2.0
How to reproduce the bug
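A minimal reproduction sketch, since no runnable snippet is included above (the model, the "extra_info" key, and the checkpoint path are illustrative assumptions; a GPU with deepspeed installed is required):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import lightning.pytorch as pl


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch[0]).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def on_save_checkpoint(self, checkpoint):
        # This key is expected to appear in the saved checkpoint; per the
        # report it survives with strategy="ddp" but not with "deepspeed".
        checkpoint["extra_info"] = "added in on_save_checkpoint"


def run(strategy):
    data = DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=1,
        strategy=strategy,
        max_epochs=1,
        logger=False,
        enable_checkpointing=False,
    )
    trainer.fit(BoringModel(), data)
    trainer.save_checkpoint(f"ckpt_{strategy}")


if __name__ == "__main__":
    run("deepspeed")  # compare against run("ddp")
```

With ddp the saved file can be inspected with torch.load and contains the custom key; with deepspeed the checkpoint is written as a directory of shards, and per the report the key added in on_save_checkpoint does not survive.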
Error messages and logs
Environment
Current environment
More info
No response
cc @awaelchli