[zero3] release tmp memory when consolidating fp16 weights #1220

Merged: 5 commits, Jul 12, 2021

stas00 commented Jul 12, 2021

Currently, GatheredParameters with modifier_rank=None doesn't release the gathered memory until the param is next used, which may be much later. As a result, _zero3_consolidated_fp16_state_dict was leaking memory. This PR switches to the read/write mode, which re-partitions the params right away and releases the temporarily gathered memory.
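To make the difference concrete, here is a rough pure-Python toy model of the behaviour described above. It is not DeepSpeed code: `ToyParam`, `toy_gathered`, and `consolidate` are hypothetical names invented for this sketch, and treating any non-None `modifier_rank` as "read/write mode" is an assumption of the model. It only shows why releasing on context exit keeps gathered memory from piling up while a state dict is being consolidated layer by layer.

```python
from contextlib import contextmanager


class ToyParam:
    """Hypothetical stand-in for a partitioned parameter (not a DeepSpeed class)."""

    def __init__(self, values, world_size=2, rank=0):
        n = len(values) // world_size
        # Pretend each rank owns one equal slice of the weight.
        self._all_shards = [values[i * n:(i + 1) * n] for i in range(world_size)]
        self.shard = self._all_shards[rank]
        self.full = None  # only populated while the param is gathered

    def gather(self):
        self.full = [v for shard in self._all_shards for v in shard]

    def partition(self):
        self.full = None  # drop the temporary full copy


@contextmanager
def toy_gathered(params, modifier_rank=None):
    """Toy model of the two modes described in this PR (assumed semantics)."""
    for p in params:
        p.gather()
    try:
        yield
    finally:
        if modifier_rank is not None:
            # Read/write mode in this model: re-partition right away on exit.
            for p in params:
                p.partition()
        # Read-only mode in this model: the full copy stays until the next use.


def consolidate(layers, modifier_rank):
    """Copy every full weight into a state dict; track what is left gathered."""
    state, max_leftover = {}, 0
    for name, p in layers.items():
        with toy_gathered([p], modifier_rank=modifier_rank):
            state[name] = list(p.full)
        # Count elements still held in gathered form after leaving the context.
        leftover = sum(len(q.full) for q in layers.values() if q.full is not None)
        max_leftover = max(max_leftover, leftover)
    return state, max_leftover


def make_layers():
    return {f"layer{i}.weight": ToyParam(list(range(4))) for i in range(3)}


state_ro, leftover_ro = consolidate(make_layers(), modifier_rank=None)
state_rw, leftover_rw = consolidate(make_layers(), modifier_rank=0)
```

In this model both modes produce the same consolidated weights, but the read-only path ends with every layer still gathered, while the read/write path leaves nothing behind after each layer.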

@tjruwase

@jeffra jeffra merged commit 2660cc4 into microsoft:master Jul 12, 2021
@stas00 stas00 deleted the less-mem-fp16-save branch July 12, 2021 21:03
stas00 added a commit to stas00/DeepSpeed that referenced this pull request Jul 12, 2021
microsoft#1220 fixed the leak, but led to another problem. Reverting that part so that we can do the release; will work on it after the release.

@jeffra
@stas00 stas00 mentioned this pull request Jul 12, 2021
jeffra pushed a commit that referenced this pull request Jul 12, 2021
#1220 fixed the leak, but led to another problem. Reverting that part so that we can do the release; will work on it after the release.

@jeffra