Fix eager memory leak and re-enable new checkpoint #4008

Merged: 2 commits merged on Dec 14, 2020


daquexian (Contributor)

IsStreamInParallelDesc did not special-case ControlStreamType, so some DeleteObject instructions were incorrectly ignored. As a result, object reference counts were wrong, which caused a memory leak.
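The failure mode described above can be illustrated with a toy model. Everything in it is hypothetical: the StreamType, Instruction and VirtualMachine classes, the handle_control_stream flag, and the assumption that control-stream instructions carry no device placement are stand-ins chosen for illustration, not the project's actual code. The sketch only shows why a placement check that does not special-case the control stream silently drops DeleteObject, leaving a reference count that never reaches zero.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class StreamType(Enum):
    COMPUTE = auto()
    CONTROL = auto()


@dataclass
class Instruction:
    name: str                 # "NewObject" or "DeleteObject" in this toy model
    stream_type: StreamType
    device_ids: frozenset     # devices the instruction is placed on
    object_id: int


@dataclass
class VirtualMachine:
    device_id: int
    refcounts: dict = field(default_factory=dict)  # object_id -> live references

    def is_stream_in_parallel_desc(self, instr, handle_control_stream):
        # Assumption for illustration: control-stream instructions have no
        # device placement, so they must be accepted on every device.
        if handle_control_stream and instr.stream_type is StreamType.CONTROL:
            return True
        return self.device_id in instr.device_ids

    def run(self, instrs, handle_control_stream=True):
        for instr in instrs:
            if not self.is_stream_in_parallel_desc(instr, handle_control_stream):
                continue  # silently skipped: the leak path when this is DeleteObject
            if instr.name == "NewObject":
                self.refcounts[instr.object_id] = self.refcounts.get(instr.object_id, 0) + 1
            elif instr.name == "DeleteObject":
                self.refcounts[instr.object_id] -= 1
                if self.refcounts[instr.object_id] == 0:
                    del self.refcounts[instr.object_id]  # object can be freed


# An object created on device 0 and released via the control stream.
instrs = [
    Instruction("NewObject", StreamType.COMPUTE, frozenset({0}), object_id=1),
    Instruction("DeleteObject", StreamType.CONTROL, frozenset(), object_id=1),
]
```

Running instrs on VirtualMachine(0) with handle_control_stream=False leaves object 1 with one live reference (the leak); with the default handle_control_stream=True, refcounts ends up empty.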

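After the fix, I tested with the code that previously ran out of memory (OOM) when save was called repeatedly, and the memory leak no longer occurs. The GPU memory used by checkpointing on each card is now only 32 MB, which is the size of one slice during streaming save/load/init.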
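The 32 MB figure follows from streaming: the data is moved one fixed-size slice at a time, so only one slice is staged at once regardless of the tensor size. The following is a minimal sketch under stated assumptions. It uses a hypothetical read_slice(offset, size) callback that copies one byte range out of device memory, and stream_save and SLICE_BYTES are illustrative names, not the project's API.

```python
import io

SLICE_BYTES = 32 * 1024 * 1024  # 32 MB, the per-card figure quoted above


def stream_save(read_slice, total_bytes, out, slice_bytes=SLICE_BYTES):
    """Copy total_bytes to `out` one slice at a time.

    read_slice(offset, size) is a hypothetical callback returning `size`
    bytes starting at `offset`. Returns the largest slice materialized,
    i.e. the peak staging memory used by this loop.
    """
    peak = 0
    offset = 0
    while offset < total_bytes:
        size = min(slice_bytes, total_bytes - offset)
        chunk = read_slice(offset, size)  # only this slice is held in memory
        peak = max(peak, len(chunk))
        out.write(chunk)
        offset += size
    return peak
```

For example, saving a 10,240-byte buffer into an io.BytesIO with slice_bytes=4096 writes three slices (4096, 4096 and 2048 bytes) and reports a peak of 4096 bytes, no matter how large the full buffer is.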

Signed-off-by: daquexian <daquexian566@gmail.com>
lixinqi merged commit 76e1663 into master on Dec 14, 2020
lixinqi deleted the fix_eager_memory_leak branch on December 14, 2020 at 11:14
liujuncheng pushed a commit that referenced this pull request on Jun 3, 2021
* fix incorrectly ignored DeleteObject instruction

Signed-off-by: daquexian <daquexian566@gmail.com>

* re-enable new checkpoint

Signed-off-by: daquexian <daquexian566@gmail.com>
Former-commit-id: 76e1663