Tensors are leaked when `model.save()` includes the optimizer #8238
Hi @Vectorrent, thank you for bringing this issue to our attention. I tried to replicate the same behaviour on my macOS machine and got the output below with
Please let me know if I have missed anything here. Thank you for your cooperation and patience.
Thanks for the quick response. Sadly, … Until then, my workaround is to 1) write a manual training loop, 2) save the model, 3) unload the model, 4) reload the model, and 5) resume training. Not a great solution, if you ask me 🤣
I cannot for the life of me figure out how to build TFJS locally on my computer, so I'm not really able to debug or test this properly. Regardless, I've been digging, and this is probably where we need to apply a fix: If I had to guess, maybe it's related to the use of
I wrapped the model save in `tf.engine().startScope()` and `tf.engine().endScope()` to prevent the tensor from leaking.
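The scope workaround described in the comment above could look roughly like this. It is a sketch, assuming `tf` is the imported tfjs module (for example `require("@tensorflow/tfjs-node")`) and `model` is a compiled `tf.LayersModel`; `saveWithScope` is a hypothetical helper name, not part of the tfjs API. `endScope()` disposes the tensors created since the matching `startScope()` that were not explicitly kept, which is what cleans up the leaked tensor.

```javascript
// Hypothetical helper (not a tfjs API): saves a model inside an engine scope
// so tensors allocated during the save are disposed afterwards.
// `tf` is passed in so the helper does not hard-code a backend package.
async function saveWithScope(tf, model, handlerOrURL) {
  const engine = tf.engine();
  engine.startScope();
  try {
    // includeOptimizer: true is the configuration reported to leak a tensor.
    return await model.save(handlerOrURL, { includeOptimizer: true });
  } finally {
    // Runs even if save() rejects, so the scope is always closed.
    engine.endScope();
  }
}
```

The `try/finally` keeps scopes balanced when a save fails. One caveat: the engine's active scope is global rather than per call, so tensors that other async code creates while the save is awaiting would also be disposed by `endScope()`. This approach is safest when nothing else is allocating tensors during the save.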
System information
Describe the current behavior
When using `tfjs-node-gpu` for training, I periodically save models to disk. However, my training has been crashing, and I've just learned why: when `model.save()` includes the optimizer, a single tensor is leaked. These unnecessary tensors slowly accumulate and crash my computer after some time. Comparing the tensor count before and after saving a model shows the extra tensor.
Describe the expected behavior
I would expect saving a model to dispose of all unused tensors once the operation is complete.
Standalone code to reproduce the issue
This bug is 100% reproducible in both `tfjs-node` and `tfjs-node-gpu`.
Other info / logs
If the `includeOptimizer` flag is disabled, this does not occur.
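Since the original reproduction code is not preserved here, the following is a hedged sketch of how the comparison could be measured. It reads `tf.memory().numTensors` around `model.save()` with `includeOptimizer` enabled and disabled. It assumes `tf` is the imported tfjs module and `model` is a compiled `tf.LayersModel`; `tensorDelta`, `compareSaveLeaks`, and any save URL you pass in are hypothetical.

```javascript
// Hypothetical repro helpers (not part of the tfjs API).
// `tf` is the tfjs module, e.g. require("@tensorflow/tfjs-node"), passed in
// so the helpers do not hard-code a backend package.

// Returns how many more tensors are allocated after `operation` than before it.
async function tensorDelta(tf, operation) {
  const before = tf.memory().numTensors;
  await operation();
  return tf.memory().numTensors - before;
}

// Saves the same model twice, once per includeOptimizer setting, and reports
// how many tensors each save left allocated.
async function compareSaveLeaks(tf, model, url) {
  const withOptimizer = await tensorDelta(tf, () =>
    model.save(url, { includeOptimizer: true }));
  const withoutOptimizer = await tensorDelta(tf, () =>
    model.save(url, { includeOptimizer: false }));
  return { withOptimizer, withoutOptimizer };
}
```

With `tfjs-node`, calling `await compareSaveLeaks(tf, model, "file://./checkpoint")` on a compiled model should, per this report, show a nonzero `withOptimizer` and a zero `withoutOptimizer`. The expected behavior is zero in both cases.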