Fix: Sparse tensors not updating #1914
Conversation
@Dipet, can you please share a bit more about this issue? In particular, what do you mean by …
Yes, I am sorry, I forgot that in this PR we cannot see the second call of … When we call it twice:

import torch

tensor = torch.tensor([0.0, 1., 2.], requires_grad=True)
tensor.sum().backward()

g = tensor.grad
print("Original gradient")
print(g)  # tensor([1., 1., 1.])

d = g.data
d.data = torch.tensor([3.0, 4., 5.])
print("Data is not assigned to original tensor gradient")
print(tensor.grad)  # tensor([1., 1., 1.])

g.data = torch.tensor([3.0, 4., 5.])
print("Data successfully assigned")
print(tensor.grad)  # tensor([3., 4., 5.])
For dense tensors we use in-place …

OK, I checked the in-place …
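For context, here is a minimal sketch (plain PyTorch only, not DeepSpeed code) of why an in-place update is enough for dense gradients, while rebinding the .data of a temporary reference is not:

import torch

param = torch.zeros(3, requires_grad=True)
param.sum().backward()           # param.grad == tensor([1., 1., 1.])

# An in-place op mutates the gradient's storage, so param.grad sees it.
param.grad.data.mul_(0.5)
print(param.grad)                # tensor([0.5000, 0.5000, 0.5000])

# Rebinding .data of a temporary reference does not touch param.grad,
# which is exactly the behaviour shown in the snippet above.
d = param.grad.data
d.data = torch.tensor([9.0, 9.0, 9.0])
print(param.grad)                # still tensor([0.5000, 0.5000, 0.5000])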
@Dipet, thanks for sharing this context. I have not worked a lot with sparse tensors, so that is probably why I have not run into this issue. My suggestion would be to implement this logic only for the sparse tensor code path, so that the dense tensor code paths remain unchanged for backwards compatibility. I am happy to help brainstorm on this. FYI @jeffra, who has worked more on sparse tensors.
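A minimal sketch of that suggestion, using a hypothetical helper that receives an already-averaged gradient (this is not the actual DeepSpeed code):

import torch

def assign_averaged_grad(param, averaged):
    # Hypothetical helper, not DeepSpeed's API.
    if param.grad.is_sparse:
        # Sparse path: rebind the gradient itself so the averaged
        # sparse tensor becomes visible through param.grad.
        param.grad = averaged
    else:
        # Dense path unchanged: copy in place into the existing storage.
        param.grad.data.copy_(averaged)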
@jeffra Tests failed, but it looks like a hardware problem, because the tests failed on …
@jeffra Can you look at the PR?
Guys, could you take a look at this PR?
@Dipet, apologies for the delay. I had suggested restricting this change only to sparse tensors so that dense tensor code paths remain backwards compatible. Is there a problem with doing that?
@tjruwase I changed the logic for sparse tensors only; dense tensors should work the same as before.
Thanks!
param.grad.data returns a new object, so the updated sparse gradients are not assigned to the target tensor, and the optimizer works with a unique gradient tensor on each device.
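To illustrate the point, here is a plain-PyTorch sketch of the failure mode for a sparse gradient (not the PR's actual code; the 0.5 scaling stands in for the averaging step):

import torch

indices = torch.tensor([[0, 2]])
values = torch.tensor([1.0, 1.0])
param = torch.zeros(3, requires_grad=True)
param.grad = torch.sparse_coo_tensor(indices, values, (3,))

# Broken pattern: param.grad.data is a new object, so replacing the
# collected reference never reaches param.grad itself.
grads = [param.grad.data]                                   # collected for averaging
grads[0] = torch.sparse_coo_tensor(indices, values * 0.5, (3,))
print(param.grad.to_dense())   # unchanged: tensor([1., 0., 1.])

# Fix: assign the averaged sparse tensor to param.grad itself.
param.grad = torch.sparse_coo_tensor(indices, values * 0.5, (3,))
print(param.grad.to_dense())   # updated: tensor([0.5000, 0.0000, 0.5000])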