[maskedtensor] Distinguish between 0 and NaN gradient #2044

Closed
108 changes: 108 additions & 0 deletions beginner_source/maskedtensor_distinguish_gradient.rst
@@ -0,0 +1,108 @@
Distinguishing between 0 and NaN gradient
-----------------------------------------

One issue that :class:`torch.Tensor` runs into is the inability to distinguish between gradients that are
undefined (NaN) and gradients that are actually 0. Below, by way of example, are several different issues
where :class:`MaskedTensor` can resolve or work around the NaN gradient problem.
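
The examples below assume the :class:`MaskedTensor` prototype API; as a setup sketch, assuming the constructors live in the ``torch.masked`` namespace used by the other MaskedTensor tutorials:

>>> import torch
>>> from torch.masked import masked_tensor, as_masked_tensor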

`Issue 10729 <https://github.com/pytorch/pytorch/issues/10729>`__ -- torch.where
--------------------------------------------------------------------------------

Current result:

>>> x = torch.tensor([-10., -5, 0, 5, 10, 50, 60, 70, 80, 90, 100], requires_grad=True, dtype=torch.float)
>>> y = torch.where(x < 0, torch.exp(x), torch.ones_like(x))
>>> y.sum().backward()
>>> x.grad
tensor([4.5400e-05, 6.7379e-03, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00,        nan,        nan])
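
The trailing ``nan`` entries come from overflow: :func:`torch.exp` saturates to ``inf`` in float32 for the
largest inputs, and the backward of :func:`torch.where` then multiplies that ``inf`` by the zero gradient of
the unselected branch. A quick check of both ingredients:

>>> torch.exp(torch.tensor(90.))  # overflows float32
tensor(inf)
>>> 0. * float('inf')  # the product taken in the backward pass
nan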

:class:`MaskedTensor` result:

>>> x = torch.tensor([-10., -5, 0, 5, 10, 50, 60, 70, 80, 90, 100])
>>> mask = x < 0
>>> mx = masked_tensor(x, mask, requires_grad=True)
>>> my = masked_tensor(torch.ones_like(x), ~mask, requires_grad=True)
>>> y = torch.where(mask, torch.exp(mx), my)
>>> y.sum().backward()
>>> mx.grad
MaskedTensor(
  [ 0.0000, 0.0067, --, --, --, --, --, --, --, --, --]
)

The gradient here is only provided to the selected subset. Effectively, this changes the gradient of
:func:`torch.where` to mask out elements instead of setting them to zero.
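
For comparison, a common plain-tensor workaround for this class of problem is the double-:func:`torch.where`
trick, which feeds :func:`torch.exp` only values that are safe to exponentiate so the unselected branch can
never overflow; a sketch:

>>> x = torch.tensor([-10., -5, 0, 5, 10, 50, 60, 70, 80, 90, 100], requires_grad=True)
>>> safe_x = torch.where(x < 0, x, torch.zeros_like(x))  # unselected entries become exp(0)
>>> y = torch.where(x < 0, torch.exp(safe_x), torch.ones_like(x))
>>> y.sum().backward()
>>> x.grad  # zeros instead of NaNs for the unselected entries
tensor([4.5400e-05, 6.7379e-03, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00])

This avoids the NaNs, but it still collapses "undefined" into a hard 0, which is exactly the ambiguity
:class:`MaskedTensor` removes.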

`Issue 52248 <https://github.com/pytorch/pytorch/issues/52248>`__ -- another torch.where
----------------------------------------------------------------------------------------

Current result:

>>> a = torch.randn((), requires_grad=True)
>>> b = torch.tensor(False)
>>> c = torch.ones(())
>>> torch.where(b, a/0, c)
tensor(1., grad_fn=<WhereBackward0>)
>>> torch.autograd.grad(torch.where(b, a/0, c), a)
(tensor(nan),)
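
The ``nan`` arises the same way as above: the local derivative of ``a/0`` with respect to ``a`` is
``1/0 = inf``, and the backward of :func:`torch.where` multiplies it by the zero gradient of the unselected
branch. A quick check:

>>> torch.tensor(1.) / 0.  # local derivative of a/0
tensor(inf)
>>> torch.tensor(0.) * (torch.tensor(1.) / 0.)  # times the zero where-gradient
tensor(nan)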

:class:`MaskedTensor` result:

>>> a = masked_tensor(torch.randn(()), torch.tensor(True), requires_grad=True)
>>> b = torch.tensor(False)
>>> c = torch.ones(())
>>> torch.where(b, a/0, c)
MaskedTensor( 1.0000, True)
>>> torch.autograd.grad(torch.where(b, a/0, c), a)
(MaskedTensor(--, False),)
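
Because ``a`` plays no part in the output when ``b`` is ``False``, its gradient is reported as masked out
rather than as NaN. As a sketch of how to inspect this, assuming the prototype's ``get_mask()`` accessor from
the MaskedTensor overview tutorial:

>>> grad, = torch.autograd.grad(torch.where(b, a/0, c), a)
>>> grad.get_mask()  # False: the gradient is undefined, not 0 and not nan
tensor(False)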

`Issue 67180 <https://github.com/pytorch/pytorch/issues/67180>`__ -- :func:`torch.nansum` and :func:`torch.nanmean`
-------------------------------------------------------------------------------------------------------------------

Current result:

>>> a = torch.tensor([1., 2., float('nan')])
>>> b = torch.tensor(1.0, requires_grad=True)
>>> c = a * b
>>> c1 = torch.nansum(c)
>>> bgrad1, = torch.autograd.grad(c1, b, retain_graph=True)
>>> bgrad1
tensor(nan)
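
The ``nan`` is a consequence of the chain rule: ``d(c1)/db`` accumulates ``a[i] * d(c1)/d(c[i])`` over all
entries, and the ``nan`` entry of ``a`` poisons that sum no matter what gradient :func:`torch.nansum` assigns
to it, since even ``nan * 0`` is ``nan``:

>>> float('nan') * 0.
nan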

:class:`MaskedTensor` result:

>>> a = torch.tensor([1., 2., float('nan')])
>>> b = torch.tensor(1.0, requires_grad=True)
>>> mt = masked_tensor(a, ~torch.isnan(a))
>>> c = mt * b
>>> c1 = torch.sum(c)
>>> bgrad1, = torch.autograd.grad(c1, b, retain_graph=True)
>>> bgrad1
MaskedTensor( 3.0000, True)
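
The issue also covers :func:`torch.nanmean`, which fails the same way. With a :class:`MaskedTensor` the
analogous fix is the masked mean; a sketch, assuming :func:`torch.mean` dispatches on :class:`MaskedTensor`
the same way :func:`torch.sum` does above:

>>> c2 = torch.mean(c)  # averages only the two valid entries
>>> bgrad2, = torch.autograd.grad(c2, b)
>>> bgrad2  # d/db mean([b, 2b]) = 1.5
MaskedTensor( 1.5000, True)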

`Issue 4132 <https://github.com/pytorch/pytorch/issues/4132>`__ -- when using mask, x/0 yields NaN grad
-------------------------------------------------------------------------------------------------------

Current result:

>>> x = torch.tensor([1., 1.], requires_grad=True)
>>> div = torch.tensor([0., 1.])
>>> y = x/div # => y is [inf, 1]
>>> mask = (div != 0) # => mask is [0, 1]
>>> y[mask].backward()
>>> x.grad # grad is [nan, 1], but expected [0, 1]
tensor([nan, 1.])

:class:`MaskedTensor` result:

>>> x = torch.tensor([1., 1.], requires_grad=True)
>>> div = torch.tensor([0., 1.])
>>> y = x/div # => y is [inf, 1]
>>>
>>> mask = (div != 0) # => mask is [0, 1]
>>> loss = as_masked_tensor(y, mask)
>>> loss.sum().backward()
>>> x.grad
MaskedTensor(
  [ --, 1.0000]
)
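
For reference, the usual plain-tensor workaround for this issue is to apply the mask before the unsafe
division, so the ``x/0`` entry never enters the graph; a sketch:

>>> x = torch.tensor([1., 1.], requires_grad=True)
>>> div = torch.tensor([0., 1.])
>>> mask = (div != 0)
>>> (x[mask] / div[mask]).sum().backward()
>>> x.grad  # a hard 0 for the masked entry, not nan
tensor([0., 1.])

As before, this recovers a 0 where :class:`MaskedTensor` instead reports the gradient as undefined (``--``).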
2 changes: 2 additions & 0 deletions index.rst
@@ -811,6 +811,8 @@ Additional Resources
:caption: MaskedTensor

beginner/maskedtensor_overview
beginner/maskedtensor_sparsity
beginner/maskedtensor_distinguish_gradient


.. toctree::