
[numpy] Add torch.nan_to_num #44592

Closed

Conversation

@kshitij12345 (Collaborator) commented Sep 12, 2020

Reference #42515

TODO:

  • Add tests
  • Add docs
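
For context, a minimal usage sketch of the API being added (it mirrors np.nan_to_num; by default nan maps to 0.0, posinf to the dtype's largest finite value, and neginf to its most negative finite value):

```python
import torch

t = torch.tensor([float('nan'), float('inf'), float('-inf'), 3.14])

# Defaults follow NumPy: nan -> 0.0, +inf -> largest finite value of the
# dtype, -inf -> most negative finite value of the dtype.
print(torch.nan_to_num(t))

# Explicit replacement values, as in np.nan_to_num.
print(torch.nan_to_num(t, nan=0.0, posinf=1e6, neginf=-1e6))
```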

@kshitij12345 marked this pull request as draft September 12, 2020 11:53
@kshitij12345 changed the title from "[numpy] Add torch.nan_to_num" to "[WIP] [numpy] Add torch.nan_to_num" Sep 12, 2020
@dr-ci (bot) commented Sep 12, 2020

💊 CI failures summary and remediations

As of commit 6141370 (more details on the Dr. CI page):



🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

Since your merge base is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 82 times.

@codecov (bot) commented Sep 13, 2020

Codecov Report

Merging #44592 into master will decrease coverage by 0.08%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #44592      +/-   ##
==========================================
- Coverage   68.15%   68.07%   -0.09%     
==========================================
  Files         396      393       -3     
  Lines       51133    50990     -143     
==========================================
- Hits        34851    34709     -142     
+ Misses      16282    16281       -1     
Impacted Files Coverage Δ
torch/overrides.py 97.08% <ø> (ø)
...ch/testing/_internal/common_methods_invocations.py 91.41% <ø> (ø)
torch/_tensor_docs.py 100.00% <100.00%> (ø)
torch/_torch_docs.py 100.00% <100.00%> (ø)
torch/distributed/rpc/options.py 33.33% <0.00%> (-50.01%) ⬇️
torch/distributed/rpc/backend_registry.py 32.35% <0.00%> (-16.04%) ⬇️
torch/distributed/optim/optimizer.py 29.78% <0.00%> (-7.57%) ⬇️
torch/nn/quantized/modules/conv.py 85.25% <0.00%> (-4.32%) ⬇️
torch/optim/adagrad.py 79.03% <0.00%> (-4.31%) ⬇️
torch/testing/_internal/dist_utils.py 33.33% <0.00%> (-2.11%) ⬇️
... and 25 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb19a55...6141370. Read the comment docs.

@hameerabbasi mentioned this pull request Sep 15, 2020
@kshitij12345 marked this pull request as ready for review September 21, 2020 13:44
@kshitij12345 changed the title from "[WIP] [numpy] Add torch.nan_to_num" to "[numpy] Add torch.nan_to_num" Sep 21, 2020
@kshitij12345 (Collaborator, Author)

@mruberry Please take a look :)

I've made an independent test, as I don't think UnaryUfuncInfo can take care of the nan, posinf, and neginf arguments.

gradcheck works, but for gradgradcheck we get:

RuntimeError: isposinf(): functions with out=... arguments don't support automatic differentiation, but one of the arguments requires grad.

Should we just manually add zeros_like for the nan_to_num_backward?
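
For reference, a minimal sketch of how the failure can be reproduced (the tensor shape and the use of double precision here are illustrative assumptions):

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

# Double precision is required for the numerical checks to be stable.
x = torch.randn(3, 3, dtype=torch.double, requires_grad=True)
fn = lambda t: torch.nan_to_num(t, nan=0.0)

gradcheck(fn, (x,))       # passes
gradgradcheck(fn, (x,))   # raised the isposinf() RuntimeError above before the fix
```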

@albanD (Collaborator) commented Sep 21, 2020

Hi,

For the formula issue, it is an oversight on our side, sorry.
Could you add a new line to this list stating that functions returning a boolean type are not differentiable, and add the strings "isnan", "isposinf", "isneginf", and "isinf" there? That will solve your issue.

Also, for the backward function, I think you can use isinf instead of making two calls for the positive and negative infinities.
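
A rough Python sketch of the masking this suggests (the function name and exact form are illustrative; the real backward lives in derivatives.yaml):

```python
import torch

# Hypothetical Python equivalent of the nan_to_num backward: the gradient
# passes through finite elements unchanged and is zeroed where the input
# was nan or +/-inf, since those positions were replaced by constants.
def nan_to_num_backward(grad, inp):
    # A single isinf call covers both positive and negative infinity.
    mask = ~(torch.isnan(inp) | torch.isinf(inp))
    return grad * mask
```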

* update derivative not required ops.
* add gradgradcheck to test.
* use isinf.
@kshitij12345 (Collaborator, Author)

@albanD

> Could you add a new line to this list stating that functions returning a boolean type are not differentiable, and add the strings "isnan", "isposinf", "isneginf", and "isinf" there? That will solve your issue.

Thanks, that worked!

> Also, for the backward function, I think you can use isinf instead of making two calls for the positive and negative infinities.

Ah. Right! Thanks!

@albanD (Collaborator) left a comment

The autograd part looks good to me. Just a small comment on the test.

I'll let @mruberry review the rest.

@mruberry requested review from mruberry and removed the review request for apaszke September 22, 2020 07:00
@kshitij12345 (Collaborator, Author)

On CPU, benchmark results for the two kernel implementations:

With

    cpu_kernel(iter, [=](scalar_t a) -> scalar_t {
      return (
          at::_isnan(a)
              ? nan_replacement
              : (a == std::numeric_limits<scalar_t>::infinity()
                     ? pos_inf_replacement
                     : (a == -std::numeric_limits<scalar_t>::infinity()
                            ? neg_inf_replacement
                            : a)));
    });
250 / 250
Average attempts per valid config: 24.1
 Best:                   Size    Shape             X order        Y order
--------------------------------------------------------------------------------
 4.6 ns / element       88236 |   57, 1548       | contiguous   | 
 4.6 ns / element       85680 |  272,  315       | [1 0]        | 
 4.6 ns / element      114070 | 3355,   34       | contiguous   | 
 4.6 ns / element      109872 |  763,  144       | contiguous   | 
 4.6 ns / element      129925 | 5197,   25       | contiguous   | 
 4.6 ns / element      114528 | 3579,   32       | contiguous   | 
 4.6 ns / element      103488 |   56,   88,   21 | contiguous   | 
 4.6 ns / element       97682 |  578,  169       | contiguous   | 
 4.6 ns / element      127932 |   21, 6092       | contiguous   | 
 4.6 ns / element       91840 |  328,  280       | contiguous   | 

Worst:                   Size    Shape             X order        Y order
--------------------------------------------------------------------------------
 4.6 ns / element       80325 |   25,  189,   17 | contiguous   | 
 4.6 ns / element       78645 |   49, 1605       | contiguous   | 
 4.6 ns / element       93024 | 1224,   76       | contiguous   | 
 4.6 ns / element       69608 |   88,  791       | contiguous   | 
 4.6 ns / element       83204 |   62, 1342       | [1 0]        | 
 4.7 ns / element       78336 |   72, 1088       | contiguous   | 
 4.7 ns / element      118020 | 5901,   20       | contiguous   | 
 4.7 ns / element      122960 |  106,   40,   29 | contiguous   | 
 4.7 ns / element       66848 | 4178,   16       | contiguous   | 
 4.9 ns / element       69108 | 1772,   39       | contiguous   | 

With

    cpu_kernel(iter, [=](scalar_t a) -> scalar_t {
      return at::_isfinite(a)
          ? a
          : (at::_isnan(a) ? nan_replacement
                           : (a == std::numeric_limits<scalar_t>::infinity()
                                  ? pos_inf_replacement
                                  : neg_inf_replacement));
    });
250 / 250
Average attempts per valid config: 24.1
 Best:                   Size    Shape             X order        Y order
--------------------------------------------------------------------------------
 6.3 ns / element       66066 |  462,  143       | contiguous   | 
 6.3 ns / element      103488 |   56,   88,   21 | contiguous   | 
 6.3 ns / element       69930 |  945,   74       | contiguous   | 
 6.3 ns / element       74316 |   44, 1689       | contiguous   | 
 6.3 ns / element      114528 | 3579,   32       | contiguous   | 
 6.3 ns / element       88236 |   57, 1548       | contiguous   | 
 6.3 ns / element      109872 |  763,  144       | contiguous   | 
 6.3 ns / element       66880 |  190,  352       | contiguous   | 
 6.3 ns / element      114730 |   35, 3278       | contiguous   | 
 6.3 ns / element       76342 |   41, 1862       | contiguous   | 

Worst:                   Size    Shape             X order        Y order
--------------------------------------------------------------------------------
 6.4 ns / element       78336 |   72, 1088       | contiguous   | 
 6.4 ns / element      115080 | 5754,   20       | contiguous   | 
 6.4 ns / element       80080 | 5005,   16       | contiguous   | 
 6.5 ns / element       79968 |  392,  204       | contiguous   | 
 6.5 ns / element       97482 |  422,  231       | contiguous   | 
 6.5 ns / element      127932 |   21, 6092       | contiguous   | 
 8.0 ns / element      104348 | 1373,   76       | [1 0]        | 
 8.1 ns / element       80166 |  186,  431       | [1 0]        | 
 8.3 ns / element       89870 |  430,  209       | contiguous   | 
 8.3 ns / element      102752 |  247,   16,   26 | contiguous   | 

benchmark.txt

@kshitij12345 (Collaborator, Author)

@mruberry PTAL :)

I have addressed the comments and updated the kernel.

The CI errors are not related to this PR. I will rebase after addressing any new comments.

Thanks!

for x in generate_numeric_tensors(dtype=dtype, device=device):
    if not contiguous:
        x = x.T
    self.compare_with_numpy(torch.nan_to_num, np.nan_to_num, x)
Review comment from a Collaborator:

Will this and the next comparison be covered by the existing unary ufunc tests?

That is, I'm wondering if only the tests where the extra args are set are needed in this test?

@kshitij12345 (Collaborator, Author) commented Sep 28, 2020
I am not sure about the method variant, but torch.nan_to_num is verified. Do you know where the method variants are tested?

Review comment from a Collaborator:
Variants are tested for consistency here:

def test_variant_consistency(self, device, dtype, op):

It is not the most thorough test, but we didn't want to run the tests again for method variants because it would greatly increase test time (and cost).

@kshitij12345 (Collaborator, Author)
Ah. Will remove those two then. Thanks!

@dtypes(*(torch.testing.get_all_int_dtypes() + torch.testing.get_all_fp_dtypes(include_bfloat16=False)))
def test_nan_to_num(self, device, dtype):
    for contiguous in [False, True]:
        for x in generate_numeric_tensors(dtype=dtype, device=device):
Review comment from a Collaborator:

I wonder if this test should use a different generator with more non-finite values, since the numeric tensors will be tested elsewhere?

@kshitij12345 (Collaborator, Author)

I think this should be fine, as the generator produces tensors with extremal values; otherwise we would be creating an almost identical function that does the same thing (takes dtype and device) and generates a tensor. Let me know what you think about that.

Review comment from a Collaborator:

I think you can just call make_tensor:

def make_tensor(size, device: torch.device, dtype: torch.dtype, *,

And add the extremal values if it's a floating-point dtype. You could create a smaller 64x64 tensor, for example; generate_numeric_tensors will create a very large tensor.
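
Something along these lines, perhaps (a sketch only; the helper name is hypothetical and the make_tensor import path is assumed from this era of the codebase):

```python
import torch
from torch.testing._internal.common_utils import make_tensor  # import path assumed

def small_tensor_with_extremals(device, dtype):
    # A small 64x64 tensor instead of the very large ones produced by
    # generate_numeric_tensors.
    t = make_tensor((64, 64), device, dtype, low=-9, high=9)
    if dtype.is_floating_point:
        # Plant the non-finite values this test actually exercises.
        t[0, 0] = float('nan')
        t[0, 1] = float('inf')
        t[0, 2] = float('-inf')
    return t
```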

@kshitij12345 (Collaborator, Author)

Oh. Sure. Thanks!

@mruberry (Collaborator)
Thanks for the benchmark. The PR continues to look good, of course. I just have a couple of questions about refining the test in test_unary_ufunc.py to ensure it's not redundant with the other OpInfo tests. Maybe just testing some smaller tensors with more non-finite values would be best?

@kshitij12345 (Collaborator, Author)

Gentle Ping :)

@mruberry self-requested a review September 29, 2020 16:40
@mruberry (Collaborator) left a comment

Awesome! Exciting to have torch.nan_to_num!

@kshitij12345 (Collaborator, Author)

Gentle Ping :)

@facebook-github-bot (Contributor) left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator) commented Oct 2, 2020

> Gentle Ping :)

Thanks for the ping, and sorry for the delay. It's been a very busy week with the branch cut. Things should get back to normal soon.

@mruberry (Collaborator) commented Oct 5, 2020

Congrats @kshitij12345! This is one of the first, if not the first, new functions to be added to PyTorch 1.8!

@kshitij12345 deleted the develop/numpy/nan_to_num branch October 5, 2020 09:09
@facebook-github-bot (Contributor)

@mruberry merged this pull request in f65ab89.

facebook-github-bot pushed a commit that referenced this pull request Oct 16, 2020
Summary:
Related to #44592

This PR is to fix the deprecated warnings for the nan_to_num function.

Below is the warning message when building the latest code.
```
../aten/src/ATen/native/UnaryOps.cpp: In function ‘at::Tensor& at::native::nan_to_num_out(at::Tensor&,
const at::Tensor&, c10::optional<double>, c10::optional<double>, c10::optional<double>)’:
../aten/src/ATen/native/UnaryOps.cpp:397:45: warning: ‘bool c10::isIntegralType(c10::ScalarType)’
is deprecated: isIntegralType is deprecated.
Please use the overload with 'includeBool' parameter instead. [-Wdeprecated-declarations]
   if (c10::isIntegralType(self.scalar_type())) {
```

The deprecated warning is defined in `ScalarType.h`.
https://github.com/pytorch/pytorch/blob/d790ec6de01a61fe81733c41a64b6092bacfb7bd/c10/core/ScalarType.h#L255-L260

Pull Request resolved: #46309

Reviewed By: mrshenli

Differential Revision: D24310248

Pulled By: heitorschueroff

fbshipit-source-id: 0f9f2ad304eb5a2da9d2b415343f2fc9029037af
Labels
Merged · module: numpy (Related to numpy support, and also numpy compatibility of our operators) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)