
Conversation

@gprateek93 (Contributor)

Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>

@facebook-github-bot

Hi @gprateek93!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@gprateek93 force-pushed the prateek/native-layer-norm-backward branch from 7d52c15 to 3c9af2a on February 22, 2022 at 17:47
@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@Chillee (Contributor) left a comment

Mostly looks good to me! A couple minor notes, and I'd like to wait for the CI to pass first.

@Chillee (Contributor) commented Feb 22, 2022

Tests look good, just fix the flake8 issues (and the other comments) and we're good to go!

@Chillee (Contributor) commented Feb 22, 2022

Oh actually, I realized our decomposition tests are currently skipped due to an upstream issue; I'll test it out locally.

@Chillee (Contributor) left a comment

Actually, it doesn't seem like it passes testing.

To test your decompositions, comment out https://github.com/pytorch/functorch/blob/main/test/test_ops.py#L973 and run

pytest test/test_ops.py::TestDecompositionOpInfoCPU -k layer_norm
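
Roughly, that test runs each aten call both through the Python decomposition and through the real kernel and compares the outputs. A minimal sketch of the idea (names below are illustrative; the actual harness in test/test_ops.py intercepts calls via a __torch_dispatch__ tensor subclass):

    import torch

    # Illustrative sketch only; the real harness wraps tensors in a
    # __torch_dispatch__ subclass and checks every intercepted aten call.
    def check_decomposition(aten_op, decomposition, args, kwargs, rtol=1e-4, atol=1e-4):
        real_out = aten_op(*args, **kwargs)          # reference C++ kernel
        decomp_out = decomposition(*args, **kwargs)  # Python decomposition under test
        for ref, dec in zip(real_out, decomp_out):   # native_layer_norm_backward returns a tuple
            if ref is not None and dec is not None:
                assert torch.allclose(ref, dec, rtol=rtol, atol=atol), "decomposition diverged"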

@gprateek93 (Contributor, Author)

> Actually, it doesn't seem like it passes testing.
>
> To test your decompositions, comment out https://github.com/pytorch/functorch/blob/main/test/test_ops.py#L973 and run
>
> pytest test/test_ops.py::TestDecompositionOpInfoCPU -k layer_norm

I will check and make necessary changes. Thanks!

@gprateek93 (Contributor, Author)

@Chillee Even after removing my commit, the test is failing. I am assuming the error is due to the decomposition of aten.native_layer_norm.

@Chillee (Contributor) commented Feb 23, 2022

@gprateek93 What's the error you see? There was a minor precision issue I just fixed.

The error I ran into with the decomposition, though, seemed more like a logical error:

functorch/_src/decompositions.py:82: in native_layer_norm_backward
    b = aten.sum(grad_x_hat, inner_dims, True)
../pytorch/torch/_ops.py:142: in __call__
    return self._op(*args, **kwargs or {})
test/test_ops.py:1143: in __torch_dispatch__
    real_out = func(*tree_map(unwrap_tensor, args), **tree_map(unwrap_tensor, kwargs))
  IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
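
The IndexError itself is PyTorch rejecting a reduction dimension one past the last valid axis, which suggests inner_dims contained an index equal to the input's rank. A minimal, hypothetical reproduction (not taken from the PR):

    import torch

    t = torch.randn(2, 3, 4)    # rank 3, so valid dims are -3..2
    t.sum(dim=2, keepdim=True)  # fine
    t.sum(dim=3, keepdim=True)  # IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)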

@gprateek93 force-pushed the prateek/native-layer-norm-backward branch from 3c9af2a to f490792 on February 24, 2022 at 13:04
@gprateek93 (Contributor, Author) commented Feb 24, 2022

> @gprateek93 What's the error you see? There was a minor precision issue I just fixed.
>
> The error I ran into with the decomposition, though, seemed more like a logical error:
>
> functorch/_src/decompositions.py:82: in native_layer_norm_backward
>     b = aten.sum(grad_x_hat, inner_dims, True)
> ../pytorch/torch/_ops.py:142: in __call__
>     return self._op(*args, **kwargs or {})
> test/test_ops.py:1143: in __torch_dispatch__
>     real_out = func(*tree_map(unwrap_tensor, args), **tree_map(unwrap_tensor, kwargs))
>   IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)

Hi @Chillee, after commenting out the line you mentioned, I am not getting any error in the code. I may have got the logical bug, but I am not able to verify it with the test. Can you check once?

@gprateek93 force-pushed the prateek/native-layer-norm-backward branch 2 times, most recently from 8186ee0 to 49e5889 on February 24, 2022 at 17:36
@Chillee (Contributor) commented Feb 24, 2022

@gprateek93 Why are you unable to run the tests? Like, what happens when you try running them? If the decomposition passes you should see something like

========================================================================== 4 passed, 3106 deselected, 13 warnings in 6.30s ===========================================================================

When I run the decompositions with the newest commit in the PR there are still some test failures

FAILED test/test_ops.py::TestDecompositionOpInfoCPU::test_decomposition_nn_functional_layer_norm_cpu_bfloat16 - RuntimeError: Difference from float64 is larger with decomposition native_layer_nor...
FAILED test/test_ops.py::TestDecompositionOpInfoCPU::test_decomposition_nn_functional_layer_norm_cpu_float32 - RuntimeError: native_layer_norm_backward decomposition failed, max rel: 2.2540149688...
FAILED test/test_ops.py::TestDecompositionOpInfoCPU::test_decomposition_nn_functional_layer_norm_cpu_float64 - RuntimeError: native_layer_norm_backward decomposition failed, max rel: 1.1596385197.

@gprateek93 (Contributor, Author)

> @gprateek93 Why are you unable to run the tests? Like, what happens when you try running them? If the decomposition passes you should see something like
>
> ========================================================================== 4 passed, 3106 deselected, 13 warnings in 6.30s ===========================================================================
>
> When I run the decompositions with the newest commit in the PR there are still some test failures
>
> FAILED test/test_ops.py::TestDecompositionOpInfoCPU::test_decomposition_nn_functional_layer_norm_cpu_bfloat16 - RuntimeError: Difference from float64 is larger with decomposition native_layer_nor...
> FAILED test/test_ops.py::TestDecompositionOpInfoCPU::test_decomposition_nn_functional_layer_norm_cpu_float32 - RuntimeError: native_layer_norm_backward decomposition failed, max rel: 2.2540149688...
> FAILED test/test_ops.py::TestDecompositionOpInfoCPU::test_decomposition_nn_functional_layer_norm_cpu_float64 - RuntimeError: native_layer_norm_backward decomposition failed, max rel: 1.1596385197.

So I am able to run the tests, but they all pass:
==================================== 3 passed, 3021 deselected, 7 warnings in 1.19s =====================================

The command I am running is: pytest test/test_ops.py::TestDecompositionOpInfoCPU -k layer_norm -v

@Chillee (Contributor) commented Feb 24, 2022

@gprateek93 Are you sure the decompositions are being run? Can you try changing it to something obviously wrong and see if it fails?

I'm not sure how you installed functorch, but if you installed it with pip you might need to reinstall it before your Python changes get picked up.

Otherwise, to install it in "development" mode, you can run python setup.py develop.
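
A quick way to confirm which copy of functorch Python is actually importing (a hypothetical check, not part of this PR):

    # Hypothetical sanity check: after `python setup.py develop`, both paths should
    # point into your source checkout rather than a stale copy in site-packages.
    import functorch
    import functorch._src.decompositions as decompositions

    print(functorch.__file__)
    print(decompositions.__file__)  # the file that defines native_layer_norm_backward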

@gprateek93 (Contributor, Author) commented Feb 25, 2022

@Chillee I am able to get the failures now, but I am not sure why they are failing. I have tried to recreate one of the test cases below in Google Colab and ran it with both the torch op and my decomposition, and the results are nearly equal.
sample_example (linked gist)

Could it be some kind of precision issue?
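
For reference, here is a minimal float64 check of the kind described above, comparing autograd through F.layer_norm against the textbook layer-norm input gradient (the shapes, eps, and tolerances are illustrative assumptions, not the exact Colab example):

    import torch
    import torch.nn.functional as F

    # Illustrative shapes and eps; not the exact Colab example.
    x = torch.randn(2, 3, 4, dtype=torch.float64, requires_grad=True)
    weight = torch.randn(4, dtype=torch.float64)
    bias = torch.randn(4, dtype=torch.float64)
    grad_out = torch.randn(2, 3, 4, dtype=torch.float64)
    eps = 1e-5

    # Reference input gradient via autograd through the eager op.
    out = F.layer_norm(x, (4,), weight, bias, eps)
    (grad_x_ref,) = torch.autograd.grad(out, x, grad_out)

    # Manual (textbook) layer-norm backward for the input gradient.
    with torch.no_grad():
        mean = x.mean(dim=-1, keepdim=True)
        rstd = (x.var(dim=-1, unbiased=False, keepdim=True) + eps).rsqrt()
        x_hat = (x - mean) * rstd
        g = grad_out * weight  # gradient w.r.t. x_hat
        n = x.shape[-1]
        grad_x_manual = (rstd / n) * (
            n * g
            - g.sum(dim=-1, keepdim=True)
            - x_hat * (g * x_hat).sum(dim=-1, keepdim=True)
        )

    print(torch.allclose(grad_x_ref, grad_x_manual))  # expect True in float64

In double precision the two should agree closely, so a mismatch at the looser test tolerances usually points to a logical error rather than rounding.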

@Chillee (Contributor) commented Feb 25, 2022

@gprateek93 For the original PR, no, since the absolute value diff on that is > 2.

For the updated version in your gist, one of the errors might be an unavoidable precision issue. The other one seems to be an actual failure though.

I updated the main branch to output the failing arguments in an error.

@gprateek93 (Contributor, Author)

> @gprateek93 For the original PR, no, since the absolute value diff on that is > 2.
>
> For the updated version in your gist, one of the errors might be an unavoidable precision issue. The other one seems to be an actual failure though.
>
> I updated the main branch to output the failing arguments in an error.

Thank you for the updated test support. I can now see which argument is failing; I am working on it.

@gprateek93 force-pushed the prateek/native-layer-norm-backward branch 4 times, most recently from fb5480c to dc1e517 on March 4, 2022 at 11:07
@gprateek93 (Contributor, Author)

@Chillee Apart from the precision issue, the other test cases pass in the current revision.

@gprateek93 force-pushed the prateek/native-layer-norm-backward branch from dc1e517 to 32b7fc6 on March 10, 2022 at 10:02
@gprateek93 force-pushed the prateek/native-layer-norm-backward branch 2 times, most recently from a4f6585 to 4514786 on March 11, 2022 at 13:10
@gprateek93 force-pushed the prateek/native-layer-norm-backward branch 2 times, most recently from 2b33998 to 01ac08a on March 15, 2022 at 16:07
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
@gprateek93 force-pushed the prateek/native-layer-norm-backward branch from 01ac08a to ff0ae21 on March 17, 2022 at 16:11
@Chillee merged commit 481fada into pytorch:main on Mar 30, 2022
zou3519 pushed a commit to zou3519/pytorch that referenced this pull request Jul 20, 2022
bigfootjon pushed a commit to pytorch/pytorch that referenced this pull request Jul 21, 2022