
Change the elementwise broadcasting contract from graph to kernel #3894

Closed

Conversation

mcremon-meta (Contributor)

Summary:
Currently, a graph-level pass handles limited broadcasting for elementwise ops whose input tensors are not the same size.

This diff moves that responsibility down to the kernels, which is how ET and the portable ops handle it. Only `add`, `sub`, `mul`, and `div` are affected for now, but more ops will follow.
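
Concretely, "handling broadcasting in the kernel" means each output index is mapped back to a (possibly broadcast) input index inside the op, instead of relying on a graph pass to expand the inputs first. Below is a minimal sketch of that idea, assuming input sizes are already padded with leading 1s to the output rank; all names are illustrative, not the actual ExecuTorch or Cadence kernel API.

```cpp
#include <cstddef>
#include <vector>

// Map a flat output index to the corresponding input index. Dimensions of
// size 1 in the input are broadcast: they contribute stride 0, so the same
// input element is reused across that output dimension.
std::size_t broadcast_index(std::size_t out_idx,
                            const std::vector<std::size_t>& out_sizes,
                            const std::vector<std::size_t>& in_sizes) {
  std::size_t in_idx = 0;
  std::size_t in_stride = 1;
  // Walk dimensions from innermost to outermost.
  for (int d = static_cast<int>(out_sizes.size()) - 1; d >= 0; --d) {
    std::size_t coord = out_idx % out_sizes[d];
    out_idx /= out_sizes[d];
    if (in_sizes[d] != 1) {
      in_idx += coord * in_stride;
    }
    in_stride *= in_sizes[d];
  }
  return in_idx;
}

// Elementwise add with broadcasting handled inside the kernel (hypothetical).
void add_broadcast(const float* a, const std::vector<std::size_t>& a_sizes,
                   const float* b, const std::vector<std::size_t>& b_sizes,
                   float* out, const std::vector<std::size_t>& out_sizes) {
  std::size_t numel = 1;
  for (std::size_t s : out_sizes) numel *= s;
  for (std::size_t i = 0; i < numel; ++i) {
    out[i] = a[broadcast_index(i, out_sizes, a_sizes)] +
             b[broadcast_index(i, out_sizes, b_sizes)];
  }
}
```

Because a size-1 dimension contributes stride 0, the same loop covers both the same-shape and broadcast cases, which is what lets the graph-level pass go away.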

We keep our own implementations for the reference kernels, because linking the portable ops directly is very slow at compile time. The reference kernels can also support a much smaller set of types (essentially only `float`).
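
To illustrate the compile-time point, here is a hypothetical contrast (not the actual portable-op or reference-kernel code): a portable-style entry point switches over many dtypes, each case instantiating the templated loop, while a float-only reference entry point compiles a single instantiation.

```cpp
#include <cstddef>

// Illustrative dtype tag; real kernels use ExecuTorch's ScalarType.
enum class ScalarType { Float, Double, Int, Long };

template <typename T>
void add_impl(const T* a, const T* b, T* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Portable-style: every dtype case instantiates add_impl<T>, so object code
// and compile time grow with the number of supported types.
void portable_add(ScalarType t, const void* a, const void* b, void* out,
                  std::size_t n) {
  switch (t) {
    case ScalarType::Float:
      add_impl(static_cast<const float*>(a), static_cast<const float*>(b),
               static_cast<float*>(out), n);
      break;
    case ScalarType::Double:
      add_impl(static_cast<const double*>(a), static_cast<const double*>(b),
               static_cast<double*>(out), n);
      break;
    case ScalarType::Int:
      add_impl(static_cast<const int*>(a), static_cast<const int*>(b),
               static_cast<int*>(out), n);
      break;
    case ScalarType::Long:
      add_impl(static_cast<const long*>(a), static_cast<const long*>(b),
               static_cast<long*>(out), n);
      break;
  }
}

// Reference-style: float only, one instantiation to compile and link.
void reference_add(const float* a, const float* b, float* out, std::size_t n) {
  add_impl(a, b, out, n);
}
```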

This change lets us remove a hack in the RNNT Joiner and run it natively. Doing so takes a large performance hit for now, which will be fixed by getting broadcast-friendly kernels from Cadence.

Finally, we remove the binop tests in `test_aten_ops.py`, which used odd types and had been on the chopping block for a while.

Differential Revision: D58207691


pytorch-bot bot commented Jun 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3894

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d52c39f with merge base 6554fa5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jun 7, 2024
facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D58207691

facebook-github-bot (Contributor)

This pull request has been merged in 88e9737.

Labels: CLA Signed, fb-exported, Merged