Change the elementwise broadcasting contract from graph to kernel #3894
Summary:
Currently, a graph-level pass handles limited broadcasting of elementwise ops when the input tensors are not the same size. This diff moves that responsibility down to the kernels, which is how ET and the portable ops do it. For now, only `add`, `sub`, `mul`, and `div` are affected, but more will follow. We retain the implementations for the reference kernels, because we want to avoid linking the portable ops directly, which takes forever at compile time. We can also use a much smaller set of types (basically only `float`).

This change lets us remove a hack in the RNNT Joiner and run it natively. It takes a huge hit in performance, which will be fixed by getting broadcast-friendly kernels from Cadence.
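To make the contract concrete, here is a minimal Python sketch of the kind of limited, right-aligned broadcasting an elementwise kernel now has to implement itself (compute the broadcast output shape, then map each output index back to a source element, clamping size-1 dims). All function names are hypothetical and do not reflect the actual ET kernel API; real kernels work over flat buffers and strides in C++.

```python
from itertools import product, zip_longest

def broadcast_shape(a_shape, b_shape):
    """Right-align the two shapes; each dim pair must match or one must be 1."""
    out = []
    for x, y in zip_longest(reversed(a_shape), reversed(b_shape), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"cannot broadcast {a_shape} with {b_shape}")
        out.append(max(x, y))
    return tuple(reversed(out))

def _flat_offset(shape, out_idx):
    """Map an output index to a flat offset into a (possibly smaller) input."""
    pad = len(out_idx) - len(shape)  # input shape is right-aligned
    strides, stride = [], 1
    for d in reversed(shape):        # row-major strides
        strides.append(stride)
        stride *= d
    strides.reverse()
    off = 0
    for s, d, i in zip(strides, shape, out_idx[pad:]):
        off += s * (i if d != 1 else 0)  # clamp broadcast (size-1) dims
    return off

def elementwise_add(a_flat, a_shape, b_flat, b_shape):
    """Broadcasting add over flat float buffers, as a kernel would do it."""
    out_shape = broadcast_shape(a_shape, b_shape)
    out = [a_flat[_flat_offset(a_shape, idx)] + b_flat[_flat_offset(b_shape, idx)]
           for idx in product(*(range(d) for d in out_shape))]
    return out, out_shape
```

For example, adding a `(3,)` tensor to a `(2, 1)` tensor yields a `(2, 3)` result, with each row of the first operand offset by one element of the second.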
We finally remove the binop tests in `test_aten_ops.py`, which also used strange types and had been on the chopping block for a while.

Differential Revision: D58207691