NXP backend: Create NeutronAtenPassManager with initial BatchNorm fusing passes #10579
Conversation
Dr. CI: 🔗 Helpful links — see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10579. Note: links to docs will display an error until the docs builds have been completed. As of commit 21ed973 with merge base 7d9dd46: ✅ you can merge normally (1 unrelated failure, flagged as FLAKY and likely due to flakiness present on trunk). This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "module: nxp" "release notes: nxp"
bn_node = node
if not _is_linear(bn_node.args[0]):
Here and above, you can be defensive and also make sure args len == 1. There is nothing stopping someone from adding a skip connection after conv/linear and before batchnorm, or some other op. Can't say I've seen it in practice though.
Added check for number of users of Conv/Linear op.
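For reference, a minimal sketch of the kind of guard being discussed here, assuming a torch.fx ATen graph (the predicate and helper names are illustrative, not the PR's exact code):

```python
import torch
from torch.fx import Node


def _is_conv_or_linear(node: Node) -> bool:
    # Illustrative predicate; the actual pass has its own _is_conv/_is_linear checks.
    return node.op == "call_function" and node.target in (
        torch.ops.aten.conv2d.default,
        torch.ops.aten.linear.default,
    )


def _safe_to_fuse(bn_node: Node) -> bool:
    """Only fuse when the BatchNorm consumes a Conv/Linear node whose output
    has no other users (e.g. no skip connection branching off before the BN)."""
    if not bn_node.args or not isinstance(bn_node.args[0], Node):
        return False
    producer = bn_node.args[0]
    if not _is_conv_or_linear(producer):
        return False
    # Folding BN into the producer's weights would change the value seen by
    # any other consumer of the Conv/Linear output, so require a single user.
    return len(producer.users) == 1
```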
import torch
from executorch.backends.nxp.aten_passes.fuse_batch_norm_with_conv_pass import ( |
nit: this can become unwieldy since there are a few other permutations of "batch norm" fusion; maybe just one file for fuse_batch_norm would be good.
Agree with you. Maybe we can create subdirectories in backend/transforms based on what dialect/IR the pass is targeting?
LGTM, but @digantdesai thoughts on the aten pass setup?
Force-pushed from d1913ae to 5d69743.
I have also disabled some tests that require changes from #10258. We'll update the PR that gets merged last.
program = torch.export.export_for_training(module, example_input, strict=True)
og_module = program.module()
pm = NeutronAtenPassManager()
Curious, why does it have to be before export? Typically this is done in neutron.preprocess(). The reason is that it is not easy to run these nodes if the Neutron partitioner fails to pick them up. There are several examples of this in Arm, XNNPACK, etc.
What export do you have in mind here? Export to a lowered program?
These passes are executed before quantization, on a graph with nodes in ATen IR. If we try to run them later, the pass will be much more complex because Quantize/Dequantize nodes will already be present in the graph.
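For context, a rough sketch of the ordering being described here, assuming the pass manager lives in the aten_passes package added by this PR and follows the torch.fx PassManager call convention (returning a PassResult with a graph_module attribute):

```python
import torch

# Import path assumed from this PR's layout; it may differ in the final code.
from executorch.backends.nxp.aten_passes.neutron_aten_pass_manager import (
    NeutronAtenPassManager,
)


class SmallConvNet(torch.nn.Module):
    """Hypothetical example model: Conv2d followed by BatchNorm2d."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)
        self.bn = torch.nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))


example_input = (torch.randn(1, 3, 32, 32),)

# 1. Export to an ATen-level graph (no Quantize/Dequantize nodes yet).
program = torch.export.export_for_training(SmallConvNet(), example_input, strict=True)
graph_module = program.module()

# 2. Run the ATen passes so BatchNorm is already folded into Conv/Linear
#    weights when the quantizer attaches its observers.
graph_module = NeutronAtenPassManager()(graph_module).graph_module

# 3. PT2E quantization with the NeutronQuantizer (prepare_pt2e / convert_pt2e)
#    and lowering to the Neutron backend would follow from here.
```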
Yeah, not export_for_training but the export after quantization.
And yeah, typically backend-specific graph modifications are done in preprocess, unless they are done specifically for quantization purposes.
The rationale is that you can compose partitioners/backends together and not have to worry about someone else messing with the graph, or about failing to partition your modified graph. For example, Neutron + Cortex_m delegates can be composed by the user.
See other delegates doing BN fusion - https://github.com/search?q=repo%3Apytorch%2Fexecutorch+FuseBatchNorm&type=code
I guess you are following what the Cadence backend is doing; the main difference between Neutron and Cadence is that Cadence is not a delegate.
The BN + Conv fusion is done because Neutron does not (yet) have a kernel for BN, but typically BN is used with a Convolution or Matrix multiplication, where it can be fused. And because of quantization we do the fusion at the ATen graph level, so that the NeutronQuantizer already quantizes with respect to the fused batchnorm.
This reminds me of our discussion about quantizing the Conv/Matmul + BN + (ReLU) activation together as one "cluster" (i.e. not applying observers on Conv and ReLU separately). This would make it possible to move the batchnorm fusion even to the Edge dialect transformation in the NeutronBackend.
@skywall, can you please add such a remark to the passes, to be addressed with the quantizer update? And, as suggested by @digantdesai, is it possible to move the transformation to the NeutronQuantizer.transform_for_annotation method as an immediate resolution?
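For readers who want the concrete fold: in eval mode BatchNorm is an affine transform of the Conv output, so it can be folded statically into the Conv weights and bias. A minimal sketch using the stock PyTorch helper (not this PR's implementation, which rewrites the ATen graph directly):

```python
import torch
from torch.nn.utils import fuse_conv_bn_weights

conv = torch.nn.Conv2d(3, 8, kernel_size=3)
bn = torch.nn.BatchNorm2d(8).eval()  # fusion uses the frozen running statistics

# W' = W * gamma / sqrt(var + eps);  b' = (b - mean) * gamma / sqrt(var + eps) + beta
fused_w, fused_b = fuse_conv_bn_weights(
    conv.weight, conv.bias,
    bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias,
)

# A single Conv with the fused parameters matches Conv -> BN in eval mode,
# which is why quantizing after fusion observes the "real" weights.
x = torch.randn(1, 3, 16, 16)
ref = bn(conv(x))
fused = torch.nn.functional.conv2d(x, fused_w, fused_b)
assert torch.allclose(ref, fused, atol=1e-5)
```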
Apart from that, the rationale of backend composition makes complete sense. I am curious how the composition of quantizers is expected to be used by the user. Let's say, when combining the Neutron & Cortex-M backends, will the user apply the backends in order of performance gain (e.g. Neutron first, cortex_m second) and apply the corresponding quantizers in the same order?
Yeah, OK for now; please create an issue, and put it in the code as a TODO. The rationale is that if you fuse before and for some reason can't lower to Neutron, there is no core-ATen op in the Edge IR for "conv+bn+relu", so it will fail as a missing op, whereas conv, bn, and relu individually will run OK.
conv_args.append(None)
weight_attr_name = conv_weight_node.target
_assign_attr(
you can do this without using internal methods, and insert new params and delete old params https://github.com/pytorch/executorch/blob/main/backends/transforms/utils.py#L65 (assuming you are doing this after export)
That would require propagating the exported program (via the constructor?) to the pass. I don't find that particularly clean, because passes are built upon GraphModules, as you've mentioned in some of the previous PRs. We would end up with ExportedProgram being passed via the constructor and GraphModule within the call() function, which doesn't sound right to me.
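For illustration, the split being objected to would look roughly like this (class name and body are hypothetical, just to show where ExportedProgram and GraphModule would end up):

```python
import torch
from torch.export import ExportedProgram
from torch.fx.passes.infra.pass_base import PassBase, PassResult


class FuseBatchNormWithConvPassSketch(PassBase):
    """Hypothetical shape of the pass if it needed the ExportedProgram:
    the program arrives via the constructor, the GraphModule via call()."""

    def __init__(self, exported_program: ExportedProgram):
        super().__init__()
        # Kept only so the pass could create/delete constant placeholders
        # through the backends/transforms helpers.
        self.exported_program = exported_program

    def call(self, graph_module: torch.fx.GraphModule) -> PassResult:
        # ... rewrite graph_module here, updating parameters via
        # self.exported_program ...
        return PassResult(graph_module, True)
```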
Yeah I guess this is for when you do run this pass in delegate.preprocess
Looks good. If we can move this either into NeutronQuantizer.transform_for_annotation (unless you are waiting for the other PR) or into Neutron preprocess, it will be ready to go.
Force-pushed from 5d69743 to 270b8af.
Force-pushed from 270b8af to 9f8f912.
Moved.
Force-pushed from 9f8f912 to 21ed973.
@digantdesai I just realized, BN + Conv fusion is already happening in
100% agree, I don't remember the rationale ATM, can you please create an issue and tag @jerryzh168 and me?
@@ -202,4 +205,5 @@ def __init__(self):
     def transform_for_annotation(
         self, model: torch.fx.GraphModule
     ) -> torch.fx.GraphModule:
-        return model
+        pass_runner = NeutronAtenPassManager()
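The hunk above is cut off in this view; presumably the method now runs the pass manager over the incoming module, along the lines of the following sketch (the return expression is an assumption, not visible in the diff):

```python
def transform_for_annotation(
    self, model: torch.fx.GraphModule
) -> torch.fx.GraphModule:
    pass_runner = NeutronAtenPassManager()
    # Assumed completion: run the ATen-level fusion passes and hand the
    # rewritten module back to the PT2E annotation flow.
    return pass_runner(model).graph_module
```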
Thanks!
LGTM, please create issues and leave them as TODOs. Thanks.
Arm test seems like an infra issue.
Summary
Add NeutronAtenPassManager, which by default runs passes for optimal model conversion. Initially, only passes for BatchNorm + Conv/Linear fusing are added. I have intentionally not moved the passes into backend/transforms because the added passes are executed on ATen ops and not Edge ops. Please let me know if I should follow a different approach.
This pass manager is planned to be included in Neutron backend's pipeline - https://github.com/pytorch/executorch/pull/10196/files#diff-b3b0844670da322bb4c245ae440e8c71378d631bb94334148322cdf758369baaR51
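For readers skimming the diff, the intended shape is roughly the following (the base class and the exact pass class name are assumptions; only the fuse_batch_norm_with_conv_pass module path appears elsewhere in this thread):

```python
from torch.fx.passes.infra.pass_manager import PassManager

# Class name assumed; the module path is the one imported in the tests above.
from executorch.backends.nxp.aten_passes.fuse_batch_norm_with_conv_pass import (
    FuseBatchNormWithConvPass,
)


class NeutronAtenPassManagerSketch(PassManager):
    """Sketch: a PassManager preloaded with the default ATen-level passes
    (BatchNorm + Conv fusion here; a Linear variant per the summary)."""

    def __init__(self, passes=None):
        super().__init__(passes=passes or [FuseBatchNormWithConvPass()])


# Usage: pm = NeutronAtenPassManagerSketch(); fused = pm(graph_module).graph_module
```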
cc @digantdesai @JakeStevens @robert-kalmar