
NXP backend: Create NeutronAtenPassManager with initial BatchNorm fusing passes #10579


Conversation

skywall (Collaborator) commented Apr 30, 2025

Summary

Add NeutronAtenPassManager, which by default runs passes for optimal model conversion. Initially, only passes for BatchNorm + Conv/Linear fusing are included. I intentionally haven't moved the passes into backends/transforms because they are executed on ATen ops, not Edge ops. Please let me know if I should follow a different approach.

This pass manager is planned to be included in Neutron backend's pipeline - https://github.com/pytorch/executorch/pull/10196/files#diff-b3b0844670da322bb4c245ae440e8c71378d631bb94334148322cdf758369baaR51
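A minimal usage sketch of the intended flow (the import path for the pass manager and the PassResult-style return value are my assumptions; the test code later in this thread shows the same pattern):

import torch

# Assumed import path for the pass manager added in this PR.
from executorch.backends.nxp.aten_passes.neutron_aten_pass_manager import (
    NeutronAtenPassManager,
)

class ConvBN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)
        self.bn = torch.nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

example_input = (torch.randn(1, 3, 32, 32),)
program = torch.export.export_for_training(ConvBN().eval(), example_input, strict=True)
aten_module = program.module()  # ATen-level torch.fx.GraphModule

pm = NeutronAtenPassManager()  # BatchNorm + Conv/Linear fusing passes by default
fused_module = pm(aten_module).graph_module  # assumes a torch.fx PassResult is returned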

cc @digantdesai @JakeStevens @robert-kalmar


pytorch-bot bot commented Apr 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10579

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 21ed973 with merge base 7d9dd46:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 30, 2025
skywall (Collaborator, Author) commented Apr 30, 2025

@pytorchbot label "module: nxp" "release notes: nxp"

@pytorch-bot pytorch-bot bot added the module: nxp and release notes: nxp labels Apr 30, 2025

bn_node = node

if not _is_linear(bn_node.args[0]):
Contributor

Here and above, you can be defensive and also make sure the args length is 1. There is nothing stopping someone from adding a skip connection, or some other op, after the conv/linear and before the batch norm. Can't say I've seen it in practice, though.

Collaborator Author

Added a check for the number of users of the Conv/Linear op.
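A minimal sketch of such a guard (an illustrative helper, not the PR's exact code):

import torch

def feeds_batch_norm_exclusively(bn_node: torch.fx.Node) -> bool:
    """Only allow fusion when the conv/linear output is consumed solely by the batch norm."""
    producer = bn_node.args[0]
    if not isinstance(producer, torch.fx.Node):
        return False
    # A skip connection or any extra consumer of the conv/linear output
    # would make folding the batch norm into its weights unsafe.
    return len(producer.users) == 1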


import torch

from executorch.backends.nxp.aten_passes.fuse_batch_norm_with_conv_pass import (
Contributor

Nit: this can become unwieldy since there are a few other permutations of "batch norm" fusion; maybe just one file for fuse_batch_norm would be good.

Collaborator Author

Agreed. Maybe we can create subdirectories in backends/transforms based on which dialect/IR a pass targets?

@JakeStevens (Contributor)

LGTM, but @digantdesai, any thoughts on the ATen pass setup?

@skywall skywall changed the title [Neutron] Create NeutronAtenPassManager with initial BatchNorm fusing passes NXP backend: Create NeutronAtenPassManager with initial BatchNorm fusing passes May 2, 2025
@skywall skywall force-pushed the feature/nxf93343/neutron-batch-norm-fusion-passes branch from d1913ae to 5d69743 Compare May 2, 2025 12:06
skywall (Collaborator, Author) commented May 2, 2025

I have also disabled some tests that require changes from #10258. We'll update whichever PR gets merged last.

program = torch.export.export_for_training(module, example_input, strict=True)
og_module = program.module()

pm = NeutronAtenPassManager()
Contributor

Curious: why does it have to be before export? Typically this is done in neutron.preprocess(). The reason is that it is not easy to run these nodes if the Neutron partitioner fails to pick them up. There are several examples of this in Arm, XNNPACK, etc.

Collaborator Author

What export do you have in mind here? Export to a lowered program?

These passes are executed before quantization, on a graph whose nodes are in ATen IR. If we try to run them later, the pass will be much more complex because Quantize/Dequantize nodes will already be present in the graph.

Contributor

Yeah, not export_for_training but the export after quantization.

And yeah, typically backend-specific graph modifications are done in preprocess, unless they are done specifically for quantization purposes.

The rationale is that you can compose partitioners/backends together and not have to worry about someone else messing with the graph, or about failing to partition your modified graph. For example, the Neutron + Cortex-M delegates can be composed by the user.

See other delegates doing BN fusion - https://github.com/search?q=repo%3Apytorch%2Fexecutorch+FuseBatchNorm&type=code

I guess you are following what the Cadence backend is doing; the main difference between Neutron and Cadence is that Cadence is not a delegate.

Collaborator

The BN + Conv fusion is done because Neutron does not (yet) have a kernel for BN, but typically BN is used with a convolution or matrix multiplication into which it can be fused. And because of quantization, we do the fusion at the ATen graph level, so that the NeutronQuantizer already quantizes with respect to the fused batch norm.
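For reference, the arithmetic being folded is the standard BatchNorm-into-Conv/Linear rewrite; a minimal sketch of the generic formulas (not the PR's implementation, and assuming the conv/linear has a bias):

import torch

def fold_bn(weight, bias, bn_gamma, bn_beta, bn_mean, bn_var, eps=1e-5):
    # BN(Wx + b) = gamma * (Wx + b - mean) / sqrt(var + eps) + beta
    scale = bn_gamma / torch.sqrt(bn_var + eps)
    # Scale each output channel of the conv/linear weight.
    fused_weight = weight * scale.reshape(-1, *([1] * (weight.dim() - 1)))
    fused_bias = (bias - bn_mean) * scale + bn_beta
    return fused_weight, fused_bias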

This reminds me of our discussion about quantizing the Conv/Matmul + BN + (ReLU) activation together as one "cluster" (i.e. not applying observers on the Conv and ReLU separately). This would make it possible to move the batch norm fusion to the Edge dialect transformation in the NeutronBackend.
@skywall, can you please add such a remark to the passes so it gets addressed with the quantizer update? And as suggested by @digantdesai, is it possible to move the transformation to the NeutronQuantizer.transform_for_annotation method as an immediate resolution?

Collaborator

Apart from that, the rationale for backend composition makes complete sense; I am curious how quantizer composition is expected to be used. Say, when combining the Neutron & Cortex-M backends, would the user apply the backends in order of performance gain (e.g. Neutron first, Cortex-M second) and apply the corresponding quantizers in the same order?

Contributor

Yeah, OK for now; please create an issue and put it in the code as a TODO. The rationale is that if you fuse beforehand and for some reason can't lower to Neutron, there is no core ATen op in the Edge IR for "conv+bn+relu", so it will fail as a missing op, whereas conv, bn, and relu individually will run OK.

conv_args.append(None)

weight_attr_name = conv_weight_node.target
_assign_attr(
@digantdesai (Contributor) May 6, 2025

You can do this without using internal methods, by inserting new params and deleting the old ones: https://github.com/pytorch/executorch/blob/main/backends/transforms/utils.py#L65 (assuming you are doing this after export).

Collaborator Author

That would require propagating the exported program (via the constructor?) to the pass. I don't find that particularly clean, because passes are built upon GraphModules, as you mentioned in some of the previous PRs. We would end up with the ExportedProgram being passed via the constructor and the GraphModule within the call() function, which doesn't sound right to me.

Contributor

Yeah, I guess this applies when you do run this pass in delegate.preprocess.

@digantdesai (Contributor) left a comment

Looks good. If we can move this either into NeutronQuantizer.transform_for_annotation (unless you are waiting for the other PR) or into Neutron.preprocess, it will be ready to go.

@skywall skywall force-pushed the feature/nxf93343/neutron-batch-norm-fusion-passes branch from 5d69743 to 270b8af Compare May 13, 2025 15:18
@skywall skywall force-pushed the feature/nxf93343/neutron-batch-norm-fusion-passes branch from 270b8af to 9f8f912 Compare May 16, 2025 13:11
skywall (Collaborator, Author) commented May 16, 2025

Moved the NeutronAtenPassManager call into NeutronQuantizer.transform_for_annotation.

@skywall skywall force-pushed the feature/nxf93343/neutron-batch-norm-fusion-passes branch from 9f8f912 to 21ed973 Compare May 19, 2025 06:19
skywall (Collaborator, Author) commented May 20, 2025

@digantdesai I just realized that BN + Conv fusion is already happening in the prepare_pt2e call, so our pass is doing somewhat duplicate work. I don't think it's good practice to introduce optimizations randomly into the pipeline (the prepare_pt2e call), and it seems to me like a possible candidate for removal in the future. What is your opinion? Should we keep the pass here as well, or can we rely on PyTorch keeping that fusion in place as is?
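For context, a small sketch of where that upstream fusion shows up (the model is a toy example and the quantizer is a stand-in purely to keep the snippet runnable; the same observation applies with the NeutronQuantizer):

import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
# Stand-in quantizer so the sketch is runnable; the Neutron flow uses NeutronQuantizer instead.
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8)).eval()
example_input = (torch.randn(1, 3, 32, 32),)
aten_module = torch.export.export_for_training(model, example_input).module()

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(aten_module, quantizer)
# By this point the Conv + BatchNorm pattern has already been fused upstream,
# which is the duplication with the Neutron ATen pass being discussed here.
print([node.target for node in prepared.graph.nodes])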

digantdesai (Contributor) commented May 22, 2025

I don't think it's good practice to introduce optimizations randomly into the pipeline

100% agree. I don't remember the rationale ATM; can you please create an issue and tag @jerryzh168 and me?

@@ -202,4 +205,5 @@ def __init__(self):
     def transform_for_annotation(
         self, model: torch.fx.GraphModule
     ) -> torch.fx.GraphModule:
-        return model
+        pass_runner = NeutronAtenPassManager()
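For context, a plausible completed version of that hook (a sketch only; it assumes NeutronAtenPassManager follows the torch.fx PassManager convention of returning a PassResult, which may differ from the actual PR code):

def transform_for_annotation(
    self, model: torch.fx.GraphModule
) -> torch.fx.GraphModule:
    # Run the default ATen-level fusing passes before the quantizer annotates the graph.
    pass_runner = NeutronAtenPassManager()
    return pass_runner(model).graph_module  # assumes a PassResult-style return value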
Contributor

Thanks!

@digantdesai (Contributor) left a comment

LGTM; please create the issues and leave them as TODOs. Thanks.

digantdesai (Contributor) commented

The Arm test failure seems like an infra issue.

@digantdesai digantdesai merged commit 360c9bb into pytorch:main May 22, 2025
88 of 89 checks passed