TorchDynamo: Add convolution unary fusion for cpu in inference mode #87063

XiaobingSuper · 2022-10-17T06:54:23Z

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @lezcano @fdrocha

[ghstack-poisoned]

pytorch-bot · 2022-10-17T06:54:25Z

✅ No Failures

As of commit a8853c0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…ence mode" [ghstack-poisoned]

torch/_inductor/ir.py

jgong5 · 2022-10-18T11:34:14Z

torch/_inductor/overrides.py

+def fuse_conv_unary_eval(conv: nn.Module, unary: nn.Module):
+    assert not (conv.training), "Fusion only for eval!"


Why mark this as "eval" and require that the module not be in training mode? The fusion can apply to both the forward graph and the backward graph generated by AOTAutograd, as long as the pattern matches per fuse_fx, right?

In the current path, we do the fusion before AOTAutograd, so there is no separate forward graph and backward graph yet. If we instead did the fusion after AOTAutograd (at fw_compiler), AOTAutograd would already have decomposed some ops, such as GELU, into many smaller ops, which would make the fusion hard to do.
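To illustrate the point about decomposition, here is a minimal sketch on a toy op list. It is not torch.fx or AOTAutograd code: the op names, the `DECOMPOSITIONS` table, and both helper functions are hypothetical stand-ins, and the GELU expansion is only one plausible way to break it into primitives.

```python
from typing import List

# Hypothetical set of unary ops that a conv+unary fusion pass looks for.
FUSIBLE_UNARY = {"relu", "gelu"}

# Illustrative decomposition only (an assumption, not AOTAutograd's real table):
# gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
DECOMPOSITIONS = {"gelu": ["mul", "erf", "add", "mul", "mul"]}


def decompose(ops: List[str]) -> List[str]:
    """Replace every op that has a decomposition with its primitive ops."""
    out: List[str] = []
    for op in ops:
        out.extend(DECOMPOSITIONS.get(op, [op]))
    return out


def count_conv_unary_pairs(ops: List[str]) -> int:
    """Count adjacent (conv2d, fusible unary) pairs in a linear op list."""
    return sum(
        1
        for first, second in zip(ops, ops[1:])
        if first == "conv2d" and second in FUSIBLE_UNARY
    )


graph = ["conv2d", "gelu", "conv2d", "relu"]

# Matching before decomposition sees both conv+unary pairs.
before = count_conv_unary_pairs(graph)

# After decomposition the conv2d+gelu pair is no longer a two-node pattern.
after = count_conv_unary_pairs(decompose(graph))
```

In this toy setup the simple two-node matcher finds both pairs before decomposition and only the conv2d+relu pair afterwards, which is the difficulty described above.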

jgong5 · 2022-10-18T11:34:44Z

torch/_inductor/overrides.py

+                eval_mode = all(not n.training for n in [conv, unary])
+                if not eval_mode:
+                    continue


Ditto, why eval only?

jgong5 · 2022-10-18T13:03:49Z

torch/_inductor/lowering.py

+    else:
+        log.warning(
+            "Register OneDNN fusion ops is failed which OneDNN is not enabled at build step"
+        )


I don't think we need to warn here, since the FX graph won't contain the corresponding aten ops and users don't need to care that the registration is absent.

Yes, changed.
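A rough sketch of the "register only when available, stay silent otherwise" shape discussed here. The `LOWERINGS` registry, the `backend_available` flag, and the `conv_relu` entry are all hypothetical; this is not the actual torch/_inductor/lowering.py code.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a fused-op name to its lowering function.
LOWERINGS: Dict[str, Callable[[float], float]] = {}


def register_fusion_lowerings(backend_available: bool) -> None:
    """Register fused-op lowerings only if the backend was built in."""
    if not backend_available:
        # No warning: without the backend the graph never contains the
        # fused ops, so the missing registration is invisible to users.
        return
    LOWERINGS["conv_relu"] = lambda x: max(x, 0.0)


# Backend missing: nothing is registered and nothing is logged.
register_fusion_lowerings(backend_available=False)
registered_without_backend = sorted(LOWERINGS)

# Backend present: the fused op becomes available.
register_fusion_lowerings(backend_available=True)
registered_with_backend = sorted(LOWERINGS)
```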

…ence mode" [ghstack-poisoned]

…ence mode" cc jansel lezcano fdrocha mlazos soumith voznesenskym yanboliang [ghstack-poisoned]

pytorchmergebot · 2022-10-27T06:55:28Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

github-actions · 2022-10-27T06:56:08Z

Hey @XiaobingSuper.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@jansel

…ytorch#87063)

cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang

Pull Request resolved: pytorch#87063
Approved by: https://github.com/jgong5, https://github.com/jansel

…on for cpu in inference mode"

An FX transformation is added to fuse ConvTranspose2d with eltwise OPs in torchinductor for CPU in inference mode, following the implementation in #87063. The fusion OP is implemented in #90264 and will be treated as an extern kernel call in torchinductor. The fusion of ConvTranspose2d with the below OPs is supported:

- relu
- sigmoid
- tanh
- hardswish
- leaky_relu
- hardtanh
- gelu

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire [ghstack-poisoned]
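For reference, the element-wise ops listed above can be written as scalar functions. This is only a sketch of their usual mathematical definitions, assuming PyTorch's documented default parameters (negative_slope=0.01 for leaky_relu, the range [-1, 1] for hardtanh, and the erf form of gelu). It is not the fused kernel; `UNARY_OPS` and `apply_unary` are hypothetical names.

```python
import math
from typing import Callable, Dict, List

# Scalar reference definitions, assuming default parameters.
UNARY_OPS: Dict[str, Callable[[float], float]] = {
    "relu": lambda x: max(x, 0.0),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    # hardswish(x) = x * relu6(x + 3) / 6
    "hardswish": lambda x: x * min(max(x + 3.0, 0.0), 6.0) / 6.0,
    "leaky_relu": lambda x: x if x >= 0.0 else 0.01 * x,
    "hardtanh": lambda x: min(max(x, -1.0), 1.0),
    # erf-based gelu: 0.5 * x * (1 + erf(x / sqrt(2)))
    "gelu": lambda x: 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))),
}


def apply_unary(name: str, conv_output: List[float]) -> List[float]:
    """Apply the named element-wise op to a flat list of conv outputs."""
    op = UNARY_OPS[name]
    return [op(value) for value in conv_output]
```

A fused kernel would apply the chosen op to each output element of the transposed convolution in the same pass, instead of writing the intermediate tensor out and reading it back.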

@jansel