Migrate the quantizer to use aten ops directly #4195
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4195
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 6d1694d with merge base fbe0af1: the following job has failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D59552606 |
Summary: Pull Request resolved: #4195

This major change allows a lot more flexibility in the quantizer, and reduces the dependency on the decomposition/graph-tracing tools. The motivation is that some of those tools do not preserve or propagate `source_fn_stack` information, resulting in quantization misses. SDPA is an example, where the underlying `bmm` ops cannot be quantized with `source_fn_stack` information alone; another is MHA, which can hide its SDPA component and sometimes even `linear` ops, depending on the model (see ViT for an example).

Summary of the changes:
- change the quantizer to match aten ops directly, through `node.target`
- propagate the required changes to the `QuantFusion` pass
- update/remove existing patterns

Differential Revision: D59552606
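To illustrate the idea of matching on `node.target`, here is a minimal, self-contained sketch. It is not the actual ExecuTorch quantizer code: `Node` stands in for `torch.fx.Node`, and the string constants stand in for real aten `OpOverload` objects such as `torch.ops.aten.linear.default`.

```python
from dataclasses import dataclass
from typing import Any, List

# Hypothetical stand-in for torch.fx.Node; in the real quantizer, node.target
# is an aten OpOverload (e.g. torch.ops.aten.linear.default).
@dataclass
class Node:
    target: Any

# Placeholders for aten ops a pattern might declare.
ATEN_LINEAR = "aten.linear.default"
ATEN_BMM = "aten.bmm.default"

def match_partitions(nodes: List[Node], partition_types: List[Any]) -> List[Node]:
    """Return the nodes whose target is one of the aten ops the pattern declares."""
    wanted = set(partition_types)
    return [n for n in nodes if n.target in wanted]

graph = [Node(ATEN_LINEAR), Node("aten.relu.default"), Node(ATEN_BMM)]
matched = match_partitions(graph, [ATEN_LINEAR, ATEN_BMM])
```

Matching on `node.target` this way sidesteps the need for `source_fn_stack` metadata entirely: the op identity survives decomposition even when provenance information does not.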
Force-pushed 0519368 to 22d7280 (Compare)
Force-pushed 22d7280 to 02f71f6 (Compare)
Force-pushed 02f71f6 to 4542695 (Compare)
  # pyre-fixme[16]: Pyre doesn't get that CadenceQuantizer has a patterns attribute
- patterns = [q.pattern for q in quantizer.quantizers]
+ patterns = [
+     assert_is_instance(q, CadenceAtenQuantizer).pattern
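The `assert_is_instance` helper referenced in the diff is not shown here; a minimal sketch of what such a helper typically looks like (a hypothetical implementation, not the actual ExecuTorch code) would be:

```python
from typing import Type, TypeVar

T = TypeVar("T")

def assert_is_instance(obj: object, cls: Type[T]) -> T:
    # Narrows the static type for checkers like Pyre/mypy, while failing
    # loudly at runtime if the object is not actually an instance of cls.
    assert isinstance(obj, cls), f"expected {cls.__name__}, got {type(obj).__name__}"
    return obj
```

This pattern lets the list comprehension access `.pattern` without a `# pyre-fixme` suppression, because the checker now knows each `q` is a `CadenceAtenQuantizer`.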
Is this a duplicate and not properly stacked with #4047? You can use "gh-stack" to help with this in the future.
@@ -44,18 +47,20 @@ class PartitionAnchors:

  class QuantizationPattern(ABC):
      @abstractmethod
-     def partition_types(self):
+     def partition_types(self) -> list[OpOverload]:
I think we need to support Python 3.8 here, which doesn't support `list[x]`; you need to use `typing.List[x]` instead.
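For illustration, a sketch of the 3.8-compatible spelling using `typing.List`. `OpOverload` (in the real code, `torch._ops.OpOverload`) is written as a forward-reference string here so the sketch runs without importing torch:

```python
from abc import ABC, abstractmethod
from typing import List

class QuantizationPattern(ABC):
    @abstractmethod
    def partition_types(self) -> List["OpOverload"]:
        # On Python 3.8, the built-in generic list[x] raises a TypeError when
        # the annotation is evaluated; typing.List[x] works on 3.8 and later.
        ...

# Minimal concrete subclass showing the annotated method in use.
class LinearPattern(QuantizationPattern):
    def partition_types(self) -> List["OpOverload"]:
        return ["aten.linear.default"]  # placeholder for torch.ops.aten.linear.default
```

(With `from __future__ import annotations`, `list[x]` also parses on 3.8, but `typing.List` is the safer choice when annotations may be evaluated at runtime.)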
Summary:

This major change allows a lot more flexibility in the quantizer, and reduces the dependency on the decomposition/graph-tracing tools. The motivation is that some of those tools do not preserve or propagate `source_fn_stack` information, resulting in quantization misses. SDPA is an example, where the underlying `bmm` ops cannot be quantized with `source_fn_stack` information alone; another is MHA, which can hide its SDPA component and sometimes even `linear` ops, depending on the model (see ViT for an example). Also note that in most cases we match single nodes anyway, with a 1-1 mapping between the op (either nn.Module or nn.functional) and the aten op, so using the aten op directly is simply easier.

Summary of the changes:
- change the quantizer to match aten ops directly, through `node.target`
- propagate the required changes to the `QuantFusion` pass
- update/remove existing patterns

Reviewed By: dulinriley

Differential Revision: D59552606
Force-pushed 4542695 to f680897 (Compare)
Force-pushed f680897 to 1fd5271 (Compare)
Force-pushed 1fd5271 to 6d1694d (Compare)
This pull request has been merged in a22e809.