
Migrate the quantizer to use aten ops directly #4195

Closed · wants to merge 1 commit

Conversation

mcremon-meta
Contributor

Summary:
This major change allows a lot more flexibility in the quantizer and reduces the dependency on decomposition/graph-tracing tools.

The motivation is that some of those tools do not preserve or propagate `source_fn_stack` information, resulting in quantization misses. SDPA is one example: the underlying `bmm` ops cannot be quantized from `source_fn_stack` information alone. MHA is another: it can hide its SDPA component, and sometimes even `linear` ops, depending on the model (see ViT for an example).

Summary of the changes:

  • change the quantizer to match aten ops directly, through node.target
  • propagate required changes to the QuantFusion pass
  • update/remove existing patterns

Differential Revision: D59552606
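To illustrate the point of the change, here is a minimal sketch with simplified stand-in types (not the actual ExecuTorch classes) showing why matching on a node's target op is more robust than matching on `source_fn_stack` metadata, which decompositions may drop:

```python
# Simplified stand-in for an FX-style node; the real graph nodes carry a
# target (the aten op) and a meta dict that may or may not contain
# source_fn_stack, depending on what tracing/decomposition preserved.
from dataclasses import dataclass, field


@dataclass
class Node:
    target: str                      # e.g. "aten.bmm.default"
    meta: dict = field(default_factory=dict)


QUANTIZABLE_TARGETS = {"aten.bmm.default", "aten.linear.default"}


def match_by_target(nodes):
    # Matches on node.target: works even when decompositions dropped metadata.
    return [n for n in nodes if n.target in QUANTIZABLE_TARGETS]


def match_by_source_fn(nodes):
    # Matches on metadata: misses nodes whose source_fn_stack was not
    # propagated, e.g. the bmm ops produced by an SDPA decomposition.
    return [n for n in nodes if "source_fn_stack" in n.meta]


graph = [
    Node("aten.bmm.default"),  # from SDPA decomposition: no metadata
    Node("aten.linear.default", {"source_fn_stack": [("l", "nn.Linear")]}),
    Node("aten.relu.default", {"source_fn_stack": [("r", "nn.ReLU")]}),
]
```

With this graph, target-based matching finds both the `bmm` and the `linear` node, while metadata-based matching cannot see the `bmm` at all.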


pytorch-bot bot commented Jul 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4195

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 6d1694d with merge base fbe0af1:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 10, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D59552606


mcremon-meta added a commit that referenced this pull request Jul 10, 2024
-# pyre-fixme[16]: Pyre doesn't get that CadenceQuantizer has a patterns attribute
-patterns = [q.pattern for q in quantizer.quantizers]
+patterns = [
+    assert_is_instance(q, CadenceAtenQuantizer).pattern
+    for q in quantizer.quantizers
+]
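The `assert_is_instance` call in the diff narrows the static type, removing the need for the `pyre-fixme` suppression. A minimal version of such a helper (a sketch; the actual ExecuTorch helper may differ) could look like:

```python
from typing import Type, TypeVar

T = TypeVar("T")


def assert_is_instance(obj: object, cls: Type[T]) -> T:
    # Runtime check that also narrows the static type from `object` to `T`,
    # so type checkers like Pyre accept subsequent attribute access.
    assert isinstance(obj, cls), (
        f"expected {cls.__name__}, got {type(obj).__name__}"
    )
    return obj
```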
Contributor

Is this a duplicate and not properly stacked with #4047?

You can use `ghstack` to help with this in the future.

@@ -44,18 +47,20 @@ class PartitionAnchors:

 class QuantizationPattern(ABC):
     @abstractmethod
-    def partition_types(self):
+    def partition_types(self) -> list[OpOverload]:
Contributor

I think we need to support Python 3.8 here, which doesn't support `list[x]`; you need to use `typing.List[x]`.
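As the comment notes, PEP 585 built-in generics like `list[x]` only parse as annotations on Python 3.9+; on 3.8 the `typing` equivalents are required. A minimal illustration (the return values here are made up for the example):

```python
# `def f() -> list[int]:` raises a TypeError at definition time on
# Python 3.8; `typing.List` works on 3.8 and later. An alternative is
# `from __future__ import annotations`, which defers annotation evaluation.
from typing import List


def partition_types() -> List[str]:
    # Hypothetical aten overload names, for illustration only.
    return ["aten.bmm.default", "aten.linear.default"]
```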

facebook-github-bot pushed a commit that referenced this pull request Jul 12, 2024
Summary:

This major change allows a lot more flexibility in the quantizer, and reduces the dependency on the decompositions/graph tracing tools.

The motivation is that some of those do not preserve or propagate `source_fn_stack` information, resulting in quantization misses. SDPA is an example, where the underlying `bmm` ops cannot be quantized with `source_fn_stack` information alone, or MHA, which can hide its SDPA component and sometimes even `linear` ops depending on the model (see ViT for an example).

Also note that in most cases, we match single nodes anyway, with a 1-1 mapping between the op (either nn.Module or nn.functional) and the aten op, so using the aten op directly is simply easier.

Summary of the changes:
- change the quantizer to match aten ops directly, through `node.target`
- propagate required changes to the `QuantFusion` pass
- update/remove existing patterns

Reviewed By: dulinriley

Differential Revision: D59552606
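The 1-1 mapping between source ops and aten ops mentioned in the commit message can be pictured as a simple table; the entries below are typical aten overload names chosen for illustration, not copied from this PR:

```python
# Hypothetical 1-1 source-op -> aten-op mapping. Because the mapping is 1-1,
# matching on the aten target recovers the same information as matching on the
# source op, but survives decomposition and tracing.
SOURCE_TO_ATEN = {
    "torch.nn.Linear": "aten.linear.default",
    "torch.nn.functional.linear": "aten.linear.default",
    "torch.bmm": "aten.bmm.default",
    "torch.nn.ReLU": "aten.relu.default",
}


def aten_target_for(source_op: str) -> str:
    # Look up the aten target a quantization pattern would match on.
    return SOURCE_TO_ATEN[source_op]
```

Note how the module and functional spellings of `linear` collapse onto the same aten target, which is exactly why matching the aten op directly is simpler.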

facebook-github-bot pushed a commit that referenced this pull request Jul 15, 2024


@facebook-github-bot
Contributor

This pull request has been merged in a22e809.

Labels
CLA Signed · fb-exported · Merged