Migrate the quantizer to use aten ops directly #4195

Closed
wants to merge 1 commit into from

Commits on Jul 15, 2024

  1. Migrate the quantizer to use aten ops directly (#4195)

    Summary:
    Pull Request resolved: #4195
    
    This major change makes the quantizer considerably more flexible and reduces its dependency on the decomposition and graph-tracing tools.
    
    The motivation is that some of those tools do not preserve or propagate `source_fn_stack` information, which results in quantization misses. SDPA is one example: its underlying `bmm` ops cannot be quantized with `source_fn_stack` information alone. MHA is another: it can hide its SDPA component, and sometimes even its `linear` ops, depending on the model (see ViT for an example).
    
    Also note that in most cases we match single nodes anyway, with a 1-1 mapping between the op (either `nn.Module` or `nn.functional`) and the aten op, so using the aten op directly is simply easier.
    
    Summary of the changes:
    - change the quantizer to match aten ops directly, through `node.target`
    - propagate required changes to the `QuantFusion` pass
    - update/remove existing patterns
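
The difference between the two matching strategies can be illustrated with a minimal, self-contained sketch. It is not the quantizer code from this PR: `Node` is a hypothetical stand-in for `torch.fx.Node` that keeps only `name`, `target` and `meta`, targets are plain strings rather than real aten overloads, and the helper names and the example graph are assumptions made for illustration. The `source_fn_stack` layout (a list of `(name, source)` pairs, innermost last) is also assumed here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    """Hypothetical stand-in for torch.fx.Node (illustration only)."""
    name: str
    target: str  # a real graph would hold an aten overload object here
    meta: Dict[str, Any] = field(default_factory=dict)


def match_by_source_fn(nodes: List[Node], wanted: Set[str]) -> List[str]:
    """Match on the innermost source_fn_stack entry; misses nodes lacking it."""
    matched = []
    for node in nodes:
        stack = node.meta.get("source_fn_stack", [])
        if stack and stack[-1][1] in wanted:
            matched.append(node.name)
    return matched


def match_by_target(nodes: List[Node], targets: Set[str]) -> List[str]:
    """Match on node.target, independent of any traced metadata."""
    return [node.name for node in nodes if node.target in targets]


# Assumed example graph: one bmm is attributed to SDPA, the other lost its
# source_fn_stack entirely during decomposition.
graph = [
    Node("linear", "aten.linear.default",
         {"source_fn_stack": [("l1", "nn.Linear")]}),
    Node("bmm", "aten.bmm.default",
         {"source_fn_stack": [("attn", "F.scaled_dot_product_attention")]}),
    Node("bmm_1", "aten.bmm.default"),
    Node("softmax", "aten._softmax.default",
         {"source_fn_stack": [("attn", "F.scaled_dot_product_attention")]}),
]

by_source = match_by_source_fn(graph, {"nn.Linear", "torch.bmm"})
by_target = match_by_target(graph, {"aten.linear.default", "aten.bmm.default"})

print(by_source)
print(by_target)
```

In this toy graph, matching on `source_fn_stack` finds only the `linear` node, because neither `bmm` node reports `torch.bmm` as its source, while matching on `target` finds all three quantizable nodes.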
    
    Reviewed By: dulinriley
    
    Differential Revision: D59552606
    mcremon-meta authored and facebook-github-bot committed Jul 15, 2024
    6d1694d