Add torchchat quantizer #897

Merged
merged 1 commit into from
Sep 25, 2024

Commits on Sep 25, 2024

  1. Add torchchat quantizer (pytorch#897)

    Summary:
    Pull Request resolved: pytorch#897
    
    This diff adds a quantizer for the new torchao kernels. It is similar to the Int8DynActInt4WeightQuantizer used in torchchat (imported from torchao.quantization.quant_api). See the draft torchchat PR (pytorch/torchchat#1070) for how this can integrate with torchchat's quantization API.
    
    I confirmed that models quantized with this are compatible with eager, compile, AOTI, and export to ExecuTorch in torchchat. The exported models do not yet run on ExecuTorch because we have not written an ExecuTorch kernel wrapper.
    
    jerryzh168, this does not use the new subclass API, which I'd like to discuss further with you. I'll set up a sync with you this week, but I wanted to have some API on the table to ground the discussion.
    
    We do not currently have the C++ methods required to support the new subclass API (e.g., we cannot unpack the packed weights from Python; they are instead unpacked inline in the kernel). From a torchchat user's perspective, I do not think this is important, but I'd like to discuss further.
    
    Differential Revision: D62394341
    metascroy authored and facebook-github-bot committed Sep 25, 2024
    Commit bdd1486
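For readers unfamiliar with the scheme the quantizer name refers to (8-bit dynamically quantized activations, 4-bit weights), the following is a rough pure-Python sketch of the arithmetic. It is not the torchao implementation and does not use the torchao kernels or API. The function names, the symmetric scaling, and the per-group layout are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the torchao kernels or quantizer API.
# All function names and the symmetric, per-group scheme are assumptions.


def quantize_weights_int4_groupwise(row, group_size):
    """Symmetric int4 quantization with one scale per group of weights."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    qvals, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid dividing by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Clamp to the signed 4-bit range [-8, 7].
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales


def quantize_activations_int8_dynamic(x):
    """Symmetric int8 quantization with a scale computed from the input at run time."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [max(-128, min(127, round(v / scale))) for v in x], scale


def quantized_dot(x, row, group_size):
    """Dot product of an activation vector with one quantized weight row."""
    qx, x_scale = quantize_activations_int8_dynamic(x)
    qw, w_scales = quantize_weights_int4_groupwise(row, group_size)
    total = 0.0
    for g, w_scale in enumerate(w_scales):
        lo = g * group_size
        # Accumulate in integers within a group, then rescale once per group.
        acc = sum(qx[i] * qw[i] for i in range(lo, lo + group_size))
        total += acc * x_scale * w_scale
    return total
```

Calling `quantized_dot(x, row, group_size)` should give a value close to the floating-point dot product of `x` and `row`, with the error shrinking as `group_size` gets smaller. A real kernel would additionally pack the 4-bit values into bytes and unpack them inline, which this sketch leaves out.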