We have existing HF integrations today but they're hidden in various places:
- https://docs.pytorch.org/ao/main/serving.html
- https://huggingface.co/pytorch/Phi-4-mini-instruct-INT8-INT4
Instead, we should have a dedicated page about this integration, just like our vLLM page, plus corresponding pages in our partner repos:
- https://docs.pytorch.org/ao/main/torchao_vllm_integration.html
- https://huggingface.co/docs/transformers/en/quantization/torchao
- https://huggingface.co/docs/diffusers/en/quantization/torchao
- https://docs.vllm.ai/en/latest/features/quantization/torchao.html
It should highlight:
- Example integration code snippets with HF transformers and diffusers
- Some results so far, e.g. https://github.com/sayakpaul/diffusers-torchao/blob/main/README.md (note: diffusers-torchao is outdated; the real integration now lives in diffusers)
- The latest quantization configs we support (e.g. `Float8DynamicActivationInt4WeightConfig`)
- Link to serving.html for a full end-to-end tutorial, including inference with vLLM
- Link to the torchao pages in the transformers and diffusers docs
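For the first bullet, a minimal sketch of what the transformers/diffusers snippets on such a page could look like. This is an assumption of the final content, not a finished tutorial: it assumes a recent transformers/diffusers with the torchao backend installed, and the model names, quant types, and `group_size` are illustrative placeholders only.

```python
# Sketch: loading a transformers model with torchao quantization.
# Assumes `pip install torchao` and a recent transformers release;
# model id and settings are examples, not recommendations.
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = TorchAoConfig("int4_weight_only", group_size=128)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",  # illustrative model id
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)

# Sketch: the diffusers integration exposes its own TorchAoConfig,
# applied per component (e.g. the transformer backbone of a pipeline).
from diffusers import FluxTransformer2DModel
from diffusers import TorchAoConfig as DiffusersTorchAoConfig

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative model id
    subfolder="transformer",
    quantization_config=DiffusersTorchAoConfig("int8wo"),
    torch_dtype=torch.bfloat16,
)
```

The actual page should swap in whichever configs we want to showcase (e.g. `Float8DynamicActivationInt4WeightConfig`) and defer end-to-end serving to serving.html.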
Thanks for the suggestion @sayakpaul