New Features
- Auto Quant + Auto Tuner @HDCharles @cpuhrsch
  - Auto quant API with support for int8 weight-only and dynamic quant (see the usage sketch after this list)
  - [DELAYED] Benchmarks for auto quant on the PyTorch Benchmark inference quant pane
  - Kernel auto tuner that works with the auto quant API, with results on SAM and benchmarks on torchbench
- C++ CPU extensions in torchao/kernel/cpu @msaroufim https://github.com/pytorch/ao/tree/main/torchao/csrc
- C++ CUDA extensions in torchao/kernel/cuda w/ A10G and manylinux support in CI @msaroufim (Add A10G support in CI #176)
- Int8 + 2:4 sparse inference APIs and results on SAM @jcaip
- Fast sparse training @jcaip
- Explore adding HQQ 4/3/2-bit quant to torchao @HDCharles @mobicham
- [WIP] Consolidating workflows to use the tensor subclass approach @jerryzh168
- NF4 FSDP support ([FSDP2][NF4Tensor][2/n] implement torch.chunk and other ops #150) @weifengpy
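
A minimal sketch of how the auto quant API might be used, assuming an `autoquant` entry point in `torchao.quantization`; the model, shapes, and call order are placeholders, not the finalized API:

```python
# Sketch only: the `autoquant` entry point and its behavior are assumptions
# based on the roadmap item above, not a finalized API.
import torch
import torch.nn as nn
from torchao.quantization import autoquant  # assumed import path

class ToyModel(nn.Module):
    """Placeholder model with a couple of Linear layers to quantize."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)
        )

    def forward(self, x):
        return self.net(x)

model = ToyModel().cuda().eval()
# Mark Linear layers for auto quantization and compile the wrapped model.
model = autoquant(torch.compile(model, mode="max-autotune"))

x = torch.randn(16, 1024, device="cuda")
out = model(x)  # first call benchmarks int8 weight-only vs. int8 dynamic
                # quant per layer and keeps the fastest option
```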
Better Engineering
- Remove _is_gpt_fast flag @jerryzh168
- Add rudimentary shape checks (e.g. multiple-of-16 constraints)
- Generic test guards for sm80 (bfloat16) and CPU-only environments (see the sketch after this list)
- [done for 0.2] Dedup https://github.com/pytorch-labs/ao/blob/046dc985de6d5eac05c6575cc71505687e3aadf1/torchao/quantization/quant_primitives.py#L23-L42 @jerryzh168
- Better sparsity docs @jcaip
- Documentation of torchao features @msaroufim
- GPTQ Refactor
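
One way the sm80/CPU-only test guards could look, written as plain `unittest` skip decorators; the decorator names are illustrative, only the `torch.cuda` queries are existing APIs:

```python
# Illustrative sketch of generic test guards; not the exact helpers that will
# land in torchao.
import unittest
import torch

def skip_if_no_cuda(test_fn):
    """Skip tests that need a GPU when running in a CPU-only environment."""
    return unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")(test_fn)

def skip_if_lt_sm80(test_fn):
    """Skip when the GPU is older than sm80, i.e. lacks fast bfloat16 support."""
    has_sm80 = (
        torch.cuda.is_available()
        and torch.cuda.get_device_capability() >= (8, 0)
    )
    return unittest.skipUnless(has_sm80, "requires compute capability >= 8.0")(test_fn)

class TestBf16Kernels(unittest.TestCase):
    @skip_if_lt_sm80
    def test_bf16_matmul(self):
        a = torch.randn(16, 16, device="cuda", dtype=torch.bfloat16)
        self.assertEqual((a @ a).dtype, torch.bfloat16)

if __name__ == "__main__":
    unittest.main()
```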
Repo Health
- Set up docs page at pytorch.org/ao @svekars
- OSS -> fbcode sync (set up difftrain) @jerryzh168 and Jon from OSS team
- Don't cause an import error when someone uses a feature unsupported by their torch version, e.g. torch 2.1.2 (e.g. https://github.com/pytorch-labs/ao/blob/046dc985de6d5eac05c6575cc71505687e3aadf1/torchao/quantization/quant_primitives.py#L42 will cause an import error if someone tries to use torchao.quantization.quant_primitives.per_token_dynamic_quant on 2.2.2); see the sketch after this list
- Enable test_8da4w_quantize for 2.4 @cpuhrsch
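
For the import-error item above, one possible pattern is to gate version-dependent definitions on the installed torch version instead of referencing newer attributes unconditionally at module import time. The helper and the version threshold below are illustrative, not an existing torchao utility:

```python
# Sketch of a version guard; _torch_version_at_least, the "2.3.0" threshold,
# and the stub fallback are assumptions, not torchao's actual solution.
import re
import torch

def _torch_version_at_least(min_version: str) -> bool:
    # Keep only the leading numeric components so dev/nightly strings still
    # parse, e.g. "2.4.0a0+git..." -> (2, 4, 0).
    parts = re.findall(r"\d+", torch.__version__.split("+")[0])[:3]
    installed = tuple(int(p) for p in parts)
    wanted = tuple(int(p) for p in min_version.split("."))
    return installed >= wanted

if _torch_version_at_least("2.3.0"):  # placeholder threshold
    def per_token_dynamic_quant(x: torch.Tensor) -> torch.Tensor:
        ...  # implementation that relies on ops only present in newer torch
else:
    def per_token_dynamic_quant(x: torch.Tensor) -> torch.Tensor:
        # Fail lazily with a clear message instead of breaking `import torchao`.
        raise RuntimeError(
            f"per_token_dynamic_quant requires a newer torch, found {torch.__version__}"
        )
```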