Closed
This issue tracks the outstanding tasks for a torchao 0.1 release
New Functionality
- Test compatibility with PyTorch 2.2 and 2.3rc1 (@cpuhrsch)
- Fix tests marked as flaky (@cpuhrsch)
- int4, int8 weight-only quantization support (only one of the two paths needs to work)
- path 1: int4, int8 weight quantization subclass API works with TorchTune (@jerryzh168); blocked by tensor subclass save/load
- path 2: int4, int8 weight quantization module swap API works with TorchTune (@jerryzh168), WIP
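For context, both paths implement the same underlying scheme: weights are rounded to 8-bit (or 4-bit) integers with a symmetric scale and dequantized on the fly at inference; the subclass and module swap APIs only differ in how that is wired into the model. A pure-Python sketch of the per-tensor int8 math (illustrative only, not the torchao API):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [qi * scale for qi in q]

w = [0.5, -1.0, 0.25, 0.75]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# round-to-nearest error is bounded by scale / 2 per element
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

The subclass API keeps `q` and `scale` inside a tensor subclass that dequantizes in its matmul override; the module swap API stores them on a replacement `Linear` module instead.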
- Add GPTQuantizer workflow for 4-bit weight quantization (W4A16) for GPU that works for gpt-fast (and executorch) (@jerryzh168, @HDCharles)
- remove lm-eval from GPTQ code (@HDCharles)
- Only one of the following needs to happen
- change torchtune code to be compatible with the current GPTQ implementation; specifically, change https://github.com/pytorch/torchtune/blob/main/torchtune/modules/kv_cache.py#L61-L62 to use the index_put_ op (note: this is not needed since we don't need to turn on the cache during GPTQ)
- refactor GPTQ to use tensor subclass to remove dependency on export
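For reference, the round-to-nearest baseline that GPTQ improves on maps each group of weights to 16 signed levels with a per-group scale; GPTQ then adjusts the rounding of later columns to compensate for the error introduced by earlier ones (that calibration step is omitted here). A pure-Python sketch of the baseline (the group size and symmetric scaling are illustrative assumptions):

```python
def quantize_int4_grouped(weights, group_size=4):
    """Round-to-nearest 4-bit quantization with one scale per group.

    Each value maps to an integer in [-8, 7] (16 levels).
    """
    qs, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7 if max_abs else 1.0
        scales.append(scale)
        qs.append([max(-8, min(7, round(w / scale))) for w in group])
    return qs, scales

def dequantize_int4_grouped(qs, scales):
    out = []
    for group, scale in zip(qs, scales):
        out.extend(q * scale for q in group)
    return out
```

Smaller groups shrink the worst-case error (each scale only has to cover its own group's range) at the cost of storing more scales.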
- Add workflow for 4-bit weight, 8-bit activation quantization (W4A8) with/without GPTQ for executorch (@jerryzh168)
- the non-GPTQ path is working; still verifying the GPTQ path
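The point of W4A8 is that both operands are integers at matmul time: weights are pre-quantized to 4 bits, activation scales are computed dynamically from each incoming tensor, and the integer accumulator is rescaled once at the end. A stdlib-only sketch (function names are made up for illustration):

```python
def dynamic_quantize_int8(x):
    """Per-tensor dynamic activation quantization: scale comes from the runtime max."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 127 if max_abs else 1.0
    return [max(-128, min(127, round(v / scale))) for v in x], scale

def w4a8_dot(q_x, x_scale, q_w, w_scale):
    """Dot product of int8 activations and int4 weights, rescaled once at the end."""
    acc = sum(a * b for a, b in zip(q_x, q_w))  # exact integer accumulation
    return acc * x_scale * w_scale
```

Because the activation scale is computed per call rather than calibrated ahead of time, no calibration dataset is needed for the non-GPTQ path.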
- NF4 dtype that works for QLoRA in TorchTune (@cpuhrsch)
- Fix API so it works with LoRACompatibleLinear
- Fix apply_quant_api()
- it currently looks only at the children of the module and so doesn't do anything
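On the NF4 item above: NF4 is a 4-bit code where each weight stores an index into a fixed 16-entry codebook, with the tensor normalized by its absolute maximum. The sketch below uses an evenly spaced placeholder codebook; the real NF4 entries are quantiles of a normal distribution, and the double quantization of scales used in QLoRA is omitted:

```python
# Placeholder 16-entry codebook in [-1, 1]; real NF4 uses normal-distribution
# quantiles, not evenly spaced values.
CODEBOOK = [i / 7.5 - 1.0 for i in range(16)]

def quantize_nf4_like(weights):
    """Absmax-normalize, then store the index of the nearest codebook entry."""
    max_abs = max(abs(w) for w in weights) or 1.0
    idxs = [min(range(16), key=lambda i: abs(CODEBOOK[i] - w / max_abs))
            for w in weights]
    return idxs, max_abs

def dequantize_nf4_like(idxs, max_abs):
    return [CODEBOOK[i] * max_abs for i in idxs]
```

Unlike linear int4, lookup-table codes like NF4 can place more levels where normally distributed weights are dense, which is why QLoRA uses it for frozen base weights.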
Tutorials/BE
- Using/Writing a quantization technique using torchao (@jerryzh168)
- Using kernels written in torchao with PyTorch
- Replace Int8WeightOnlyQuantizedLinearWeight and Int8DynamicallyQuantizedLinearWeight with a single class
- Reconsider using class method for Int8DynamicallyQuantizedLinearWeight.from_float
- Remove / guard catch all forward args, kwargs for module swap API
- Land tutorial "Adding tutorial for gpu quantization using torchao" (tutorials#2730)
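On the catch-all forward item above: replacement modules that accept `*args, **kwargs` and forward them blindly hide signature mismatches until deep inside the wrapped call. A framework-free sketch of the guarded alternative (class and method names are made up; a real module would subclass torch.nn.Module):

```python
class QuantizedLinear:
    """Stand-in for a swapped-in int8 weight-only linear module."""

    def __init__(self, q_weight, scale):
        self.q_weight = q_weight  # rows of int8 values
        self.scale = scale        # shared dequantization scale

    # Explicit signature instead of a catch-all (*args, **kwargs):
    # unexpected arguments now fail loudly at the call site.
    def forward(self, x):
        return [sum(xi * qj * self.scale for xi, qj in zip(x, row))
                for row in self.q_weight]
```

The alternative to removing the catch-all is guarding it: validate that any extra args/kwargs are ones the wrapped module actually understands before forwarding.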
If time permits (or v0.2)
- Enable test_8da4w_quantize for 2.4 (@jerryzh168)
- Collect CPU perf numbers for 4-bit quantization
- Feature parity between the module swap API and the subclass API
- Align the SmoothQuant API with the other APIs
- Add a high-level autoquant API for int8 dynamic and weight-only quantization, with benchmarks (@HDCharles)
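On the autoquant item: the general shape of such an API is to time each candidate quantization of a layer on sample inputs and keep the fastest. A toy stdlib-only sketch of the selection loop (names and candidates are illustrative, not the torchao API):

```python
import time

def autoquant_choose(run_layer, candidates, trials=5):
    """Benchmark each candidate transform of a layer and return the fastest name.

    candidates maps a name to a function that takes the layer callable and
    returns a transformed callable variant of it.
    """
    timings = {}
    for name, make_variant in candidates.items():
        variant = make_variant(run_layer)
        start = time.perf_counter()
        for _ in range(trials):
            variant()
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)
```

A real implementation would benchmark actual quantized kernels per linear layer and cache the winning configuration; here any callables stand in.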