# TorchAO 0.1.0: First Release

## Highlights
We’re excited to announce the release of TorchAO v0.1.0! TorchAO is a repository that hosts architecture optimization techniques such as quantization and sparsity, along with performance kernels for backends such as CUDA and CPU. This release adds support for several quantization techniques, including int4 weight-only GPTQ quantization, an `nf4` dtype for QLoRA, sparsity features such as `WandaSparsifier`, and an autotuner that can tune Triton integer matrix multiplication kernels on CUDA.
Note: TorchAO is currently in a pre-release state and under extensive development, so the public APIs should not be considered stable. We nonetheless welcome you to try out our APIs and offerings and share feedback on your experience.
torchao 0.1.0 is compatible with PyTorch 2.2.2 and 2.3.0, ExecuTorch 0.2.0, and TorchTune 0.1.0.
## New Features

### Quantization
- Added tensor subclass based quantization APIs: `change_linear_weights_to_int8_dqtensors`, `change_linear_weights_to_int8_woqtensors`, and `change_linear_weights_to_int4_woqtensors` (#1)
- Added module based quantization APIs for int8 dynamic and weight-only quantization: `apply_weight_only_int8_quant` and `apply_dynamic_quant` (#1)
- Added a module swap version of int4 weight-only quantization: `Int4WeightOnlyQuantizer` and `Int4WeightOnlyGPTQQuantizer`, used in TorchTune (#119, #116)
- Added int8 dynamic activation and int4 weight quantization: `Int8DynActInt4WeightQuantizer` and `Int8DynActInt4WeightGPTQQuantizer`, used in ExecuTorch (#74) (available with torch 2.3.0 and later)
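For readers new to these techniques, here is a minimal pure-Python sketch of the idea behind int8 weight-only quantization (symmetric per-row quantization with a float scale, dequantized on the fly). It is illustrative only and is not torchao's implementation:

```python
def quantize_int8(row):
    """Symmetric per-row int8 quantization: returns (int8 values, scale).

    Illustrative sketch only -- not torchao's implementation.
    """
    # Scale maps the largest-magnitude weight to the int8 extreme 127.
    scale = max(abs(v) for v in row) / 127 or 1.0
    qrow = [max(-128, min(127, round(v / scale))) for v in row]
    return qrow, scale

def dequantize_int8(qrow, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in qrow]

weights = [0.5, -1.0, 0.25, 0.75]
q, s = quantize_int8(weights)
recon = dequantize_int8(q, s)
```

Weight-only schemes like this keep activations in floating point and only compress the weights, so the per-element reconstruction error is bounded by half the scale.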
### Sparsity
- Added `WandaSparsifier` that prunes both weights and activations (#22)
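The Wanda criterion scores each weight by the product of its magnitude and the norm of its input activation, then zeroes the lowest-scoring weights. A minimal pure-Python sketch of that scoring rule (illustrative only, not the `WandaSparsifier` implementation; the function name here is hypothetical):

```python
def wanda_prune_row(weights, act_norms, sparsity):
    """Zero the fraction `sparsity` of weights with the lowest
    |weight| * activation-norm score (the Wanda criterion).

    Hypothetical helper for illustration, not torchao's code.
    """
    # Score each weight by its magnitude times its input activation norm.
    scores = [abs(w) * n for w, n in zip(weights, act_norms)]
    k = int(len(weights) * sparsity)  # how many weights to zero
    # Indices of the k lowest-scoring weights.
    prune_idx = set(sorted(range(len(weights)), key=lambda i: scores[i])[:k])
    return [0.0 if i in prune_idx else w for i, w in enumerate(weights)]
```

Note how a large weight with a near-zero input activation can still be pruned, which is what distinguishes Wanda from plain magnitude pruning.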
### Kernels
- Added `autotuner` for int mm Triton kernels (#41)
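Conceptually, an autotuner benchmarks a kernel under each candidate configuration and keeps the fastest one. A toy pure-Python sketch of that loop (the function and its signature are hypothetical; the real autotuner targets Triton integer matmul kernels):

```python
import time

def autotune(kernel, configs, args, warmup=1, reps=3):
    """Return the config with the lowest measured runtime for
    kernel(*args, config). Hypothetical sketch for illustration.
    """
    best_cfg, best_t = None, float("inf")
    for cfg in configs:
        # Warm up so one-time costs don't skew the measurement.
        for _ in range(warmup):
            kernel(*args, cfg)
        t0 = time.perf_counter()
        for _ in range(reps):
            kernel(*args, cfg)
        elapsed = (time.perf_counter() - t0) / reps
        if elapsed < best_t:
            best_cfg, best_t = cfg, elapsed
    return best_cfg
```

In practice the chosen config is cached per problem shape so the benchmarking cost is paid only once.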
### dtypes
- Added `nf4` dtype, used in QLoRA
## Improvements
- Set up a GitHub workflow for regression testing (#50)
- Set up a GitHub workflow for `torchao-nightly` releases (#54)
## Documentation
- Added a tutorial for quantizing a vision transformer model (#60)
- Added a tutorial on how to add an op for `nf4` tensors (#54)
## Notes
- We are still debugging an accuracy problem with `Int8DynActInt4WeightGPTQQuantizer`
- Save and load does not yet work well with the tensor subclass based APIs
- We will consolidate the tensor subclass and module swap based quantization APIs later
- The `uint4` tensor subclass will be merged into PyTorch core in the future
- Quantization ops in `quant_primitives.py` will be deduplicated with similar quantize/dequantize ops in PyTorch later