Hi neuralmagic team!
Very nice work with AutoFP8! We were thinking of integrating AutoFP8 into transformers, so that users can run your checkpoints directly with transformers. We would simply replace the linear layers with their quantized versions, so only inference would be supported. Let us know if you agree with this! The goal would be to expose the quantized linear layer class from this repo (I see that you have several quantized linear classes) and import it in transformers.
I will be leading the integration, so any help is appreciated! Also, are there any big blockers that I might not have seen?
Thanks in advance!
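To make the proposal concrete, here is a minimal sketch of what swapping the linear layers could look like. `FP8Linear` and `replace_linears` are hypothetical names for illustration, not the actual classes exposed by AutoFP8, and the dequantize-on-the-fly matmul only emulates fp8 numerics rather than running a native fp8 GEMM:

```python
import torch
import torch.nn as nn

class FP8Linear(nn.Module):
    """Illustrative inference-only linear layer with fp8 (e4m3) weights."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()
        # Per-tensor scale so the largest weight maps to the fp8 max (448 for e4m3).
        self.scale = w.abs().amax() / torch.finfo(torch.float8_e4m3fn).max
        self.weight = (w / self.scale).to(torch.float8_e4m3fn)
        self.bias = linear.bias

    def forward(self, x):
        # Dequantize and fall back to a regular matmul; a production kernel
        # would instead run the GEMM natively in fp8.
        w = self.weight.to(x.dtype) * self.scale.to(x.dtype)
        return nn.functional.linear(x, w, self.bias)

def replace_linears(model: nn.Module):
    """Recursively swap every nn.Linear for its quantized counterpart."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, FP8Linear(child))
        else:
            replace_linears(child)
```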
Hey @SunMarc - we are planning to push most of our development into llm-compressor and compressed-tensors, the successors to this mini-repo, which we are already working on integrating into transformers (huggingface/transformers#31704)
This supports:
- mixed-precision w4a16 / w8a16 (weight-only quantization)
- w8a8 int8 (weight and activation quantization)
- w8a8 fp8 (floating-point quantization)
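For readers less familiar with the wNaM notation, the difference between weight-only and weight-plus-activation schemes comes down to whether the matmul inputs are quantized too. A minimal numeric sketch with symmetric per-tensor int8 scales (the function and variable names are illustrative, not this framework's API):

```python
import torch

def quantize_int8(t):
    """Symmetric per-tensor int8 quantization: t ≈ q.float() * scale."""
    scale = t.abs().amax() / 127.0
    q = torch.clamp(torch.round(t / scale), -128, 127).to(torch.int8)
    return q, scale

x = torch.randn(16, 256)   # activations
w = torch.randn(256, 512)  # weights

# w8a16-style (weight-only): weights are int8, activations stay high precision.
qw, sw = quantize_int8(w)
y_weight_only = x @ (qw.float() * sw)

# w8a8-style: activations are quantized too, so the matmul itself can run on
# int8 hardware; here the int8 x int8 GEMM is only emulated in float.
qx, sx = quantize_int8(x)
y_w8a8 = (qx.float() @ qw.float()) * (sx * sw)
```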
We also support the following algorithms, which can be applied to fp8, int8, and int4 models:
- PTQ (post-training quantization)
- GPTQ
- SmoothQuant
- SparseGPT
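Of these, SmoothQuant is perhaps the least self-explanatory: it applies a mathematically exact per-channel rescaling that migrates activation outliers into the weights before quantization, which makes the activations much easier to quantize to 8 bits. A minimal sketch (the function name, shape convention, and α=0.5 default are illustrative):

```python
import torch

def smoothquant_migrate(x, w, alpha=0.5):
    """Rescale so that x @ w == (x / s) @ (s[:, None] * w), giving the
    scaled activations a flatter per-channel dynamic range.

    x: calibration activations, shape (tokens, in_features)
    w: weights, shape (in_features, out_features), i.e. y = x @ w
    """
    act_max = x.abs().amax(dim=0)  # per-input-channel activation range
    w_max = w.abs().amax(dim=1)    # per-input-channel weight range
    s = (act_max.pow(alpha) / w_max.pow(1 - alpha)).clamp(min=1e-5)
    return x / s, w * s.unsqueeze(1)

# The transformation is exact in float; quantization error shrinks because
# the outlier channel's range has been shared with the weights.
x = torch.randn(32, 64); x[:, 3] *= 50  # channel 3 carries outliers
w = torch.randn(64, 128)
x2, w2 = smoothquant_migrate(x, w)
assert torch.allclose(x @ w, x2 @ w2, atol=1e-2)
```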
We would prefer to put transformers-related efforts behind this framework (including a push on fp8 and int8 compute with the CUTLASS kernels we use in vLLM).