PyTorch-native quantization and sparsity for training and inference
training
sparsity
cuda
inference
optimizer
pytorch
transformer
offloading
llama
quantization
mx
brrr
dtypes
float8
Updated Nov 12, 2024 - Python