New HIGGS quantization interfaces, JIT kernel compilation support. (huggingface#36148)
* new flute
* new higgs working
* small adjustments
* progress and quality
* small updates
* style
---------
Co-authored-by: Andrey Panferov <panferov.andrey3@wb.ru>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
f"Device capability {target_device_cc} not supported for FLUTE (yet?) to verify your device capability check out https://developer.nvidia.com/cuda-gpus"
44
-
)
45
-
46
-
47
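For context, the capability value in this error message is the GPU's CUDA compute capability. Below is a minimal sketch of how such a value can be derived with PyTorch; the set of capabilities FLUTE actually supports is an assumption for illustration, not taken from this diff.

```python
import torch

# Illustrative set only; consult the FLUTE project for the authoritative list.
SUPPORTED_CCS = {80, 86, 89, 90}  # assumption, not from this PR

def flute_device_cc(device_index: int = 0) -> int:
    # torch.cuda.get_device_capability returns (major, minor), e.g. (8, 0) -> 80.
    major, minor = torch.cuda.get_device_capability(device_index)
    return major * 10 + minor

target_device_cc = flute_device_cc()
if target_device_cc not in SUPPORTED_CCS:
    raise NotImplementedError(
        f"Device capability {target_device_cc} not supported for FLUTE (yet?). "
        f"To verify your device capability, check out https://developer.nvidia.com/cuda-gpus"
    )
```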
```python
class HiggsHfQuantizer(HfQuantizer):
    """
    Quantizer of the HIGGS method. Enables the loading of prequantized models and in-flight quantization of full-precision models.
```
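The docstring covers two paths: loading prequantized checkpoints and in-flight quantization of full-precision models. A minimal sketch of the in-flight path, assuming `HiggsConfig` is exported from `transformers` as this PR's config changes suggest; the checkpoint name is only a placeholder.

```python
from transformers import AutoModelForCausalLM, HiggsConfig

# In-flight quantization: load a full-precision checkpoint and quantize it
# with HIGGS defaults while loading. The model id below is a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=HiggsConfig(),
    device_map="auto",
)
```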
src/transformers/utils/quantization_config.py (6 additions, 0 deletions)
```diff
@@ -1404,6 +1404,8 @@ class HiggsConfig(QuantizationConfigMixin):
             Hadamard size for the HIGGS method. Default is 512. Input dimension of matrices is padded to this value. Decreasing this below 512 will reduce the quality of the quantization.
         group_size (int, *optional*, defaults to 256):
             Group size for the HIGGS method. Can be 64, 128 or 256. Decreasing it barely affects the performance. Default is 256. Must be a divisor of hadamard_size.
+        tune_metadata ('dict', *optional*, defaults to {}):
+            Module-wise metadata (gemm block shapes, GPU metadata, etc.) for saving the kernel tuning results. Default is an empty dictionary. Is set automatically during tuning.
```
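A minimal sketch illustrating the documented constraints on these parameters. The explicit check is illustrative only; whether `HiggsConfig` itself validates these values is not shown in this diff.

```python
from transformers import HiggsConfig

hadamard_size = 512   # default; input dimensions are padded to this value
group_size = 256      # must be 64, 128, or 256, and must divide hadamard_size

# Documented constraints from the docstring above (illustrative check).
assert group_size in (64, 128, 256) and hadamard_size % group_size == 0

config = HiggsConfig(hadamard_size=hadamard_size, group_size=group_size)
# tune_metadata is left at its default ({}); per the docstring it is filled in
# automatically with module-wise kernel tuning results during tuning.
```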