- quant_method (`str`, *optional*, defaults to `"compressed-tensors"`): <fill_docstring>
- run_compressed (`bool`, *optional*, defaults to `True`): <fill_docstring>
+ config_groups (`dict[str, typing.Union[ForwardRef('QuantizationScheme'), list[str]]] | None`, *optional*): dictionary mapping group name to a quantization scheme definition
+ format (`str`, *optional*, defaults to `"dense"`): format the model is represented as. Set `run_compressed` True to execute model as the
+ quantization_status (`QuantizationStatus`, *optional*, defaults to `"initialized"`): status of model in the quantization lifecycle, ie 'initialized', 'calibration', 'frozen'
+ kv_cache_scheme (`Optional`, *optional*): specifies quantization of the kv cache. If None, kv cache is not quantized.
+ global_compression_ratio (`float | None`, *optional*): 0-1 float percentage of model compression
+ ignore (`list[str] | None`, *optional*): layer names or types to not quantize, supports regex prefixed by 're:'
+ sparsity_config (`dict[str, typing.Any] | None`, *optional*): configuration for sparsity compression
+ transform_config (`Optional`, *optional*): configuration for (hadamard) transforms
+ quant_method (`str`, *optional*, defaults to `"compressed-tensors"`): do not override, should be compressed-tensors
+ run_compressed (`bool`, *optional*, defaults to `True`): alter submodules (usually linear) in order to emulate compressed model execution if True, otherwise use default submodule
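For readers who want to see how the documented arguments fit together, here is a minimal, self-contained sketch. It does not import the real config class: `CompressedTensorsConfigSketch` and `is_ignored` are hypothetical names introduced for illustration only. The field names and defaults are copied from the docstring lines in this diff. The matching semantics for `ignore` (exact match on layer name or type, with `re:` entries treated as regular expressions matched against the layer name) are an assumption based on the one-line description and may differ from what the library actually does.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CompressedTensorsConfigSketch:
    """Hypothetical stand-in that mirrors the documented arguments and defaults."""

    config_groups: Optional[dict[str, Any]] = None
    format: str = "dense"
    quantization_status: str = "initialized"
    kv_cache_scheme: Optional[dict[str, Any]] = None  # None: kv cache is not quantized
    global_compression_ratio: Optional[float] = None  # 0-1 float when set
    ignore: Optional[list[str]] = None
    sparsity_config: Optional[dict[str, Any]] = None
    transform_config: Optional[dict[str, Any]] = None
    quant_method: str = "compressed-tensors"  # documented as "do not override"
    run_compressed: bool = True


def is_ignored(layer_name: str, layer_type: str, ignore: Optional[list[str]]) -> bool:
    """Assumed semantics: plain entries must equal the layer name or type;
    entries prefixed by 're:' are regexes matched from the start of the name."""
    for entry in ignore or []:
        if entry.startswith("re:"):
            if re.match(entry[3:], layer_name):
                return True
        elif entry in (layer_name, layer_type):
            return True
    return False


# Skip the output head by name and every MLP gate by regex.
config = CompressedTensorsConfigSketch(ignore=["lm_head", r"re:.*\.mlp\.gate$"])

for name in ("lm_head", "model.layers.0.mlp.gate", "model.layers.0.mlp.up_proj"):
    print(name, is_ignored(name, "Linear", config.ignore))
```

Leaving `quant_method` and `run_compressed` at their defaults matches the guidance in the docstrings above; the sketch only illustrates how an `ignore` list with a `re:` entry could be interpreted.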