* Support the NF4/FP4 data types in the weight-only RTN & AWQ algorithms; allow tuning the dtype and compressing in nf4/fp4 mode
Signed-off-by: Xin He <xin3.he@intel.com>
---------
Signed-off-by: Xin He <xin3.he@intel.com>
docs/source/quantization_weight_only.md: 6 additions & 1 deletion
@@ -35,11 +35,14 @@ There are many excellent works for weight only quantization to improve its accur

### **Quantization Capability**:

| Config | Capability |
| :---: | :---:|
+| dtype |['int', 'nf4', 'fp4']|
| bits |[1-8]|
| group_size |[-1, 1-N]|
| scheme |['asym', 'sym']|
| algorithm |['RTN', 'AWQ', 'GPTQ']|

+Notes: 4-bit NormalFloat (NF4) is proposed in QLoRA [5]. 'fp4' includes [fp4_e2m1](../../neural_compressor/adaptor/torch_utils/weight_only.py#L37) and [fp4_e2m1_bnb](https://github.com/TimDettmers/bitsandbytes/blob/18e827d666fa2b70a12d539ccedc17aa51b2c97c/bitsandbytes/functional.py#L735). By default, fp4 refers to fp4_e2m1_bnb.
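
For illustration, a minimal sketch of selecting the new `nf4` dtype through the weight-only `PostTrainingQuantConfig` described in this document. The placeholder model and the exact `fit()` call are assumptions for the example, not part of this diff:

```python
import torch
from neural_compressor import PostTrainingQuantConfig, quantization

# Placeholder fp32 model; in practice this is the model being compressed.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8))

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply to all matched ops
            "weight": {
                "dtype": "nf4",      # 'int' (default), 'nf4', or 'fp4'
                "bits": 4,           # 1-8
                "group_size": 32,    # -1 or 1-N; -1 means per-channel
                "scheme": "sym",     # 'asym' or 'sym'
                "algorithm": "RTN",  # 'RTN', 'AWQ', or 'GPTQ'
            },
        },
    },
)

# RTN is data-free, so no calibration dataloader is passed here;
# AWQ/GPTQ would additionally need a calib_dataloader.
q_model = quantization.fit(model, conf)
```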