Support NF4/FP4 data type in weight-only #1185
Signed-off-by: Xin He <xin3.he@intel.com>
Which FP4 format? Please specify whether it is FP4_E3M0 or FP4_E2M1. You can align with Penghui.
FLOAT_MAPPING = {'nf4': NF4, 'fp4': FP4_BNB, 'fp4_bnb': FP4_BNB, 'fp4_e2m1': FP4_E2M1, 'e2m1': FP4_E2M1}
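To illustrate how an alias table like this can drive weight-only quantization, here is a minimal sketch of lookup-table rounding for the E2M1 format only. It is not the PR's implementation: `DTYPE_TABLES`, `quantize_group`, and `dequantize_group` are hypothetical names, and the per-group absmax scaling is an assumption. The E2M1 value set (1 sign, 2 exponent, 1 mantissa bit) is the standard one; the NF4 and FP4_BNB tables are omitted.

```python
# Hypothetical sketch, not the code from this PR.
# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Full signed table, sorted ascending (15 distinct values, zero appears once).
E2M1_VALUES = [-m for m in reversed(E2M1_MAGNITUDES[1:])] + E2M1_MAGNITUDES

# Illustrative alias table in the spirit of FLOAT_MAPPING (assumed names).
DTYPE_TABLES = {"fp4_e2m1": E2M1_VALUES, "e2m1": E2M1_VALUES}


def quantize_group(weights, dtype="fp4_e2m1"):
    """Map each weight to the index of the nearest table value after absmax scaling."""
    table = DTYPE_TABLES[dtype]
    max_abs = max(abs(w) for w in weights)
    # Scale so that the largest weight lands on the largest table value.
    scale = max_abs / max(abs(v) for v in table) if max_abs > 0 else 1.0
    codes = [
        min(range(len(table)), key=lambda i: abs(w / scale - table[i]))
        for w in weights
    ]
    return codes, scale


def dequantize_group(codes, scale, dtype="fp4_e2m1"):
    """Recover approximate weights from table indices and the group scale."""
    table = DTYPE_TABLES[dtype]
    return [table[c] * scale for c in codes]


codes, scale = quantize_group([6.0, -3.0, 0.4, 0.0])
restored = dequantize_group(codes, scale)
```

Both aliases resolve to the same table, so a user can pass either `'fp4_e2m1'` or `'e2m1'`, which mirrors how the mapping above lets `'fp4'` default to the bitsandbytes variant.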
GPTQ and TEQ do not support NF4/FP4 yet; that support will be added in later PRs.
* support NF4/FP4 data type in weight-only RTN & AWQ algo, allow tuning dtype and compressing nf4/fp4 mode
Signed-off-by: Xin He <xin3.he@intel.com>
---------
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Type of Change
feature
Description
Support the NF4/FP4 data types in weight-only quantization, and allow tuning the dtype and compressing NF4/FP4 models.
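As a rough illustration of what compressing a 4-bit model can involve, the sketch below packs two 4-bit codes into each byte and unpacks them again. This assumes that "compressing" here means bit-packing the quantized indices; the PR does not spell out its storage layout, and `pack_nibbles` and `unpack_nibbles` are hypothetical helpers, not functions from the library.

```python
# Hypothetical sketch, not the layout used by this PR.
def pack_nibbles(codes):
    """Pack 4-bit codes (0..15) two per byte, low nibble first."""
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    # Pad with a zero code when the count is odd so pairs line up.
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes(lo | (hi << 4) for lo, hi in zip(padded[0::2], padded[1::2]))


def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles; `count` drops the padding code, if any."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble holds the first code of the pair
        codes.append(byte >> 4)    # high nibble holds the second code
    return codes[:count]


packed = pack_nibbles([1, 15, 7])
restored = unpack_nibbles(packed, 3)
```

Storing the original element count alongside the packed bytes is what makes the round trip exact when the number of codes is odd.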
Expected Behavior & Potential Risk
Unit tests pass.
How has this PR been tested?
Tested locally.
Dependency Change?
N/A