Support NF4/FP4 data type in weight-only #1185

xin3he · 2023-08-24T10:01:28Z

Type of Change

feature

Description

Support the NF4/FP4 data types in weight-only quantization; allow tuning over dtype and compressing NF4/FP4 models.
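
For readers unfamiliar with what "compressing" a 4-bit model involves: once each weight is reduced to a 4-bit code, two codes fit in one byte. The sketch below is illustrative only. The helper names `pack_nibbles` and `unpack_nibbles` are hypothetical, and the packing layout actually used by this PR (integer width, ordering) is not shown in this thread.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes (ints in 0..15) two per byte, low nibble first.

    Hypothetical helper for illustration; not the layout used by the PR.
    """
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)  # pad to an even count
    return bytes((hi << 4) | lo for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit codes from packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble was stored first
        codes.append(byte >> 4)    # then the high nibble
    return codes[:count]


codes = [7, 10, 2, 15, 1]
packed = pack_nibbles(codes)
restored = unpack_nibbles(packed, len(codes))
```

Five codes occupy three bytes here, which is where the storage saving over 8-bit or 16-bit weights comes from.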

Expected Behavior & Potential Risk

Unit tests pass.

How has this PR been tested?

Tested locally.

Dependency Change?

N/A

Signed-off-by: Xin He <xin3.he@intel.com>

hshen14 · 2023-08-25T01:26:47Z

Which FP4 format? Should specify FP4_E3M0, FP4_E2M1. You can align with Penghui.
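
To make the format question concrete, here is a minimal sketch of how a 4-bit E2M1 code decodes, assuming the common layout of 1 sign bit, 2 exponent bits, and 1 mantissa bit with exponent bias 1, subnormals, and no inf/NaN encodings. The function name `decode_e2m1` is hypothetical, and the value table this PR uses for `FP4_E2M1` may be scaled or ordered differently.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit code laid out as [sign | exp(2 bits) | mantissa(1 bit)].

    Assumes exponent bias 1 and no inf/NaN encodings (an assumption,
    not confirmed by this thread).
    """
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        # Subnormal: no implicit leading one, so the value is 0 or 0.5.
        return sign * man * 0.5
    # Normal: implicit leading one, one fractional bit worth 0.5.
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)


# Distinct magnitudes representable under these assumptions.
magnitudes = sorted({abs(decode_e2m1(c)) for c in range(16)})
```

An E3M0 format, by contrast, spends all three non-sign bits on the exponent, so its magnitudes are powers of two only.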

docs/source/quantization_weight_only.md

neural_compressor/strategy/utils/tuning_space.py

xin3he · 2023-08-25T02:44:21Z

> Which FP4 format? Should specify FP4_E3M0, FP4_E2M1. You can align with Penghui.

FLOAT_MAPPING = {'nf4': NF4, 'fp4': FP4_BNB, 'fp4_bnb': FP4_BNB, 'fp4_e2m1': FP4_E2M1, 'e2m1': FP4_E2M1}
By default, 'fp4' maps to the FP4 format used by bitsandbytes. Developers can select from ['fp4_e2m1', 'fp4_bnb'].
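
A minimal sketch of how a name-to-table mapping like the one above can drive round-to-nearest quantization of one weight group. Everything here is an assumption for illustration: `DTYPE_TABLES`, `quantize_group`, and `dequantize_group` are hypothetical names, the NF4 levels are rounded to four decimals from the values published with QLoRA/bitsandbytes, and the E2M1 levels assume a bias-1 layout. The tables and scaling used in the PR itself may differ.

```python
# Illustrative lookup tables (assumptions, not copied from the PR).
NF4_LEVELS = [-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
              0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0]
E2M1_LEVELS = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
               0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# Several user-facing names may alias the same table.
DTYPE_TABLES = {"nf4": NF4_LEVELS, "fp4_e2m1": E2M1_LEVELS, "e2m1": E2M1_LEVELS}


def quantize_group(weights, dtype):
    """Return (codes, scale): each weight snaps to the nearest table level."""
    levels = DTYPE_TABLES[dtype]
    absmax = max(abs(w) for w in weights)
    # Scale so the largest weight lands on the largest table magnitude.
    scale = absmax / max(abs(v) for v in levels) if absmax > 0 else 1.0
    codes = [
        min(range(len(levels)), key=lambda i: abs(w / scale - levels[i]))
        for w in weights
    ]
    return codes, scale


def dequantize_group(codes, scale, dtype):
    """Map codes back to approximate weights."""
    levels = DTYPE_TABLES[dtype]
    return [levels[c] * scale for c in codes]


group = [0.0, 0.5, -1.0, 2.0]
nf4_codes, nf4_scale = quantize_group(group, "nf4")
e2m1_codes, e2m1_scale = quantize_group(group, "e2m1")
restored = dequantize_group(nf4_codes, nf4_scale, "nf4")
```

Because only the table changes between data types, adding another 4-bit format in this scheme amounts to registering one more list of levels under a new name.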

Signed-off-by: Xin He <xin3.he@intel.com>

xin3he · 2023-08-25T09:36:25Z

GPTQ and TEQ do not support NF4/FP4 yet; that support will come in later PRs.

* support NF4/FP4 data type in weight-only RTN & AWQ algo, allow tuning dtype and compressing nf4/fp4 model

Signed-off-by: Xin He <xin3.he@intel.com>

---------

Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

xin3he requested review from PenghuiCheng, hshen14, wenhuach21 and yiliu30 August 24, 2023 10:01

xin3he added the "enhancement" and "new feature" labels Aug 24, 2023

xin3he changed the title ~~Nf4~~ Support NF4/FP4 data type in weight-only Aug 24, 2023

xin3he added 5 commits August 25, 2023 09:11

add nf4

bffeede

Signed-off-by: Xin He <xin3.he@intel.com>

add __repr__

d19463f

Signed-off-by: Xin He <xin3.he@intel.com>

support dtype tuning and compress nf4 model

444700e

Signed-off-by: Xin He <xin3.he@intel.com>

fix bug

2c3b58f

Signed-off-by: Xin He <xin3.he@intel.com>

fix bug

4f373f6

Signed-off-by: Xin He <xin3.he@intel.com>

xin3he force-pushed the nf4 branch from db0ae55 to 4f373f6 August 25, 2023 01:12

yiliu30 reviewed Aug 25, 2023

docs/source/quantization_weight_only.md

neural_compressor/strategy/utils/tuning_space.py (outdated)

yiliu30 self-requested a review August 25, 2023 07:44

yiliu30 approved these changes Aug 25, 2023

add notes

6108523

Signed-off-by: Xin He <xin3.he@intel.com>

PenghuiCheng approved these changes Aug 25, 2023

xin3he added 3 commits August 25, 2023 17:23

support AWQ

dda1d78

Signed-off-by: Xin He <xin3.he@intel.com>

rename fp4_bnb to fp4_e2m1_bnb

a70b643

Signed-off-by: Xin He <xin3.he@intel.com>

add inc_dict

97bfba9

Signed-off-by: Xin He <xin3.he@intel.com>

xin3he merged commit 3d11b5e into master Aug 26, 2023

xin3he deleted the nf4 branch August 26, 2023 09:40

5 participants