## Highlights
- merge all APIs (MLLM, Adam) into AutoRound by @n1ck-guo in #791
- MXFP4 and MXFP8 loading support by @yiliu30 in #832
- Support Flux quantization by @mengniwang95 in #850
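The MXFP4/MXFP8 loading support added in #832 refers to the OCP Microscaling formats, where weights are stored in small blocks that share a single power-of-two scale, with 4-bit E2M1 elements in the MXFP4 case. As a rough illustration of the numerics only (this is not AutoRound's implementation; the block size, scale selection, and round-to-nearest logic below are simplified assumptions):

```python
# Minimal sketch of MXFP4-style block quantization (OCP Microscaling:
# 32-element blocks, shared power-of-two scale, E2M1 elements).
# NOT AutoRound's actual code; details are simplified assumptions.
import math

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes

def quantize_block(block):
    """Quantize one block of floats to (power-of-two scale, FP4 values)."""
    amax = max(abs(x) for x in block)
    # shared exponent chosen so amax maps near the FP4 max magnitude (6.0)
    shared_exp = math.floor(math.log2(amax / 6.0)) if amax > 0 else 0
    scale = 2.0 ** shared_exp
    q = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)                     # clamp to FP4 range
        nearest = min(FP4_GRID, key=lambda g: abs(g - mag))  # round to grid
        q.append(math.copysign(nearest, x))
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

data = [0.11, -0.52, 0.98, 0.03] * 8          # one 32-element block
scale, q = quantize_block(data)
recon = dequantize_block(scale, q)
err = max(abs(a - b) for a, b in zip(data, recon))
```

Real MX kernels pack the 4-bit codes two per byte and store the shared exponent as E8M0; the sketch keeps everything in plain floats for readability.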
## What's Changed
- fix CUDA UT bug of use_deterministic_algorithms by @n1ck-guo in #805
- remove torch compile in nv quant by @wenhuach21 in #807
- Support loading for static quant weight fp8 act fp8 by @yiliu30 in #730
- fix bug of q_layer_inputs by @n1ck-guo in #811
- fix gptqmodel inference issue by @wenhuach21 in #813
- Bump version to v0.7.0 by @XuehaoSun in #814
- fix nsamples in get_dataloader by @wenhuach21 in #804
- Refine logger and add envs by @yiliu30 in #817
- Fix llm-compressor export by @Kaihui-intel in #820
- enhance auto-round eval with vllm backend by @xin3he in #815
- remove triton from requirements and correct the supported Python version to 3.10(+) by @wenhuach21 in #824
- move environment variable setting into eval function by @xin3he in #829
- bump version to 0.8.0.dev by @XuehaoSun in #830
- [STEP 1] merge all APIs (MLLM, Adam) into AutoRound by @n1ck-guo in #791
- add support for scheme FP8_STATIC to export llm_compressor format by @n1ck-guo in #816
- fix format checking bug by @WeiweiZhang1 in #836
- MXFP4 and MXFP8 loading support by @yiliu30 in #832
- hpu build with auto_round package name by @chensuyue in #838
- fix HPU detection issue by @xin3he in #823
- fix severe VRAM leak regression in auto-round format packing by @wenhuach21 in #842
- fix tp device issue caused by device_map by @xin3he in #833
- fix log error by @n1ck-guo in #843
- [High Risk]Refine inference code by @wenhuach21 in #840
- fix GGUF FP8 input model and VRAM issue by @wenhuach21 in #844
- NVFP4 Loading support by @yiliu30 in #839
- fix extra config by @n1ck-guo in #847
- change the method of detecting linear by @n1ck-guo in #849
- fix device_map setting by @Kaihui-intel in #854
- Add typo checker by @XuehaoSun in #846
- fix parse layer config bug by @wenhuach21 in #856
- Refine `BackendInfo` to include act fields by @yiliu30 in #848
- fix bug of data_type fp8_sym by @n1ck-guo in #855
- fix save_quantized format checker by @WeiweiZhang1 in #857
- fix bug of get_layer_names_in_block by @wenhuach21 in #861
- raise vlm loading error by @wenhuach21 in #863
- fix FP8 model as input and backend issue by @wenhuach21 in #864
- fix seqlen bug and slow calibration in MLLM tuning by @n1ck-guo in #871
- fix device bug by @xin3he in #873
- fix vllm backend evaluation by @xin3he in #872
- Optimize CPU unit test workflow by @XuehaoSun in #881
- Fix Cuda CI failures due to Transformers and AWQ incompatibility by @WeiweiZhang1 in #882
- Support Flux quantization by @mengniwang95 in #850
- fp8 exporting bugfix by @WeiweiZhang1 in #874
- stop wrapping lm_eval in try/except and add back missing arguments by @xin3he in #884
- Fix act calibration bug by @mengniwang95 in #880
- restrict accelerate version by @wenhuach21 in #885
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #868
- update required accelerate version by @n1ck-guo in #888
**Full Changelog**: v0.7.1...v0.8.0