Highlights
- support automatic mixed bits assignment by @wenhuach21 in #851 (see the sketch after this list)
- optimize rtn for int woq by @wenhuach21 in #924
- support for ModelScope by @n1ck-guo in #957
- enhance auto device map and support XPU by @xin3he in #961
- support for immediate saving to reduce RAM usage by @Kaihui-intel in #965
- update gguf alg ext by @n1ck-guo in #1026
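
The headline feature is automatic mixed bits assignment (#851), exposed through the new AutoScheme API. Below is a minimal sketch of how it might be driven; it assumes the `AutoScheme`/`AutoRound` entry points with `avg_bits` and `options` arguments as documented in the project README, and the model id, candidate schemes, and output directory are placeholders:

```python
# Sketch of automatic mixed-bits assignment (#851). Assumes the AutoScheme/
# AutoRound API from the project README; argument names may differ between
# releases, so treat this as illustrative rather than canonical.
from auto_round import AutoRound, AutoScheme

model_name = "Qwen/Qwen3-8B"  # placeholder model id

# Let AutoScheme search per-layer bit widths so the average lands near
# 3 bits, picking each layer's config from the candidate schemes below.
scheme = AutoScheme(avg_bits=3.0, options=("W2A16", "W4A16", "W8A16"))

ar = AutoRound(model=model_name, scheme=scheme)
# Export; per highlight #965, immediate saving aims to cut peak RAM here.
ar.quantize_and_save(output_dir="./Qwen3-8B-mixed", format="auto_round")
```

Exact argument names are release-dependent; the AutoScheme readme refined in #958 is the authoritative reference.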
What's Changed
- Fix rtn tuning_device issue by @Kaihui-intel in #893
- fix vlm gguf ut by @n1ck-guo in #895
- update alg_ext.abi3.so with python compatible version by @chensuyue in #894
- move ste from quant to round for nvfp4 by @xin3he in #889
- Add GPT-OSS quant support by @yiliu30 in #887
- improve help message printing by @n1ck-guo in #883
- speedup quant and evaluation, fix recompile issue by @xin3he in #897
- fix nvfp act quantization bug by @WeiweiZhang1 in #891
- support automatic mixed bits assignment by @wenhuach21 in #851
- try to fix gguf vram issue on windows by @wenhuach21 in #886
- remove numba from requirements by @yiliu30 in #905
- Extend mxfp loading dtypes by @yiliu30 in #907
- block dataset logger info by @n1ck-guo in #908
- fix torch compile issue in AutoScheme by @wenhuach21 in #909
- Revert "Extend mxfp loading dtypes" by @wenhuach21 in #915
- support disable_opt_rtn in auto-scheme by @wenhuach21 in #913
- fix llama 4 ut by @n1ck-guo in #896
- Add numba for cpu lib by @yiliu30 in #919
- Loosen the packing restrictions for mxfp&nvfp by @WeiweiZhang1 in #911
- Extend mxfp loading dtypes by @yiliu30 in #916
- Fix act config exporting for mixed schemes by @WeiweiZhang1 in #903
- optimize rtn for int woq by @wenhuach21 in #924
- fix gguf bug and add support for LiquidAI/LFM2-1.2B by @n1ck-guo in #927
- remove numpy<2.0 limitation by @xin3he in #921
- enable regex quantization config saving for mixed bits by @WeiweiZhang1 in #825
- Fix Flux tuning issue by @mengniwang95 in #936
- gguf support for inclusionAI/Ling-flash-2.0 by @n1ck-guo in #940
- remove low_cpu_mem by @n1ck-guo in #934
- Add compatibility test by @XuehaoSun in #918
- Add commit hash to version by @XuehaoSun in #941
- align gguf weight types (output.weight, token_embed) with the original model by @n1ck-guo in #900
- support attention mask in user's dataset by @wenhuach21 in #930
- Add diffusion README by @mengniwang95 in #923
- update readme by @wenhuach21 in #949
- refactor utils file by @n1ck-guo in #943
- update readme for sglang support by @WeiweiZhang1 in #953
- update gguf and support for CompressedLinear by @n1ck-guo in #950
- Reduce AutoScheme VRAM usage by up to 10X by @wenhuach21 in #944
- add self attribution and fix avg_bits error by @xin3he in #956
- add logo by @wenhuach21 in #960
- refine AutoScheme readme/code by @wenhuach21 in #958
- update readme by @wenhuach21 in #962
- fix critical disable_opt_rtn regression by @wenhuach21 in #963
- [1/N] Initial vllm-ext evaluation support (MXFP4 MOE) by @yiliu30 in #935
- fix bug where imatrix contains 0 by @n1ck-guo in #955
- fix rtn bug by @mengniwang95 in #966
- enhance flux doc by @mengniwang95 in #967
- clean code by @wenhuach21 in #968
- support for ModelScope by @n1ck-guo in #957
- merge main branch to alg_ext by @wenhuach21 in #970
- fix cuda CI backend issue, fix typo by @WeiweiZhang1 in #974
- disable compile packing by default by @yiliu30 in #975
- enhance auto device map and support XPU by @xin3he in #961
- refine readme by @wenhuach21 in #978
- CLI support for model as a positional argument by @n1ck-guo in #979
- update bits in UT by @xin3he in #986
- fix gguf scheme and device_map bug by @n1ck-guo in #969
- add support for Magistral-Small by @n1ck-guo in #980
- support model_dtype and fix bugs with schemes containing quotes and mllm eval by @n1ck-guo in #985
- fix bug where adam compressor cannot be created by @n1ck-guo in #992
- [CI] Update python to 3.12 and torch to 2.8.0 by @XuehaoSun in #741
- fix lm head bug and rm clear_mem_reach_threhold by @wenhuach21 in #997
- Reduce peak gpu memory usage and support moe estimation by @xin3he in #981
- fix cuda ut bug by @n1ck-guo in #999
- fix mllm device_map ut by @Kaihui-intel in #1000
- refine md tables by @WeiweiZhang1 in #994
- Refine exllamav2 ut by @WeiweiZhang1 in #1001
- Support for immediate saving to reduce RAM usage by @Kaihui-intel in #965
- Fix diffusion multi-device ut issue by @mengniwang95 in #1002
- fix multiple devices map issue in calibration by @wenhuach21 in #1003
- Fix non auto device map by @WeiweiZhang1 in #1005
- fix multiple devices issue in Compressor and AutoScheme by @wenhuach21 in #1007
- fix cuda low_cpu_mem_usage ut by @Kaihui-intel in #1010
- Fix param missing bug by @mengniwang95 in #1008
- add device list to clear memory by @wenhuach21 in #1009
- Minor refactor for LLMC by @yiliu30 in #993
- fix one clear memory issue by @wenhuach21 in #1011
- add ut for gguf alg_ext and update so file by @n1ck-guo in #1012
- fix multi cuda ut bug by @n1ck-guo in #1014
- Include auto_scheme.default_alg in the PyPI package by @chensuyue in #1018
- add num_device check for set_auto_device_map_for_block_with_tuning by @xin3he in #1021
- dispatch model with real max memory by @xin3he in #1022
- fix cuda ut by @n1ck-guo in #1020
- disable itrex format first by @WeiweiZhang1 in #998
- fix bugs in lm_head, dispatch model, and gguf eval by @n1ck-guo in #1025
- Fix the missing temporary name by @yiliu30 in #1029
- Reduce mem usage of GPT-OSS by @yiliu30 in #1013
- update gguf alg ext by @n1ck-guo in #1026
- optimize vram for gguf and add momentum by @wenhuach21 in #1031
- fix incorrect model name in readme by @wenhuach21 in #1035
- Bump version to v0.9.0 by @XuehaoSun in #1024
Full Changelog: v0.8.0...v0.9.0