## Highlights
- merge all APIs (MLLM, Adam) into AutoRound by @n1ck-guo in #791
- MXFP4 and MXFP8 loading support by @yiliu30 in #832
- Support Flux quantization by @mengniwang95 in #850
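The MXFP4/MXFP8 loading support added in #832 refers to the OCP Microscaling formats, where weights are stored in small blocks that share a single power-of-two scale, with 4-bit E2M1 elements in the MXFP4 case. As a rough illustration of the numerics only (this is not AutoRound's implementation; the block size, scale selection, and round-to-nearest logic below are simplified assumptions):

```python
# Minimal sketch of MXFP4-style block quantization (OCP Microscaling:
# 32-element blocks, shared power-of-two scale, E2M1 elements).
# NOT AutoRound's actual code; details are simplified assumptions.
import math

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes

def quantize_block(block):
    """Quantize one block of floats to (power-of-two scale, FP4 values)."""
    amax = max(abs(x) for x in block)
    # shared exponent chosen so amax maps near the FP4 max magnitude (6.0)
    shared_exp = math.floor(math.log2(amax / 6.0)) if amax > 0 else 0
    scale = 2.0 ** shared_exp
    q = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)                     # clamp to FP4 range
        nearest = min(FP4_GRID, key=lambda g: abs(g - mag))  # round to grid
        q.append(math.copysign(nearest, x))
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

data = [0.11, -0.52, 0.98, 0.03] * 8          # one 32-element block
scale, q = quantize_block(data)
recon = dequantize_block(scale, q)
err = max(abs(a - b) for a, b in zip(data, recon))
```

Real MX kernels pack the 4-bit codes two per byte and store the shared exponent as E8M0; the sketch keeps everything in plain floats for readability.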
## What's Changed
- fix CUDA UT bug of use_deterministic_algorithms by @n1ck-guo in #805
- remove torch compile in nv quant by @wenhuach21 in #807
- Support loading for static quant weight fp8 act fp8 by @yiliu30 in #730
- fix bug of q_layer_inputs by @n1ck-guo in #811
- fix gptqmodel inference issue by @wenhuach21 in #813
- Bump version to v0.7.0 by @XuehaoSun in #814
- fix nsamples in get_dataloader by @wenhuach21 in #804
- Refine logger and add envs by @yiliu30 in #817
- Fix llm-compressor export by @Kaihui-intel in #820
- enhance auto-round eval with vllm backend by @xin3he in #815
- remove triton from requirements and correct the supported Python version to 3.10(+) by @wenhuach21 in #824
- move environment variable setting into eval function by @xin3he in #829
- bump version to 0.8.0.dev by @XuehaoSun in #830
- [STEP 1] merge all APIs (MLLM, Adam) into AutoRound by @n1ck-guo in #791
- add support for scheme FP8_STATIC to export llm_compressor format by @n1ck-guo in #816
- fix format checking bug by @WeiweiZhang1 in #836
- MXFP4 and MXFP8 loading support by @yiliu30 in #832
- hpu build with auto_round package name by @chensuyue in #838
- fix HPU detection issue by @xin3he in #823
- fix severe VRAM leak regression in auto-round format packing by @wenhuach21 in #842
- fix tp device issue caused by device_map by @xin3he in #833
- fix log error by @n1ck-guo in #843
- [High Risk]Refine inference code by @wenhuach21 in #840
- fix GGUF FP8 input model and VRAM issue by @wenhuach21 in #844
- NVFP4 Loading support by @yiliu30 in #839
- fix extra config by @n1ck-guo in #847
- change the method of detecting linear by @n1ck-guo in #849
- fix device_map setting by @Kaihui-intel in #854
- Add typo checker by @XuehaoSun in #846
- fix parse layer config bug by @wenhuach21 in #856
- Refine `BackendInfo` to include act fields by @yiliu30 in #848
- fix bug of data_type fp8_sym by @n1ck-guo in #855
- fix save_quantized format checker by @WeiweiZhang1 in #857
- fix bug of get_layer_names_in_block by @wenhuach21 in #861
- raise vlm loading error by @wenhuach21 in #863
- fix FP8 model as input and backend issue by @wenhuach21 in #864
- fix seqlen bug and slow calibration in MLLM tuning by @n1ck-guo in #871
- fix device bug by @xin3he in #873
- fix vllm backend evaluation by @xin3he in #872
- Optimize CPU unit test workflow by @XuehaoSun in #881
- Fix Cuda CI failures due to Transformers and AWQ incompatibility by @WeiweiZhang1 in #882
- Support Flux quantization by @mengniwang95 in #850
- fp8 exporting bugfix by @WeiweiZhang1 in #874
- stop wrapping lm_eval in try/except and add back missing arguments by @xin3he in #884
- Fix act calibration bug by @mengniwang95 in #880
- restrict accelerate version by @wenhuach21 in #885
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #868
- update required accelerate version by @n1ck-guo in #888
**Full Changelog**: v0.7.1...v0.8.0