2 changes: 2 additions & 0 deletions README.md
@@ -27,6 +27,8 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). For usage instructions,


## 🆕 What's New
[2025/10] We enhanced the RTN mode (`--iters 0`) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for accuracy results. If you have limited resources, you can use this mode for 4-bit quantization.
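As a rough illustration, RTN-only quantization can be requested by setting `iters=0` in the Python API. This is a minimal sketch, assuming a placeholder model name and output directory and that the `AutoRound` constructor and `quantize()`/`save_quantized()` calls match the current API; see the project README for the authoritative usage.

```python
# Minimal sketch: RTN-mode 4-bit quantization (iters=0 skips iterative tuning).
# The model name and output directory are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# iters=0 selects the cheap RTN path instead of the default tuning mode.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize()
autoround.save_quantized("./opt-125m-w4-rtn")
```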

[2025/10] We proposed a fast algorithm that generates **mixed bits/datatypes** schemes in minutes. Please
refer to [the accuracy results](./docs/auto_scheme_acc.md) and [this guide](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) for usage instructions.
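For illustration only, a sketch of generating such a scheme is below. The `AutoScheme` name follows the linked guide's anchor, but the constructor arguments (`avg_bits`, `options`) and the `quantize_and_save` call are assumptions here; consult the guide for the actual API.

```python
# Hypothetical sketch of generating a mixed bits/datatypes scheme; argument
# names (avg_bits, options) are assumptions, verify against the AutoScheme guide.
from auto_round import AutoRound, AutoScheme

model_name = "facebook/opt-125m"  # placeholder model

# Ask for an average of ~3 bits and let the algorithm mix the listed options.
scheme = AutoScheme(avg_bits=3, options=("W2A16", "W4A16", "W8A16"))
ar = AutoRound(model=model_name, scheme=scheme)
ar.quantize_and_save("./opt-125m-mixed")
```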

3 changes: 1 addition & 2 deletions docs/auto_scheme_acc.md
@@ -6,8 +6,7 @@ to stabilize accuracy during evaluation. All other settings follow the default c
We ignore the scale and zp bits in the tables below. The accuracy may change slightly because we have modified
parts of the implementation; we will rerun all the experiments.

For mxfp experiment, we use fake model while for weight only model we use real model. **No tuning is applied unless explicit stated.
**
For the mxfp experiments, we use fake models, while for weight-only models we use real models. **No tuning is applied unless explicitly stated.**

*Average accuracy across `lambada_openai`, `hellaswag`, `piqa`, `winogrande`, and `mmlu`.*
