Add optional RMSNorm support to BitNet quantization (config + layers) #38087
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
Can you share a bit about the motivation for such a feature?
@SunMarc
Hi @Codys12, thanks for the PR 🤗 ! I'm not sure I understand the idea behind adding an extra RMSNorm here.
@MekkCyber @SunMarc Good point! In section 2/2.1 of the original BitNet paper (https://arxiv.org/pdf/2310.11453), the authors describe the reason for including the RMSNorm: it improves model performance at negligible compute/parameter cost. They include it in their modeling file, but others (see here) have tested this with alternative architectures (Llama, Mistral, DeepSeek V3, etc.) and observed an improvement. This change in the quantization config is a model-agnostic way to introduce the parameter, so a new modeling_*.py file is not required for every model you want to test this way. Additionally, the inclusion of this norm allows you to finetune existing models into this quantization (see here), as demonstrated by https://huggingface.co/codys12/bitnet-r1-32b and https://huggingface.co/codys12/bitnet-r1-8b. Let me know if anything is still unclear!
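To make the proposed interface concrete, here is a minimal usage sketch. It assumes the `BitNetQuantConfig` class and the `use_rms_norm` / `rms_norm_eps` argument names discussed in this PR; the exact loading flow for pre-quantized BitNet checkpoints may differ.

```python
from transformers import AutoModelForCausalLM, BitNetQuantConfig

# Illustrative sketch: the new flag rides along in the quantization config,
# so it round-trips through save_pretrained / from_pretrained like the
# existing BitNet settings.
quant_config = BitNetQuantConfig(
    use_rms_norm=True,   # apply RMSNorm to activations before quantising (new, defaults to False)
    rms_norm_eps=1e-6,   # epsilon used inside the RMSNorm (new)
)

# e.g. loading one of the checkpoints linked above with the flag enabled
model = AutoModelForCausalLM.from_pretrained(
    "codys12/bitnet-r1-8b",
    quantization_config=quant_config,
)
```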
Thanks for the explanation @Codys12, I see the idea behind this !
Thanks ! Just a couple of nits
@SunMarc Just made these changes, let me know if there is anything else I can do before merge!
Thanks @Codys12, can you please run
@MekkCyber
@MekkCyber @SunMarc Any ideas on CI here? Looking to help this move forward today.
Thanks !
try to run the docstring check (`python utils/check_docstrings.py`); it fails with:
Traceback (most recent call last):
File "/root/transformers/utils/check_docstrings.py", line 1467, in <module>
check_docstrings(overwrite=args.fix_and_overwrite, check_all=args.check_all)
File "/root/transformers/utils/check_docstrings.py", line 1456, in check_docstrings
raise ValueError(error_message)
ValueError: There was at least one problem when checking docstrings of public objects.
The following objects docstrings do not match their signature. Run `make fix-copies` to fix this. In some cases, this error may be raised incorrectly by the docstring checker. If you think this is the case, you can manually check the docstrings and then add the object name to `OBJECTS_TO_IGNORE` in `utils/check_docstrings.py`.
- BitNetQuantConfig
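For anyone hitting the same failure: the check wants every argument in the `__init__` signature documented in the class docstring using the standard transformers convention (type, *optional*, default value). A minimal sketch of the shape it expects for the two new arguments (illustrative only; the existing BitNet arguments and the real base class are omitted):

```python
# Illustrative sketch only: the point is that each __init__ argument appears in
# the docstring with a matching type, *optional* marker, and default value.
class BitNetQuantConfig:
    """
    Args:
        use_rms_norm (`bool`, *optional*, defaults to `False`):
            Whether to apply RMSNorm to the activations before quantising them.
        rms_norm_eps (`float`, *optional*, defaults to `1e-6`):
            Epsilon used inside the RMSNorm for numerical stability.
    """

    def __init__(self, use_rms_norm: bool = False, rms_norm_eps: float = 1e-6):
        self.use_rms_norm = use_rms_norm
        self.rms_norm_eps = rms_norm_eps
```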
@SunMarc Hmm, I changed it to optional but running
Wait, all tests are passing. Sick!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for the PR 🤗 !
What does this PR do?
Adds optional RMSNorm support to BitNet-style quantisation.
- Adds `use_rms_norm` (`bool`, default `False`) and `rms_norm_eps` (`float`, default `1e-6`) to `BitNetQuantConfig`, so the flag is serialisable via `save_pretrained` / `from_pretrained`.
- Updates `BitLinear` and `AutoBitLinear` to accept `use_rms_norm` and apply the reference `BitNetRMSNorm` to activations before quantisation (a sketch of this is included after the motivation section below).
Before submitting
- Updated `to_dict`, docstrings, and the model card.
- Ran `make style && make quality && make test` locally.
- Built the docs (`make docs`) – pushed logs to CI.
Motivation and context
RMSNorm stabilises the activations of low-bit networks; the BitNet paper shows a consistent perplexity drop when normalising pre-quantisation activations. This PR brings parity with the reference implementation while keeping the previous behaviour as the default.
No new external dependencies.
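To make the layer-side change concrete, here is a minimal sketch of what "apply RMSNorm to activations before quantisation" can look like in a BitLinear-style forward pass. This is illustrative only, with simplified names and simplified quantisation math; it is not the actual `BitLinear` / `AutoBitLinear` implementation from this PR.

```python
import torch
import torch.nn as nn


class BitNetRMSNorm(nn.Module):
    """Reference-style RMSNorm: x * rsqrt(mean(x^2) + eps), with a learned scale."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


class BitLinearSketch(nn.Module):
    """Simplified BitLinear-style layer: optional RMSNorm, then 8-bit per-token
    activation fake-quantisation and ternary weight fake-quantisation."""

    def __init__(self, in_features: int, out_features: int,
                 use_rms_norm: bool = False, rms_norm_eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)
        self.norm = BitNetRMSNorm(in_features, rms_norm_eps) if use_rms_norm else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.norm is not None:
            x = self.norm(x)  # normalise activations before quantising them

        # Fake-quantise activations to the int8 range, per token.
        act_scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
        x_q = (x * act_scale).round().clamp(-128, 127) / act_scale

        # Fake-quantise weights to {-1, 0, +1} with a per-tensor scale.
        w_scale = self.weight.abs().mean().clamp(min=1e-5)
        w_q = (self.weight / w_scale).round().clamp(-1, 1) * w_scale

        return nn.functional.linear(x_q, w_q)
```

The real layers pack the ternary weights and handle dtype/device details, but the ordering (normalise first, then quantise) is the behaviour the `use_rms_norm` flag toggles.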
Who can review?
Quantization / Accelerate folks for the code:
@SunMarc @MekkCyber
Docstrings & config: @stevhliu
Feel free to jump in with any feedback!