Add UniLoRA tuner to PEFT #2968
base: main
Conversation
githubnemo left a comment
Hey @KaiyangLi1992, thanks for the PR :)
- Most general: let's rename `UniLoRA*` to `UniLora`, which makes it easier to remember how the method is typed in code (to be consistent with `LoraModel` and friends).
- I noticed that the copyright notice says 2024, let's use the correct starting year in all newly introduced files: 2025.
- Before pushing changes it is always good to run `make style` to correct any style issues automatically, otherwise the CI will not be happy.
It is good to see that you've already added some tests. Let's extend those by adding UniLoRA to the `TEST_CASES` list in `tests/test_custom_models.py` - this will already give quite a bit of coverage. You can check the results by running `pytest tests/test_custom_models.py`. If those tests run, we can extend the tests to `test_decoder_models.py` and `test_encoder_decoder_models.py` in a similar fashion.
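For illustration, entries along these lines could be added (a rough sketch only; the exact tuple layout must follow the existing entries in `tests/test_custom_models.py`, and `UniLoraConfig` is the assumed class name after the suggested rename):

```python
# Sketch of possible TEST_CASES additions, assuming the existing
# (test_name, model_id, config_cls, config_kwargs) layout used for the other tuners.
from peft import UniLoraConfig  # assumed export name from this PR

TEST_CASES = [
    # ... existing entries ...
    ("Vanilla MLP 1 UniLora", "MLP", UniLoraConfig, {"target_modules": ["lin0"]}),
    ("Vanilla MLP 2 UniLora", "MLP", UniLoraConfig, {"target_modules": ["lin0", "lin1"]}),
]
```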
For this to be mergeable we also need documentation in `docs/source/package_reference/unilora.md` (added to the `_toctree.yml`).
```diff
@@ -0,0 +1,23 @@
# Copyright 2024-present the HuggingFace Inc. team.
```
Suggested change:

```diff
-# Copyright 2024-present the HuggingFace Inc. team.
+# Copyright 2025-present the HuggingFace Inc. team.
```
| "help": ( | ||
| "Names or patterns of modules to apply UniLoRA to. Accepts a list of " | ||
| "module name suffixes, a regex string, or the special value " | ||
| "'all-linear' to match all Linear/Conv1D layers except the output layer." | ||
| ) |
You can just copy the documentation from the docstring above for the help string of the config values.
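For illustration, the usual PEFT pattern looks roughly like this (a sketch only: `UniLoraConfig` is the assumed class name after the suggested rename, and the docstring wording is shortened placeholder text, not the PR's actual documentation):

```python
from dataclasses import dataclass, field
from typing import Optional, Union

from peft import PeftConfig


@dataclass
class UniLoraConfig(PeftConfig):
    """
    Configuration for the UniLora tuner.

    Args:
        target_modules (`Optional[Union[list[str], str]]`):
            Names or patterns of modules to apply UniLoRA to.
    """

    target_modules: Optional[Union[list[str], str]] = field(
        default=None,
        # The help text simply repeats the docstring entry above.
        metadata={"help": "Names or patterns of modules to apply UniLoRA to."},
    )
```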
| """ | ||
| Updates the scaling factors. | ||
| Note: Method name kept as update_norm for compatibility if called externally, | ||
| but arguments updated to 'scales'. | ||
| """ |
I don't think I understand this comment. update_norm isn't mentioned anywhere else in the code base?
Can we just rename this function to update_scaling?
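A minimal sketch of what that rename could look like (the stored attribute is an assumption based on the `unilora_scales` parameters appearing elsewhere in this PR, not the actual implementation):

```python
def update_scaling(self, adapter_name: str, scale_a: float, scale_b: float) -> None:
    """Update the scaling factors of the given adapter."""
    # Assumed storage; the real layer keeps these in its unilora_scales parameters.
    self.unilora_scales[adapter_name] = (scale_a, scale_b)
```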
```python
# Using theta_d's dtype is safer for checking fp16/32 mismatch
dtype = self.unilora_theta_d[adapter].dtype

cast_to_fp32 = device.type == "cpu" and dtype == torch.float16
```
Suggested change:

```diff
-cast_to_fp32 = device.type == "cpu" and dtype == torch.float16
+cast_to_fp32 = device.type == "cpu" and (dtype == torch.float16 or dtype == torch.bfloat16)
```
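For context, the usual pattern in PEFT tuners is to upcast for the computation and cast back afterwards. A standalone sketch of that pattern covering both half-precision dtypes (this is not the PR's actual delta-weight code; the computation is a placeholder):

```python
import torch


def compute_delta_on_cpu(theta_d: torch.Tensor) -> torch.Tensor:
    # Many ops are not implemented for float16/bfloat16 on CPU, so both are upcast.
    dtype = theta_d.dtype
    cast_to_fp32 = theta_d.device.type == "cpu" and dtype in (torch.float16, torch.bfloat16)
    if cast_to_fp32:
        theta_d = theta_d.float()

    delta = theta_d * 2.0  # placeholder for the actual delta-weight computation

    if cast_to_fp32:
        delta = delta.to(dtype)  # restore the original dtype before merging
    return delta
```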
```python
if cast_to_fp32:
    unilora_theta_d = unilora_theta_d.float()

# Changed: Replaced 'logits' with 'indices' and 'norm' with 'scales'
```
Is this a LLM comment? Please clean up after your tools :)
```python
def get_nb_savable_parameters(self, adapter="default") -> tuple[int, int]:
    """
    Returns the number of savable Uni-LoRA parameters and other savable parameters.
    """
    theta_d_params = 0
    other_params = 0
    for name, param in self.named_parameters():
        if "unilora_theta_d" in name:
            theta_d_params += param.numel()
        elif "unilora_indices" in name:
            other_params += param.numel()
        elif "unilora_scales" in name:
            other_params += param.numel()

    unilora_params = theta_d_params
    return unilora_params, other_params

def print_savable_parameters(self) -> None:
    """
    Prints the number of savable Uni-LoRA parameters and total savable parameters.
    """
    unilora_params, other_params = self.get_nb_savable_parameters()
    print(
        f"Uni-LoRA params to-be-saved (float32-equivalent): {unilora_params:,d} "
        f"|| total params to-be-saved: {(unilora_params + other_params):,d}"
    )
```
Do these two functions ( get_nb_savable_parameters, print_savable_parameters) have a particular use or are they for debugging? If it is the latter, let's remove them.
```python
# --- UniLoRA-specific initialization logic (global hash index assignment) ---
# 1. Count the total number of required indices (using the new `indices` variable)
LoRA_para_cnt = 0
```
Suggested change:

```diff
-LoRA_para_cnt = 0
+lora_param_count = 0
```
```python
for module, (scale_a, scale_b) in zip(uni_modules, zip(*[iter(norm_factors)] * 2)):
    module.update_norm(adapter_name, scale_a, scale_b)

def generate_index(self, LoRA_para_cnt, theta_d_length,proj_seed):
```
This function would greatly benefit from a docstring explaining what it does and why
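For example, something along these lines would already help a lot (a sketch inferred from the surrounding comments about globally uniform-distributed indices; please adjust it to what the function actually does):

```python
def generate_index(self, LoRA_para_cnt, theta_d_length, proj_seed):
    """
    Assign every LoRA parameter an index into the shared vector theta_d.

    Draws `LoRA_para_cnt` indices uniformly from `[0, theta_d_length)` using a
    generator seeded with `proj_seed`, so the global "hash" index assignment is
    deterministic and reproducible across runs.

    Args:
        LoRA_para_cnt (`int`): Total number of LoRA parameters that need an index.
        theta_d_length (`int`): Length of the shared trainable vector theta_d.
        proj_seed (`int`): Seed controlling the random index assignment.

    Returns:
        A 1D tensor of length `LoRA_para_cnt` with values in `[0, theta_d_length)`.
    """
```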
```python
for module, (scale_a, scale_b) in zip(uni_modules, zip(*[iter(norm_factors)] * 2)):
    module.update_norm(adapter_name, scale_a, scale_b)
```
doesn't this mean that scale_a == scale_b? Let's simplify the zip() statement then and only pass scale for both scale params. If the user wants to experiment with this setting they can set the scale attribute on the layers manually.
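Roughly like this (a sketch, reusing the `update_scaling` name suggested earlier and assuming `norm_factors` then holds one scale per module):

```python
# One scale per module, passed for both scaling parameters.
for module, scale in zip(uni_modules, norm_factors):
    module.update_scaling(adapter_name, scale, scale)
```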
```python
# 2. Generate globally uniform-distributed indices
theta_d_length = config[adapter_name].theta_d_length
proj_seed = config[adapter_name].proj_seed
all_elements = self.generate_index(LoRA_para_cnt,theta_d_length,proj_seed)
```
Suggested change:

```diff
-all_elements = self.generate_index(LoRA_para_cnt,theta_d_length,proj_seed)
+indices = self.generate_index(LoRA_para_cnt,theta_d_length,proj_seed)
```
Motivation
This PR adds UniLoRA, a LoRA-style parameter-efficient fine-tuning method
that introduces a unified parameterization for low-rank adaptations, enabling
further reductions in the number of trainable parameters while preserving
the standard PEFT workflow.
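As a rough illustration of that workflow (the `UniLoraConfig` name follows the rename suggested in the review, `target_modules` and `theta_d_length` are config fields appearing in this PR's diff, and the base model and argument values are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model, UniLoraConfig  # assumed export name

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = UniLoraConfig(target_modules=["q_proj", "v_proj"], theta_d_length=1024)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```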
What's included
Checklist