Conversation

@KaiyangLi1992

Motivation

This PR adds UniLoRA, a LoRA-style parameter-efficient fine-tuning method
that introduces a unified parameterization for low-rank adaptations, enabling
further reductions in the number of trainable parameters while preserving
the standard PEFT workflow.

What's included

  • UniLoRA tuner implementation
  • Configuration class and registry integration
  • Save/load support
  • Unit tests

Checklist

  • Compatible with existing PEFT APIs (see the usage sketch below)
  • Backward-compatible
  • Tests added
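
As a usage sketch of that compatibility (the UniLoraConfig name and arguments follow this PR and may still change during review, e.g. after the suggested UniLoRA* to UniLora rename; the model id is only an example):

from transformers import AutoModelForCausalLM
from peft import UniLoraConfig, get_peft_model  # assumed export once the registry integration lands

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # example base model
config = UniLoraConfig(target_modules=["q_proj", "v_proj"])  # same pattern as other PEFT configs
model = get_peft_model(base_model, config)
model.print_trainable_parameters()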

@githubnemo (Collaborator) left a comment

Hey @KaiyangLi1992, thanks for the PR :)

  • Most general: let's rename UniLoRA* to UniLora, which makes it easier to remember how the method is typed in code (consistent with LoraModel and friends).
  • I noticed that the copyright notice says 2024; let's use the correct starting year, 2025, in all newly introduced files.
  • Before pushing changes it is always good to run make style to correct any style issues automatically; otherwise the CI will not be happy.

It is good to see that you've already added some tests. Let's extend those by adding UniLoRA to the TEST_CASES list in tests/test_custom_models.py - this will already give quite a bit of coverage. You can check the results by running:

pytest tests/test_custom_models.py

If those tests run, we can extend the tests to test_decoder_models.py and test_encoder_decoder_models.py in a similar fashion.
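
As a rough sketch of what such entries could look like (the exact tuple layout should be copied from the existing LoRA entries in tests/test_custom_models.py; UniLoraConfig is assumed to be the config class from this PR):

from peft import UniLoraConfig  # assumed export from this PR

TEST_CASES = [
    # (test name, model id, config class, config kwargs) -- mirror the existing entries
    ("Vanilla MLP 1 UniLora", "MLP", UniLoraConfig, {"target_modules": "lin0"}),
    ("Vanilla MLP 2 UniLora", "MLP", UniLoraConfig, {"target_modules": ["lin0", "lin1"]}),
]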

For this to be mergeable, we also need documentation in docs/source/package_reference/unilora.md (added to the _toctree.md).

@@ -0,0 +1,23 @@
# Copyright 2024-present the HuggingFace Inc. team.

Suggested change
# Copyright 2024-present the HuggingFace Inc. team.
# Copyright 2025-present the HuggingFace Inc. team.

Comment on lines +119 to +123
"help": (
"Names or patterns of modules to apply UniLoRA to. Accepts a list of "
"module name suffixes, a regex string, or the special value "
"'all-linear' to match all Linear/Conv1D layers except the output layer."
)

You can just copy the documentation from the docstring above for the help string of the config values.
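
For example, something along these lines (a sketch only; the actual help text should be copied verbatim from the target_modules entry in the class docstring, as the other PEFT config classes do):

from dataclasses import dataclass, field
from typing import Optional, Union

@dataclass
class UniLoraConfig:  # illustrative subset, not the full config
    target_modules: Optional[Union[list[str], str]] = field(
        default=None,
        metadata={
            # Copy the target_modules description from the class docstring here.
            "help": "Names or patterns of modules to apply UniLora to."
        },
    )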

Comment on lines +111 to +115
"""
Updates the scaling factors.
Note: Method name kept as update_norm for compatibility if called externally,
but arguments updated to 'scales'.
"""

I don't think I understand this comment. update_norm isn't mentioned anywhere else in the code base?

Can we just rename this function to update_scaling?
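
Something like this (only the name and docstring change; the body stays as it is):

def update_scaling(self, adapter_name, scale_a, scale_b):
    """Update the scaling factors of the given adapter."""
    ...  # keep the existing body unchanged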

# Using theta_d's dtype is safer for checking fp16/32 mismatch
dtype = self.unilora_theta_d[adapter].dtype

cast_to_fp32 = device.type == "cpu" and dtype == torch.float16

Suggested change
cast_to_fp32 = device.type == "cpu" and dtype == torch.float16
cast_to_fp32 = device.type == "cpu" and (dtype == torch.float16 or dtype == torch.bfloat16)

if cast_to_fp32:
    unilora_theta_d = unilora_theta_d.float()
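
For reference, the delta-weight code elsewhere in PEFT upcasts for the CPU computation and casts the result back afterwards; a sketch using the names from the snippet above (compute_delta is a hypothetical stand-in for the actual UniLoRA computation):

cast_to_fp32 = device.type == "cpu" and dtype in (torch.float16, torch.bfloat16)
if cast_to_fp32:
    unilora_theta_d = unilora_theta_d.float()

delta_weight = compute_delta(unilora_theta_d)  # hypothetical stand-in

if cast_to_fp32:
    delta_weight = delta_weight.to(dtype)  # cast back to the original dtype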

# Changed: Replaced 'logits' with 'indices' and 'norm' with 'scales'

Is this an LLM comment? Please clean up after your tools :)

Comment on lines +171 to +196
def get_nb_savable_parameters(self, adapter="default") -> tuple[int, int]:
    """
    Returns the number of savable Uni-LoRA parameters and other savable parameters.
    """
    theta_d_params = 0
    other_params = 0
    for name, param in self.named_parameters():
        if "unilora_theta_d" in name:
            theta_d_params += param.numel()
        elif "unilora_indices" in name:
            other_params += param.numel()
        elif "unilora_scales" in name:
            other_params += param.numel()

    unilora_params = theta_d_params
    return unilora_params, other_params

def print_savable_parameters(self) -> None:
    """
    Prints the number of savable Uni-LoRA parameters and total savable parameters.
    """
    unilora_params, other_params = self.get_nb_savable_parameters()
    print(
        f"Uni-LoRA params to-be-saved (float32-equivalent): {unilora_params:,d} "
        f"|| total params to-be-saved: {(unilora_params + other_params):,d}"
    )

Do these two functions (get_nb_savable_parameters, print_savable_parameters) have a particular use, or are they for debugging? If it is the latter, let's remove them.


# --- UniLoRA-specific initialization logic (global hash index assignment) ---
# 1. Count the total number of required indices (using the new `indices` variable)
LoRA_para_cnt = 0

Suggested change
LoRA_para_cnt = 0
lora_param_count = 0

for module, (scale_a, scale_b) in zip(uni_modules, zip(*[iter(norm_factors)] * 2)):
    module.update_norm(adapter_name, scale_a, scale_b)

def generate_index(self, LoRA_para_cnt, theta_d_length,proj_seed):

This function would greatly benefit from a docstring explaining what it does and why.
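
Something along these lines would already help (a sketch; the described behaviour is inferred from how the function is used in this PR and should be checked against the actual implementation, and the argument names use the lora_param_count spelling suggested above):

def generate_index(self, lora_param_count, theta_d_length, proj_seed):
    """
    Deterministically assign each LoRA parameter position an index into the shared theta_d vector.

    Args:
        lora_param_count (`int`): Total number of LoRA parameters across all adapted modules.
        theta_d_length (`int`): Length of the shared trainable vector theta_d.
        proj_seed (`int`): Seed for the random projection, so the assignment is reproducible
            across save/load.

    Returns:
        A 1D tensor of lora_param_count indices, approximately uniformly distributed over
        [0, theta_d_length).
    """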

Comment on lines +69 to +70
for module, (scale_a, scale_b) in zip(uni_modules, zip(*[iter(norm_factors)] * 2)):
    module.update_norm(adapter_name, scale_a, scale_b)

Doesn't this mean that scale_a == scale_b? Let's simplify the zip() statement then and only pass scale for both scale params. If the user wants to experiment with this setting, they can set the scale attribute on the layers manually.
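
A sketch of the simplified loop (assuming norm_factors then holds one scale per module, and using the update_scaling rename suggested above):

for module, scale in zip(uni_modules, norm_factors):
    module.update_scaling(adapter_name, scale, scale)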

# 2. Generate globally uniform-distributed indices
theta_d_length = config[adapter_name].theta_d_length
proj_seed = config[adapter_name].proj_seed
all_elements = self.generate_index(LoRA_para_cnt,theta_d_length,proj_seed)

Suggested change
all_elements = self.generate_index(LoRA_para_cnt,theta_d_length,proj_seed)
indices = self.generate_index(LoRA_para_cnt,theta_d_length,proj_seed)
