
PiSSA, OLoRA: Delete initial adapter after conversion instead of the active adapter #1933

Merged

Conversation

BenjaminBossan
Member

Resolves #1860

As discussed in that issue, it's not user friendly to delete the active adapter of a PiSSA/OLoRA model after calling save_pretrained with weight conversion. It is much more intuitive to delete the initial adapter instead, since that adapter is loaded inside the method, not by the user, and is therefore an implementation detail.

Apart from this, I made the following related changes:

  • Put everything in a try ... finally to ensure that the initial adapter does not hang around if there is an error (thus not hogging memory).
  • Renamed initial_adapter to initial_adapter_name, to make it clear that this is the name and not the adapter itself.
  • Fixed lines that are too long and were not fixed automatically by ruff (because they are strings).

Because of the rename and the line-length fixes, the diff looks much bigger than it actually is; most lines are unchanged. A sketch of the resulting conversion flow is shown below.
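To make the new behavior concrete, here is a minimal sketch of the save-and-convert flow described above. It is not the actual PEFT implementation: the helper function, its signature, and the elided conversion step are assumptions for illustration, while load_adapter and delete_adapter are the existing PeftModel methods used in the diff.

```python
import os

def save_with_weight_conversion(peft_model, save_directory, path_initial_model_for_weight_conversion):
    """Hedged sketch only: shows the try ... finally cleanup, not the real code."""
    # The initial (untrained) adapter is an implementation detail of the conversion,
    # so it is loaded under a name derived from its path rather than chosen by the user.
    initial_adapter_name = os.path.basename(path_initial_model_for_weight_conversion)
    peft_model.load_adapter(path_initial_model_for_weight_conversion, adapter_name=initial_adapter_name)
    try:
        # The real method converts the trained adapter using the initial weights and
        # writes the result to `save_directory`; that logic is elided here.
        ...
    finally:
        # Always delete the temporarily loaded initial adapter, even if the conversion
        # raises, so it does not keep hogging memory; the trained adapter stays loaded.
        peft_model.delete_adapter(initial_adapter_name)
```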

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Member Author

Again tagging @fxmeng and @tokenizer-decode as this affects the code they contributed.

@tokenizer-decode
Contributor

Makes so much more sense. Just one thing that I did not realize before: this changes peak GPU memory usage, right? Or does load_adapter load the model into CPU RAM? If we are increasing peak GPU memory usage, the user is probably not aware of that, because this is an implementation detail. I think no one would expect save_pretrained to increase their peak memory usage. One thing I've been burned by is the resume_from_checkpoint option in the transformers library. For whatever reason it increases peak GPU memory usage, and I've really had problems with it and couldn't find any documentation about why. So I'd like to be super cautious here.

@BenjaminBossan
Member Author

Yes, I had the same concerns with regard to memory. That's why I decided to delete the init adapter and not leave it hanging around.

Of course, this still doesn't change that we'll have a bit of a peak for the time when both adapters are loaded at the same time. However, I think in the grand scheme of things it shouldn't matter much. The peak memory should be affected much more during training, when it needs to account for hidden states etc. Unless I'm missing something, if there is enough memory to train the model, there should be enough to temporarily load a second LoRA adapter.

We can still think about ways of preventing this but it's not really related to this PR, so I'd do that in a separate one.

One thing I've been burned by is the resume_from_checkpoint option in the transformers library. For whatever reason it increases peak GPU memory usage.

It would be great to have a reproducer for this.
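As a starting point for measuring the effect discussed above, here is a minimal sketch of how one could check the temporary memory increase caused by loading the second adapter during saving. `peft_model` and the paths are placeholders, and the numbers only cover allocations made through PyTorch.

```python
import torch

# Assumes `peft_model` is a trained PiSSA/OLoRA PeftModel already on the GPU and that
# the untrained initial adapter was previously saved to "init-model" (placeholder path).
torch.cuda.reset_peak_memory_stats()
baseline = torch.cuda.max_memory_allocated()

peft_model.save_pretrained(
    "pissa-model-converted",
    path_initial_model_for_weight_conversion="init-model",
)

peak = torch.cuda.max_memory_allocated()
print(f"Extra peak GPU memory during save: {(peak - baseline) / 1024**2:.1f} MiB")
```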

@tokenizer-decode
Contributor

tokenizer-decode commented Jul 18, 2024

I really like to fit as much data as I possibly can into my memory at a given time. Assume I'm training at 39.5 GB of my 40 GB of GPU memory. If the trainer frees the memory as soon as training is done, that would not be an issue and we can load the new adapter. But if not, that could potentially OOM. It depends on the trainer in this case.

It would be great to have a reproducer for this.

Sure. Would you like me to open an issue here or in the transformers repo?

@BenjaminBossan
Member Author

Sure. Would you like me to open an issue here or in the transformers repo?

It could be a transformers issue or also an accelerate issue (or even PyTorch), depending on what exactly is happening, so I can't say for sure. But it sounds like something that needs addressing.

```python
adapter_name=initial_adapter_name,
)
if any(
    str(self.peft_config[initial_adapter_name].init_lora_weights).lower().startswith(prefix)
```
Member

Let's assign str(self.peft_config[initial_adapter_name].init_lora_weights).lower().startswith(prefix) to a variable and use that. Easier to read.

Member Author

Done
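For illustration, here is a minimal sketch of that refactor applied to the snippet above; the variable name and the prefix tuple are assumptions and may differ from what actually landed in the PR.

```python
# Hedged sketch of the suggested refactor, not the exact code from the PR.
init_lora_weights_str = str(self.peft_config[initial_adapter_name].init_lora_weights).lower()
if any(init_lora_weights_str.startswith(prefix) for prefix in ("pissa", "olora")):
    ...
```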

```python
raise ValueError(
    "The `init_lora_weights` parameter of the initial adapter should be set to `True`. "
    "Otherwise, `self.load_adapter` will subtract the decomposed values again based on the residual model."

initial_adapter_name = os.path.basename(path_initial_model_for_weight_conversion)
```
Member

Just for sanity: is it 100 percent okay to assume that always deleting the initial adapter, rather than letting the user choose which adapter to delete, won't work against the user?

Member Author

Yes, there is pretty much no reason why users would want to keep the initial adapter loaded and delete the trained adapter. I pondered keeping both, but users could be surprised by the extra memory required, which could, for instance, cause evals that run after training and saving the model to OOM.

The new behavior is documented in the release notes: https://github.com/huggingface/peft/releases/tag/untagged-f3a302cd5b4f9d08a46b

(Release planned later today)

Comment on lines +326 to +332
```python
peft_config_keys_before = list(peft_model.peft_config.keys())
peft_model.save_pretrained(
    tmp_path / "pissa-model-converted", path_initial_model_for_weight_conversion=tmp_path / "init-model"
)
peft_config_keys_after = list(peft_model.peft_config.keys())
assert peft_config_keys_before == peft_config_keys_after
```

Member

For my understanding: why would this hold, given that we're deleting an adapter?

Member Author

To do the weight conversion, we have to temporarily load the initial adapter (i.e. from before it was trained). Before this PR, we would keep that initial adapter around but delete the actually trained adapter. This is surprising for users, who would rather keep the trained adapter around, e.g. to perform evals (see the linked issue #1860). With this PR, instead we keep the trained adapter around and delete the initial adapter, which was only really needed for the conversion step.

As such, I added the assert that the loaded adapters are the same before and after saving, which is also true when saving any other kind of model.
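For context, here is a hedged end-to-end sketch of the workflow this test exercises. The model name, paths, and hyperparameters are placeholders, and the `init_lora_weights = True` step before saving the initial adapter is inferred from the ValueError shown in the diff above; the documented workflow may differ in details.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder model
config = LoraConfig(init_lora_weights="pissa", r=8, lora_alpha=8)
peft_model = get_peft_model(base_model, config)

# Save the untrained initial adapter. Its config should carry init_lora_weights=True so
# that loading it later does not re-run the PiSSA decomposition (see the ValueError above).
peft_model.peft_config["default"].init_lora_weights = True
peft_model.save_pretrained("init-model")
peft_model.peft_config["default"].init_lora_weights = "pissa"

# ... training happens here ...

# Saving with weight conversion temporarily loads "init-model" as an extra adapter,
# converts the trained weights to a plain LoRA checkpoint, and (after this PR) deletes
# only that temporary adapter again.
peft_model.save_pretrained(
    "pissa-model-converted",
    path_initial_model_for_weight_conversion="init-model",
)

# The trained adapter is still loaded, so evals etc. can continue.
assert list(peft_model.peft_config.keys()) == ["default"]
```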

@sayakpaul (Member) left a comment

Thanks! Left a comment and some clarification questions.

@BenjaminBossan
Copy link
Member Author

Merging despite Windows CI failing. As shown in #1947, Windows CI is failing for some unrelated reason.

@BenjaminBossan BenjaminBossan merged commit 05f57e9 into huggingface:main Jul 24, 2024
10 of 14 checks passed
@BenjaminBossan BenjaminBossan deleted the pissa-olora-delete-init-adapter branch July 24, 2024 10:56
Successfully merging this pull request may close these issues: Do we have to delete the PiSSA adapter after save_pissa_as_lora (#1860)