PiSSA, OLoRA: Delete initial adapter after conversion instead of the active adapter #1933
Conversation
Resolves huggingface#1860

As discussed in that issue, it's not user friendly to delete the default adapter of a PiSSA/OLoRA model after calling `save_pretrained` with weight conversion. Instead, it is much more intuitive to delete the initial adapter, since it is loaded inside the method and not by the user, so it's really an implementation detail.

Apart from this, I made the following related changes:

- Put everything in a `try ... finally` to ensure that the initial adapter does not hang around if there is an error (thus not hogging memory).
- Renamed `initial_adapter` to `initial_adapter_name`, to make it clear that this is the name and not the adapter itself.

For this reason, the diff looks much bigger than it actually is; most lines are actually identical.
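For context, here is a rough sketch of the user-facing flow this PR touches. The model name, paths, and hyperparameters are only examples, and the recipe is paraphrased from the PEFT docs rather than taken from this PR, so treat it as an illustration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Example base model and directories; any causal LM works the same way.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(init_lora_weights="pissa", r=8, lora_alpha=16, task_type="CAUSAL_LM")
peft_model = get_peft_model(base, config)

# Save the initial (untrained) PiSSA adapter for later weight conversion.
# The saved config needs init_lora_weights=True, see the check further down in this PR.
peft_model.peft_config["default"].init_lora_weights = True
peft_model.save_pretrained("pissa-init")
peft_model.peft_config["default"].init_lora_weights = "pissa"

# ... train the adapter here ...

# Save with conversion to a plain LoRA checkpoint. After this PR, the trained
# "default" adapter stays loaded and only the temporarily loaded initial adapter
# is deleted again.
peft_model.save_pretrained(
    "pissa-converted",
    path_initial_model_for_weight_conversion="pissa-init",
)
```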
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Again tagging @fxmeng and @tokenizer-decode as this affects the code they contributed.
Makes so much more sense. Just one thing that I did not realize before: this changes peak GPU mem usage, right? Or does…
Yes, I had the same concerns with regard to memory. That's why I decided to delete the init adapter and not leave it hanging around. Of course, this still doesn't change the fact that we'll have a bit of a peak for the time when both adapters are loaded at the same time. However, I think in the grand scheme of things it shouldn't matter much. Peak memory should be affected much more during training, when it needs to account for hidden states etc. Unless I'm missing something, if there is enough memory to train the model, there should be enough to temporarily load a second LoRA adapter. We can still think about ways of preventing this, but it's not really related to this PR, so I'd do that in a separate one.
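To put a number on that transient peak, one could wrap the converting save in CUDA allocator stats, roughly like this (a sketch, assuming a trained `peft_model` on a CUDA device and the example paths from the snippet above):

```python
import torch

torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()
before = torch.cuda.memory_allocated()

# The initial adapter is loaded temporarily inside this call and deleted afterwards.
peft_model.save_pretrained(
    "pissa-converted",
    path_initial_model_for_weight_conversion="pissa-init",
)

peak = torch.cuda.max_memory_allocated()
after = torch.cuda.memory_allocated()
print(f"before: {before / 2**30:.2f} GiB")
print(f"peak during save: {peak / 2**30:.2f} GiB")
print(f"after: {after / 2**30:.2f} GiB  # should be back to roughly the 'before' value")
```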
It would be great to have a reproducer for this.
I really like to fit as much data as I possibly can into my RAM at a given time. Assume I'm training at 39.5 GB of my 40 GB of GPU memory. If the trainer frees the memory as soon as training is done, that would not be an issue and we can load the new adapter. But if not, that could potentially OOM. It depends on the trainer in this case.
Sure. Would you like me to open an issue here or in the transformers repo?
It could be a transformers issue or also an accelerate issue (or even PyTorch), depending on what exactly is happening, so I can't say for sure. But it sounds like something that needs addressing.
src/peft/peft_model.py (outdated)
adapter_name=initial_adapter_name,
)
if any(
    str(self.peft_config[initial_adapter_name].init_lora_weights).lower().startswith(prefix)
Let's assign `str(self.peft_config[initial_adapter_name].init_lora_weights).lower().startswith(prefix)` to a variable and use it. Easier to read.
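Something along these lines, for illustration (the variable name and the `prefixes` iterable are placeholders, not the actual code in this PR):

```python
# Hypothetical refactor: name the inlined expression before using it in the check.
lora_init_str = str(self.peft_config[initial_adapter_name].init_lora_weights).lower()
if any(lora_init_str.startswith(prefix) for prefix in prefixes):
    raise ValueError(
        "The `init_lora_weights` parameter of the initial adapter should be set to `True`. "
        "Otherwise, `self.load_adapter` will subtract the decomposed values again based on the residual model."
    )
```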
Done
raise ValueError(
    "The `init_lora_weights` parameter of the initial adapter should be set to `True`. "
    "Otherwise, `self.load_adapter` will subtract the decomposed values again based on the residual model."
)
initial_adapter_name = os.path.basename(path_initial_model_for_weight_conversion)
Just for sanity: is it 100 percent okay to assume that it's not against users' interests to always delete the initial adapter, rather than letting them choose which adapter to delete?
Yes, there is pretty much no reason why users would want to load the initial model and delete the trained model. I pondered keeping both, but users could be surprised by the extra memory required, which could, for instance, cause evals that are run after training and saving the model to OOM.
The new behavior is documented in the release notes: https://github.com/huggingface/peft/releases/tag/untagged-f3a302cd5b4f9d08a46b
(Release planned later today)
peft_config_keys_before = list(peft_model.peft_config.keys())
peft_model.save_pretrained(
    tmp_path / "pissa-model-converted", path_initial_model_for_weight_conversion=tmp_path / "init-model"
)
peft_config_keys_after = list(peft_model.peft_config.keys())
assert peft_config_keys_before == peft_config_keys_after
For my understanding: why would this hold, given that we're deleting an adapter?
To do the weight conversion, we have to temporarily load the initial adapter (i.e. from before it was trained). Before this PR, we would keep that initial adapter around but delete the actually trained adapter. This is surprising for users, who would rather keep the trained adapter around, e.g. to perform evals (see the linked issue #1860). With this PR, instead we keep the trained adapter around and delete the initial adapter, which was only really needed for the conversion step.
As such, I added the assert that the loaded adapters are the same before and after saving, which is also true when saving any other kind of model.
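In other words, the converting save now roughly follows this shape (a simplified sketch with placeholder helper names, not the actual peft implementation):

```python
import os

def save_with_weight_conversion(self, save_directory, path_initial_model_for_weight_conversion):
    # Sketch only; `convert_and_save` is a placeholder for the real conversion logic.
    initial_adapter_name = os.path.basename(path_initial_model_for_weight_conversion)
    try:
        # Temporarily load the initial (untrained) adapter next to the trained one.
        self.load_adapter(path_initial_model_for_weight_conversion, adapter_name=initial_adapter_name)
        # Express the trained weights as a delta against the initial adapter and save that.
        output_state_dict = self.convert_and_save(save_directory, initial_adapter_name)
    finally:
        # Always remove the temporarily loaded initial adapter, even if conversion fails,
        # so it does not linger in memory; the trained adapter stays loaded.
        self.delete_adapter(initial_adapter_name)
    return output_state_dict
```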
Thanks! Left a comment and some clarification questions.
Merging despite Windows CI failing. As shown in #1947, Windows CI is failing for an unrelated reason.