Description
System Info
peft: 0.7.1; torch: 2.3.0.dev20240128+cu121; accelerate: 0.26.1; transformers: 4.37.2; Python: 3.10.12
Running inside the PyTorch container 23.12 provided by NVIDIA.
The hardware environment has four A100 40GB GPUs.
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder
- My own task or dataset (give details below)
Reproduction
Hi, I want to use both FSDP and PEFT in my project. I insert LoRA into the pretrained LLM with `peft.get_peft_model` and then wrap the whole model with `torch.distributed.fsdp.FullyShardedDataParallel`. The only trainable part of the model is the LoRA adapter. In addition, I need to call the original model via `with my_model.disable_adapter():`.
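Below is a minimal sketch of the setup, reduced from my actual code. The `gpt2` base model, the LoRA hyperparameters, the `Wrapper` class, and the `torchrun` launch are placeholders standing in for my real project:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM


class Wrapper(nn.Module):
    """Reduced stand-in for the in_context_autoencoder module in the traceback."""

    def __init__(self, peft_model):
        super().__init__()
        self.base_llm = peft_model  # PEFT model carrying the LoRA adapter

    def forward(self, input_ids):
        # Forward pass through the original, adapter-free model.
        # This is the call that raises the RuntimeError below.
        with self.base_llm.disable_adapter():
            return self.base_llm(input_ids=input_ids)


# Assumes launch via: torchrun --nproc_per_node=4 repro.py
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Placeholder base model and LoRA settings; my project uses a larger LLM.
base_llm = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])
peft_model = get_peft_model(base_llm, lora_config)  # only LoRA weights are trainable

# Wrap the whole model (frozen base + LoRA adapter) with FSDP.
model = FSDP(Wrapper(peft_model).cuda())

input_ids = torch.randint(0, 1000, (2, 16), device="cuda")
output = model(input_ids)
```

When running this (as with my full code), I encounter the following error; only the relevant parts of the traceback are shown: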
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 853, in forward
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in call_impl
return forward_call(*args, **kwargs)
File "/data4/Projects/CoCap/mm_video/experiments/context_compression/modeling/in_context_autoencoder.py", line 373, in forward
with self.base_llm.disable_adapter():
File "/usr/lib/python3.10/contextlib.py", line 135, in enter
return next(self.gen)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 567, in disable_adapter
self.base_model.disable_adapter_layers()
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 403, in disable_adapter_layers
self.set_adapter_layers(enabled=False)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 381, in set_adapter_layers
module.enable_adapters(enabled)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 403, in enable_adapters
layer.requires_grad(False)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2435, in requires_grad
p.requires_grad(requires_grad)
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
Expected behavior
Using `with my_model.disable_adapter():` to call the original model should work even when the model is wrapped by FSDP.
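For comparison, the same context manager works as expected when the PEFT model is not wrapped with FSDP. A minimal standalone check, using the same placeholder model as in the sketch above:

```python
# Standalone check, without any FSDP wrapping (placeholder model as above).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

peft_model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("gpt2"),
    LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"]),
)
input_ids = torch.randint(0, 1000, (2, 16))

# Toggling the adapter succeeds here; it only fails after FSDP wrapping.
with peft_model.disable_adapter():
    output = peft_model(input_ids=input_ids)
```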