What happened?
The `lm_head` layer of a Gemma2 LoRA adapter is not converted by `convert_lora_to_gguf.py`, and is therefore not applied at inference (ruining the adapter's performance).
How to reproduce:
- LoRA fine-tune Gemma2 with pytorch/peft, including `lm_head` in the `target_modules` param: `config = LoraConfig(target_modules=["lm_head"], ...)` (a minimal sketch follows after this list).
- Save the adapter.
- Convert the adapter:

  ```
  python convert_lora_to_gguf.py <adapter folder> --base <base model folder> --outtype f32
  ```

  When debugging, the `lm_head` layer is skipped by this line in `convert_hf_to_gguf.py` (and no error is raised):

  ```python
  if name == "lm_head.weight":
      logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
      return []
  ```

- Run `llama-cli` to check that indeed no LoRA layer is applied in the respective line in llama.cpp:

  ```
  ./llama-cli -m base/model/path/Base-F32.gguf \
      --lora lora/model/path/Lora-F32-LoRA.gguf \
      -p "Hello Gemma2" -n 50
  ```
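For the first step, a minimal fine-tuning sketch, assuming a standard peft setup (the model id, rank, and the extra attention target modules are placeholders; the relevant part is that `lm_head` is included in `target_modules`):

```python
# Minimal sketch of a Gemma2 LoRA fine-tune whose target_modules include lm_head.
# Model id, rank, and the other target modules are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")
config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj", "lm_head"],  # lm_head is the layer the converter later skips
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
# ... training loop omitted ...
model.save_pretrained("lora/model/path")  # the adapter folder passed to convert_lora_to_gguf.py
```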
Expected behaviour
I think this is a bug because a user might have trained an adapter that is applied to the `lm_head` layer, so skipping it on conversion will destroy the adapter's performance. I think the code should either:
- raise an error saying `Cannot convert Gemma2 adapter with lm_head layer`, or
- handle the `lm_head` layer (although this might be tricky for merging adapters, as the `lm_head` layer shares its weights with the `embed` layer in Gemma2, probably requiring a new tensor to be created for the `lm_head` to merge the adapter into).

A sketch of the first option is shown below.
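For the first option, a minimal sketch of what failing loudly could look like, assuming the quoted skip in `convert_hf_to_gguf.py` can tell whether it is running as part of a LoRA conversion (the `is_lora_conversion` flag and the surrounding structure are assumptions, not the actual converter code):

```python
# Hypothetical sketch: fail loudly instead of silently skipping lm_head.weight
# when a LoRA adapter targets it. Only the `if name == "lm_head.weight"` check
# and the debug message come from the current script; the rest is assumed.
if name == "lm_head.weight":
    if is_lora_conversion:  # assumed flag: True when converting a LoRA adapter rather than a base model
        raise ValueError(
            "Cannot convert Gemma2 adapter with lm_head layer: "
            "lm_head shares its weights with the embedding layer and is skipped by the converter."
        )
    logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
    return []
```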
Comments
- I think the script `convert_lora_to_gguf.py` was introduced in PR Refactor lora adapter support #8332, so maybe @ngxson knows whether skipping the `lm_head` is the desired outcome or whether it is actually a bug. Otherwise I'm happy to try to figure out why this happens.
- This is not the case for, say, Phi3, which converts the `lm_head` LoRA layer correctly (see the comparison sketch below).
- I can provide more code/models to reproduce the bug easily if that helps.
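To make the Phi3 comparison concrete, here is a minimal sketch that lists the tensor names in the converted adapter GGUFs using the `gguf` Python package bundled with llama.cpp (the file paths are placeholders, and the `output`/`lm_head` substring check is an assumption about how the converter names the head tensors):

```python
# Minimal sketch: inspect converted LoRA GGUFs and report whether any
# head-related LoRA tensors survived conversion. Paths are placeholders.
from gguf import GGUFReader

def lora_tensor_names(path: str) -> list[str]:
    reader = GGUFReader(path)
    return [t.name for t in reader.tensors]

for label, path in [("Gemma2", "lora/gemma2/Lora-F32-LoRA.gguf"),
                    ("Phi3", "lora/phi3/Lora-F32-LoRA.gguf")]:
    names = lora_tensor_names(path)
    has_head = any("output" in n or "lm_head" in n for n in names)
    print(f"{label} adapter contains lm_head/output LoRA tensors: {has_head}")
```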
Name and Version
version: 3524 (bc0f887)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.4.0
What operating system are you seeing the problem on?
macOS, but it should be a platform-independent problem.
Relevant log output
No response