
Bug: Gemma2 adapter weights lm_head skipped on gguf conversion #9065

What happened?

The lm_head layer for a Gemma2 LoRA adapter is not converted by convert_lora_to_gguf.py, and is therefore not applied at inference, which ruins the adapter's performance.


How to reproduce:

  1. LoRA fine-tune Gemma2 with PyTorch/peft, including lm_head in the target_modules param (a minimal sketch of steps 1-2 follows this list):
    config = LoraConfig(target_modules=["lm_head"], ...)
  2. Save the adapter.
  3. Convert the adapter:
    python convert_lora_to_gguf.py <adapter folder> --base <base model folder> --outtype f32
    Stepping through the conversion shows that the lm_head layer is skipped by this line in convert_hf_to_gguf.py, and no error is raised:
    if name == "lm_head.weight":
        logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
        return []
  4. Run llama-cli and check, at the relevant line in llama.cpp, that no LoRA layer is applied to lm_head:
    ./llama-cli -m base/model/path/Base-F32.gguf \
        --lora lora/model/path/Lora-F32-LoRA.gguf \
        -p "Hello Gemma2" -n 50

Expected behaviour

I think this is a bug because a user might have trained an adapter that is applied to the lm_head layer, so skipping it on conversion will destroy the adapter's performance. I think the code should either:

  • raise an error saying "Cannot convert Gemma2 adapter with lm_head layer",

or

  • handle the lm_head layer (although this might be tricky when merging adapters, as the lm_head layer shares its weights with the embedding layer in Gemma2, so a new, untied lm_head tensor would probably have to be created to merge the adapter into).
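
For illustration, the first option could be a simple guard at the point where the tensor is currently skipped. This is only a rough sketch; the placement and surrounding variable names are assumptions, not the actual structure of the conversion scripts:

    # Hypothetical guard while iterating over adapter tensors (names assumed):
    if name == "lm_head.weight":
        raise ValueError(
            "Cannot convert Gemma2 adapter with lm_head layer: the base model "
            "ties lm_head to the token embeddings, so this adapter tensor "
            "would otherwise be silently dropped."
        )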

Comments

  • I think the script convert_lora_to_gguf.py was introduced in PR Refactor lora adapter support #8332, so maybe @ngxson knows whether skipping the lm_head is the desired outcome or whether it is actually a bug. Otherwise I'm happy to try to figure out why this happens.
  • This is not the case for, say, Phi3, which converts the lm_head LoRA layer correctly.
  • I can provide more code/models to reproduce the bug easily if that helps (a quick way to inspect which tensors end up in the converted adapter is sketched below).
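
As a quick check, the tensors that actually end up in a converted adapter can be listed with the gguf-py package bundled in the llama.cpp repo; the path below is a placeholder:

    from gguf import GGUFReader  # gguf-py, shipped with llama.cpp

    reader = GGUFReader("lora/model/path/Lora-F32-LoRA.gguf")  # placeholder path
    for tensor in reader.tensors:
        print(tensor.name)

    # For the Gemma2 adapter above, no tensor corresponding to lm_head/output
    # appears in the output, while the equivalent Phi3 adapter does include it.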

Name and Version

version: 3524 (bc0f887)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.4.0

What operating system are you seeing the problem on?

macOS, but it should be a platform-independent problem.

Relevant log output

No response

Labels

bug-unconfirmed, medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features but still usable)
