Conversation

@kitaekatt kitaekatt commented Dec 15, 2025

Summary

Add attn_logit_softcapping extraction to GGUF config mapping for Gemma2 and Gemma3 architectures.

Problem

When loading Gemma2/Gemma3 GGUF models, the attn_logit_softcapping parameter is not extracted from GGUF metadata. This causes models to use the default value instead of the actual value stored in the GGUF file.

This parameter is critical for attention score scaling and affects model output quality. The llama.cpp GGUF exporter stores this value in the attention.logit_softcapping field, but Transformers' GGUF loader doesn't map it to the HuggingFace config attribute.
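
Roughly, the mapping drives a translation like the sketch below. This is a simplified illustration rather than the actual Transformers loader code, and the function and variable names are invented for the example; the point is that any metadata key missing from the mapping is skipped, so the config silently keeps its default.

# Simplified illustration (not the actual Transformers loader): copy GGUF
# metadata values onto HuggingFace config attribute names via the mapping.
# The names extract_config and gguf_fields are invented for this sketch.
def extract_config(gguf_fields: dict, mapping: dict) -> dict:
    config = {}
    for gguf_key, value in gguf_fields.items():
        if gguf_key in mapping:
            # Mapped keys become config attributes, e.g. attn_logit_softcapping.
            config[mapping[gguf_key]] = value
        # Keys absent from the mapping are skipped, so the config falls back
        # to its hardcoded default -- the bug described above.
    return config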

Changes

  • Add "attention.logit_softcapping": "attn_logit_softcapping" to gemma2 mapping in GGUF_CONFIG_MAPPING
  • Add "attention.logit_softcapping": "attn_logit_softcapping" to gemma3 mapping in GGUF_CONFIG_MAPPING
  • Add test_gemma_softcap_config_mapping test (follows test_deci_config_mapping pattern)

Testing

Unit Test Added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py
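
The test itself is a small mapping check; roughly it has the shape sketched below (written as a standalone function for illustration only; the actual test follows the existing test_deci_config_mapping pattern inside the ggml test suite, so its exact form differs).

# Rough sketch of the added test: it only asserts that the mapping entries
# listed under Changes above are present. Illustrative, not the literal test.
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING

def test_gemma_softcap_config_mapping():
    for arch in ("gemma2", "gemma3"):
        assert "attention.logit_softcapping" in GGUF_CONFIG_MAPPING[arch]
        assert GGUF_CONFIG_MAPPING[arch]["attention.logit_softcapping"] == "attn_logit_softcapping"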

Manual Verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True   # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'

Related

This fix enables proper GGUF model loading in downstream projects like vLLM that rely on Transformers' GGUF config extraction.

@kitaekatt kitaekatt (Author) commented

Testing Summary

Unit test added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py

Manual verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma2", {})
False  # ❌ Missing
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma3", {})
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True   # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma3"]
True   # ✅ Present

Test follows the existing test_deci_config_mapping pattern.

@ydshieh ydshieh (Collaborator) commented Dec 16, 2025

maybe @hmellor could review this one?

@hmellor hmellor (Member) left a comment

This seems reasonable to me.

Could you provide an example reproducer that I can run before/after the fix?

@kitaekatt kitaekatt (Author) commented

@hmellor Here's a reproducer. While testing, I found and fixed an issue with the mapping keys.

The Problem:

The original PR mapped attention.logit_softcapping, but that key doesn't exist in GGUF metadata. The actual keys are:

gemma2.attn_logit_softcapping = 50.0
gemma2.final_logit_softcapping = 30.0

After stripping the gemma2. prefix, the mapping keys should be attn_logit_softcapping and final_logit_softcapping.

Reproducer:

from gguf import GGUFReader
from huggingface_hub import hf_hub_download
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING

gguf_path = hf_hub_download("bartowski/gemma-2-2b-it-GGUF", "gemma-2-2b-it-Q4_K_M.gguf")
reader = GGUFReader(gguf_path)

# Show actual GGUF keys (after stripping architecture prefix)
for key in reader.fields:
    if 'softcap' in key.lower():
        suffix = key.split('.', 1)[1]  # Strip 'gemma2.'
        print(f"GGUF: {key} -> mapping key: '{suffix}'")

# Output:
#   GGUF: gemma2.attn_logit_softcapping -> mapping key: 'attn_logit_softcapping'
#   GGUF: gemma2.final_logit_softcapping -> mapping key: 'final_logit_softcapping'

# Check mapping
print('attn_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix
print('final_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix

Fix pushed (d86c30c): Changed mapping keys to match actual GGUF metadata and added final_logit_softcapping.
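
For reference, after that push the added entries look roughly like this (a sketch rather than the literal diff; surrounding entries in transformers/integrations/ggml.py are omitted, and final_logit_softcapping is assumed to map onto the config attribute of the same name):

# Sketch of the corrected GGUF_CONFIG_MAPPING entries; keys now match the
# GGUF metadata suffixes after the architecture prefix is stripped.
GGUF_CONFIG_MAPPING = {
    # ... other architectures ...
    "gemma2": {
        # ... existing gemma2 entries ...
        "attn_logit_softcapping": "attn_logit_softcapping",
        "final_logit_softcapping": "final_logit_softcapping",
    },
    "gemma3": {
        # ... existing gemma3 entries ...
        "attn_logit_softcapping": "attn_logit_softcapping",
        "final_logit_softcapping": "final_logit_softcapping",
    },
}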

@kitaekatt kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from d86c30c to 8e69ed1 on December 16, 2025 at 17:45
@kitaekatt kitaekatt marked this pull request as draft on December 16, 2025 at 17:45
Add attn_logit_softcapping and final_logit_softcapping mappings
to both gemma2 and gemma3 GGUF config mappings.

Without these mappings, softcapping values are not extracted from
GGUF metadata, causing the model to use hardcoded defaults instead
of the actual values stored in the GGUF file.

Also adds test_gemma_softcap_config_mapping to verify the mappings.
@kitaekatt kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from a1127d1 to 0ecda9c on December 16, 2025 at 17:59
@github-actions (Contributor) commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: ggml

@hmellor hmellor (Member) commented Dec 18, 2025

Thanks @kitaekatt, did you mean to mark the PR as draft?

@kitaekatt kitaekatt (Author) commented Dec 18, 2025

> Thanks @kitaekatt, did you mean to mark the PR as draft?

I have been doing additional testing and validation; let me wrap that up!

But if you want the fix now, feel free to mark the PR as ready for review, or I can do that myself.

@hmellor hmellor (Member) commented Dec 18, 2025

I'm happy to wait for your testing to be complete.
