Conversation

@kitaekatt kitaekatt commented Dec 15, 2025

Summary

Add attn_logit_softcapping extraction to GGUF config mapping for Gemma2 and Gemma3 architectures.

Problem

When loading Gemma2/Gemma3 GGUF models, the attn_logit_softcapping parameter is not extracted from GGUF metadata. This causes models to use the default value instead of the actual value stored in the GGUF file.

This parameter is critical for attention score scaling and affects model output quality. The llama.cpp GGUF exporter stores this value in the attention.logit_softcapping field, but Transformers' GGUF loader doesn't map it to the HuggingFace config attribute.
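
Roughly, the mapping drives a translation like the sketch below. This is a simplified illustration rather than the actual Transformers loader code, and the function and variable names are invented for the example; the point is that any metadata key missing from the mapping is skipped, so the config silently keeps its default.

# Simplified illustration (not the actual Transformers loader): copy GGUF
# metadata values onto HuggingFace config attribute names via the mapping.
# The names extract_config and gguf_fields are invented for this sketch.
def extract_config(gguf_fields: dict, mapping: dict) -> dict:
    config = {}
    for gguf_key, value in gguf_fields.items():
        if gguf_key in mapping:
            # Mapped keys become config attributes, e.g. attn_logit_softcapping.
            config[mapping[gguf_key]] = value
        # Keys absent from the mapping are skipped, so the config falls back
        # to its hardcoded default -- the bug described above.
    return config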

Changes

  • Add "attention.logit_softcapping": "attn_logit_softcapping" to gemma2 mapping in GGUF_CONFIG_MAPPING
  • Add "attention.logit_softcapping": "attn_logit_softcapping" to gemma3 mapping in GGUF_CONFIG_MAPPING
  • Add test_gemma_softcap_config_mapping test (follows test_deci_config_mapping pattern)

Testing

Unit Test Added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py
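
The test itself is a small mapping check; roughly it has the shape sketched below (written as a standalone function for illustration only; the actual test follows the existing test_deci_config_mapping pattern inside the ggml test suite, so its exact form differs).

# Rough sketch of the added test: it only asserts that the mapping entries
# listed under Changes above are present. Illustrative, not the literal test.
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING

def test_gemma_softcap_config_mapping():
    for arch in ("gemma2", "gemma3"):
        assert "attention.logit_softcapping" in GGUF_CONFIG_MAPPING[arch]
        assert GGUF_CONFIG_MAPPING[arch]["attention.logit_softcapping"] == "attn_logit_softcapping"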

Manual Verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True   # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'

Related

This fix enables proper GGUF model loading in downstream projects like vLLM that rely on Transformers' GGUF config extraction.

@kitaekatt kitaekatt (Author) commented

Testing Summary

Unit test added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py

Manual verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma2", {})
False  # ❌ Missing
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma3", {})
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True   # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma3"]
True   # ✅ Present

Test follows the existing test_deci_config_mapping pattern.

@ydshieh ydshieh (Collaborator) commented Dec 16, 2025

maybe @hmellor could review this one?

@hmellor hmellor (Member) left a comment

This seems reasonable to me.

Could you provide an example reproducer that I can run before/after the fix?

@kitaekatt kitaekatt (Author) commented

@hmellor Here's a reproducer. While testing, I found and fixed an issue with the mapping keys.

The Problem:

The original PR mapped attention.logit_softcapping, but that key doesn't exist in GGUF metadata. The actual keys are:

gemma2.attn_logit_softcapping = 50.0
gemma2.final_logit_softcapping = 30.0

After stripping the gemma2. prefix, the mapping keys should be attn_logit_softcapping and final_logit_softcapping.

Reproducer:

from gguf import GGUFReader
from huggingface_hub import hf_hub_download
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING

gguf_path = hf_hub_download("bartowski/gemma-2-2b-it-GGUF", "gemma-2-2b-it-Q4_K_M.gguf")
reader = GGUFReader(gguf_path)

# Show actual GGUF keys (after stripping architecture prefix)
for key in reader.fields:
    if 'softcap' in key.lower():
        suffix = key.split('.', 1)[1]  # Strip 'gemma2.'
        print(f"GGUF: {key} -> mapping key: '{suffix}'")

# Output:
#   GGUF: gemma2.attn_logit_softcapping -> mapping key: 'attn_logit_softcapping'
#   GGUF: gemma2.final_logit_softcapping -> mapping key: 'final_logit_softcapping'

# Check mapping
print('attn_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix
print('final_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix

Fix pushed (d86c30c): Changed mapping keys to match actual GGUF metadata and added final_logit_softcapping.
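
For reference, after that push the added entries look roughly like this (a sketch rather than the literal diff; surrounding entries in transformers/integrations/ggml.py are omitted, and final_logit_softcapping is assumed to map onto the config attribute of the same name):

# Sketch of the corrected GGUF_CONFIG_MAPPING entries; keys now match the
# GGUF metadata suffixes after the architecture prefix is stripped.
GGUF_CONFIG_MAPPING = {
    # ... other architectures ...
    "gemma2": {
        # ... existing gemma2 entries ...
        "attn_logit_softcapping": "attn_logit_softcapping",
        "final_logit_softcapping": "final_logit_softcapping",
    },
    "gemma3": {
        # ... existing gemma3 entries ...
        "attn_logit_softcapping": "attn_logit_softcapping",
        "final_logit_softcapping": "final_logit_softcapping",
    },
}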

@kitaekatt kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from d86c30c to 8e69ed1 on December 16, 2025 at 17:45
@kitaekatt kitaekatt marked this pull request as draft on December 16, 2025 at 17:45
Add attn_logit_softcapping and final_logit_softcapping mappings
to both gemma2 and gemma3 GGUF config mappings.

Without these mappings, softcapping values are not extracted from
GGUF metadata, causing the model to use hardcoded defaults instead
of the actual values stored in the GGUF file.

Also adds test_gemma_softcap_config_mapping to verify the mappings.
@kitaekatt kitaekatt force-pushed the fix/gemma-gguf-attn-logit-softcapping branch from a1127d1 to 0ecda9c on December 16, 2025 at 17:59
@github-actions (Contributor) commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: ggml

@hmellor hmellor (Member) commented Dec 18, 2025

Thanks @kitaekatt, did you mean to mark the PR as draft?

@kitaekatt kitaekatt (Author) commented Dec 18, 2025

> Thanks @kitaekatt, did you mean to mark the PR as draft?

I have been doing additional testing and validation; let me wrap that up!

But if you want the fix now, feel free to mark the PR as ready for review, or I can do that myself.

@hmellor hmellor (Member) commented Dec 18, 2025

I'm happy to wait for your testing to be complete.
