[GGUF] Add attn_logit_softcapping to Gemma2/Gemma3 config mapping #42881
base: main
Conversation
Testing Summary

Unit test added: test_gemma_softcap_config_mapping

Manual verification (before/after comparison):

# Transformers 4.49.0 (PyPI - before fix)
>>> from transformers.integrations.ggml import GGUF_CONFIG_MAPPING
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma2", {})
False  # ❌ Missing
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING.get("gemma3", {})
False  # ❌ Missing

# Transformers 5.0.0.dev0 (with this PR)
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma2"]
True  # ✅ Present
>>> GGUF_CONFIG_MAPPING["gemma2"]["attention.logit_softcapping"]
'attn_logit_softcapping'
>>> "attention.logit_softcapping" in GGUF_CONFIG_MAPPING["gemma3"]
True  # ✅ Present

The test follows the existing test_deci_config_mapping pattern.
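For reference, a minimal sketch of what such a test could look like (the actual test added in tests/quantization/ggml/test_ggml.py may differ in class structure and naming):

import unittest

from transformers.integrations.ggml import GGUF_CONFIG_MAPPING


class GemmaSoftcapMappingTest(unittest.TestCase):
    def test_gemma_softcap_config_mapping(self):
        # Both Gemma2 and Gemma3 mappings should translate the GGUF softcapping
        # metadata key into the HF config attribute.
        for arch in ("gemma2", "gemma3"):
            mapping = GGUF_CONFIG_MAPPING[arch]
            self.assertIn("attention.logit_softcapping", mapping)
            self.assertEqual(mapping["attention.logit_softcapping"], "attn_logit_softcapping")


if __name__ == "__main__":
    unittest.main()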
maybe @hmellor could review this one?
hmellor left a comment
This seems reasonable to me.
Could you provide an example reproducer that I can run before/after the fix?
@hmellor Here's a reproducer. While testing, I found and fixed an issue with the mapping keys.

The Problem: The original PR used attention.logit_softcapping as the mapping key. After stripping the gemma2. architecture prefix, the key actually stored in the GGUF metadata is attn_logit_softcapping (with final_logit_softcapping alongside it), so the original mapping key never matched.

Reproducer:

from gguf import GGUFReader
from huggingface_hub import hf_hub_download
from transformers.integrations.ggml import GGUF_CONFIG_MAPPING

gguf_path = hf_hub_download("bartowski/gemma-2-2b-it-GGUF", "gemma-2-2b-it-Q4_K_M.gguf")
reader = GGUFReader(gguf_path)

# Show actual GGUF keys (after stripping the architecture prefix)
for key in reader.fields:
    if 'softcap' in key.lower():
        suffix = key.split('.', 1)[1]  # Strip 'gemma2.'
        print(f"GGUF: {key} -> mapping key: '{suffix}'")

# Output:
# GGUF: gemma2.attn_logit_softcapping -> mapping key: 'attn_logit_softcapping'
# GGUF: gemma2.final_logit_softcapping -> mapping key: 'final_logit_softcapping'

# Check mapping
print('attn_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])   # Should be True after fix
print('final_logit_softcapping' in GGUF_CONFIG_MAPPING['gemma2'])  # Should be True after fix

Fix pushed (d86c30c): Changed the mapping keys to match the actual GGUF metadata and added the final_logit_softcapping mapping as well.
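For clarity, the resulting gemma2/gemma3 entries would look roughly like the excerpt below (an illustrative sketch that assumes mapping keys are stored without the architecture prefix; the other entries of each mapping are omitted):

# Illustrative excerpt of GGUF_CONFIG_MAPPING after the fix (not the full mapping).
GGUF_CONFIG_MAPPING_EXCERPT = {
    "gemma2": {
        # ...existing keys...
        "attn_logit_softcapping": "attn_logit_softcapping",
        "final_logit_softcapping": "final_logit_softcapping",
    },
    "gemma3": {
        # ...existing keys...
        "attn_logit_softcapping": "attn_logit_softcapping",
        "final_logit_softcapping": "final_logit_softcapping",
    },
}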
Force-pushed from d86c30c to 8e69ed1
Add attn_logit_softcapping and final_logit_softcapping mappings to both gemma2 and gemma3 GGUF config mappings. Without these mappings, softcapping values are not extracted from GGUF metadata, causing the model to use hardcoded defaults instead of the actual values stored in the GGUF file. Also adds test_gemma_softcap_config_mapping to verify the mappings.
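To see the effect end to end, a quick check along these lines should work (a sketch that assumes AutoConfig.from_pretrained accepts the gguf_file argument, as in recent Transformers releases; the repo and filename are the ones used in the reproducer above):

from transformers import AutoConfig

# Load the config directly from the GGUF file; with the new mappings the
# softcapping values come from the GGUF metadata instead of hardcoded defaults.
config = AutoConfig.from_pretrained(
    "bartowski/gemma-2-2b-it-GGUF",
    gguf_file="gemma-2-2b-it-Q4_K_M.gguf",
)
print(config.attn_logit_softcapping)
print(config.final_logit_softcapping)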
Force-pushed from a1127d1 to 0ecda9c
[For maintainers] Suggested jobs to run (before merge): run-slow: ggml
Thanks @kitaekatt, did you mean to mark the PR as draft?
I have been doing additional testing and validation; let me wrap that up! But if you want the fix now, feel free to change the status to open, or if you can't do that, I can do so.
I'm happy to wait for your testing to be complete
Summary
Add attn_logit_softcapping extraction to the GGUF config mapping for the Gemma2 and Gemma3 architectures.
Problem

When loading Gemma2/Gemma3 GGUF models, the attn_logit_softcapping parameter is not extracted from GGUF metadata. This causes models to use the default value instead of the actual value stored in the GGUF file.

This parameter is critical for attention score scaling and affects model output quality. The llama.cpp GGUF exporter stores this value in the attention.logit_softcapping field, but Transformers' GGUF loader doesn't map it to the HuggingFace config attribute.
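To make the impact concrete, attention logit softcapping in Gemma2-style models is applied roughly as follows (a simplified sketch, not the exact Transformers implementation):

import torch

def softcap_attention_logits(logits: torch.Tensor, softcap: float) -> torch.Tensor:
    # Squash raw attention scores into (-softcap, softcap) before the softmax;
    # loading the wrong softcap value therefore changes every attention distribution.
    return softcap * torch.tanh(logits / softcap)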
"attention.logit_softcapping": "attn_logit_softcapping"togemma2mapping inGGUF_CONFIG_MAPPING"attention.logit_softcapping": "attn_logit_softcapping"togemma3mapping inGGUF_CONFIG_MAPPINGtest_gemma_softcap_config_mappingtest (followstest_deci_config_mappingpattern)Testing
Unit Test Added: test_gemma_softcap_config_mapping in tests/quantization/ggml/test_ggml.py

Manual Verification: see the before/after comparison in the Testing Summary comment above.
Related
This fix enables proper GGUF model loading in downstream projects like vLLM that rely on Transformers' GGUF config extraction.