
Conversation

@rahul-tuli rahul-tuli commented Feb 12, 2025

This PR improves the from_pretrained model-loading pipeline in Hugging Face transformers by suppressing unnecessary warnings for models saved using compressed tensors (e.g., sparse and quantized models).

Currently, models that store compression metadata (e.g., bitmask, compressed, row_offsets) instead of direct weight tensors trigger misleading warnings. Due to our compression method, certain extra and missing keys are expected:

  • Extra keys: Represent compression metadata stored on disk but not explicitly expected by the model graph.
  • Missing keys: Expected in the model graph but omitted from storage since they can be reconstructed from compression metadata.

This PR fixes these issues by introducing key filtering mechanisms through Hugging Face’s hf_quantizer extension points.
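Concretely, the two hooks have roughly the shape sketched below; the method names are the ones used in this PR, but the exact signatures (including the prefix argument) are assumptions here rather than a copy of the diff:

from typing import List


class CompressedTensorsHfQuantizer:  # stand-in for the real HfQuantizer subclass
    def update_unexpected_keys(self, model, unexpected_keys: List[str], prefix: str) -> List[str]:
        # Runs before weights are loaded: drop compression-metadata keys
        # (bitmask, compressed, row_offsets, ...) from the "unexpected" list.
        ...

    def update_missing_keys_after_loading(self, model, missing_keys: List[str], prefix: str) -> List[str]:
        # Runs after weights are loaded: drop keys whose tensors the compressor
        # reconstructs in _process_model_after_weight_loading.
        ...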


Problem: Unnecessary Warnings for Compressed Models

🚨 Current Behavior (Before Fix)

When loading compressed models, users encounter:

  • "Missing key" warnings for weight tensors that will be reconstructed.
  • "Unexpected key" warnings for compression metadata in the checkpoint.

Expected Behavior (After Fix)

  • ✅ No unnecessary missing key warnings for compressed weights.
  • ✅ No unnecessary unexpected key warnings for compression metadata.
  • Seamless integration with Hugging Face’s model-loading process.

Solution: Key Filtering for Compressed Models

🔹 Step 1: Suppress Warnings for Expected Missing Keys

🔧 Fix:

  • update_missing_keys_after_loading() removes weight keys (e.g., .*weight) from missing_keys.
  • Since these weights are reconstructed later, the warning is unnecessary.

💡 Impact: No more misleading "missing key" warnings for compressed models.
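To make the filtering concrete, here is a minimal standalone sketch of the idea; the actual pattern list comes from compressed-tensors (see the dependency PR below), and the function name here is illustrative, not the real implementation:

import re

def filter_expected_missing_keys(missing_keys, expected_missing_patterns):
    # Drop keys that the compressor reconstructs after loading, so they no
    # longer show up in the "missing keys" warning.
    return [
        key
        for key in missing_keys
        if not any(re.match(f".*{pattern}", key) for pattern in expected_missing_patterns)
    ]

# Example: a reconstructable weight is filtered out, a genuinely missing bias is kept.
print(filter_expected_missing_keys(
    ["model.layers.0.mlp.down_proj.weight", "lm_head.bias"],
    ["weight"],
))  # -> ['lm_head.bias']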


🔹 Step 2: Suppress Warnings for Expected Unexpected Keys

🔧 Fix:

  • update_unexpected_keys() removes compression parameters (e.g., bitmask, compressed, row_offsets) from unexpected_keys.
  • These are metadata, not actual model weights, and should not raise warnings.

💡 Impact: No more misleading "unexpected key" warnings from compression metadata.
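A similar minimal sketch for the unexpected-key side; the parameter names below are just the examples from this description, and the real list is provided by the compressor:

import re

# Example compression-metadata suffixes named in this PR description.
COMPRESSION_PARAM_NAMES = ["bitmask", "compressed", "row_offsets"]

def filter_compression_metadata_keys(unexpected_keys):
    # Drop checkpoint entries that are compression metadata rather than real
    # model parameters, so they no longer trigger "unexpected key" warnings.
    return [
        key
        for key in unexpected_keys
        if not any(re.search(rf"{name}$", key) for name in COMPRESSION_PARAM_NAMES)
    ]

# Example: metadata keys are dropped, a genuinely unknown key is kept.
print(filter_compression_metadata_keys(
    ["model.layers.0.mlp.down_proj.bitmask", "model.layers.0.mlp.down_proj.row_offsets", "some.unknown.key"],
))  # -> ['some.unknown.key']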


🔹 Step 3: Seamless Integration with Model Loading

  • ✅ Uses existing Hugging Face extension points (hf_quantizer)—no changes to core transformers code.
  • Standard models remain unaffected—only compressed models benefit.
  • Ensures genuine issues (e.g., truly missing parameters) still raise warnings.
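From the user's perspective, the list above means nothing changes in how models are loaded; only the noise goes away. A hedged example (the checkpoint id is a placeholder, not a real repo):

from transformers import AutoModelForCausalLM

# Placeholder id for any checkpoint saved with compressed-tensors (sparse and/or quantized).
model = AutoModelForCausalLM.from_pretrained("my-org/sparse-quantized-model")

# Before this PR: spurious "missing key" / "unexpected key" warnings at this point.
# After this PR: warnings appear only for genuinely missing or unexpected parameters.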

Lifecycle Overview: Before and After the Fix

Step | Before (Current Behavior) ❌ | After (With Fix) ✅
--- | --- | ---
6 | Load checkpoint (compressed weights missing!) | Load checkpoint (compressed weights missing!)
7 | Identify missing keys ⚠ (Triggers Warning!) | Identify missing keys (Before Filtering) ✅
8 | Identify unexpected keys ⚠ (Triggers Warning!) | Filter missing keys (Removes .*weight) ✅
9 | Raise unnecessary warnings ❌ | Identify unexpected keys (Before Filtering) ✅
10 | Assign weights (weights missing!) | Filter unexpected keys (Removes compression params) ✅
14 | Set model to evaluation mode (with unnecessary warnings) | Apply quantization postprocessing (Reconstructs weights) ✅

Testing

This PR has been tested across multiple model configurations to ensure correctness:

  • Quantized-only models – No unexpected warnings.
  • Sparse-only models – No missing or unexpected key warnings.
  • Stacked cases (both sparse and quantized) – Loads correctly without unnecessary warnings.
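A rough sketch of what such a check can look like (not the actual tests in this PR; the checkpoint id is a placeholder, and I'm assuming the warnings are emitted via the transformers.modeling_utils logger):

import logging
import unittest

from transformers import AutoModelForCausalLM


class CompressedLoadingWarningTest(unittest.TestCase):
    def test_no_key_warnings(self):
        logger = logging.getLogger("transformers.modeling_utils")
        # assertNoLogs (Python 3.10+) fails if any WARNING is logged while loading.
        with self.assertNoLogs(logger, level="WARNING"):
            AutoModelForCausalLM.from_pretrained("my-org/sparse-quantized-model")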

Why This Should Be Merged

  • 🚀 Fixes a real issue affecting compressed models.
  • No impact on standard model-loading workflows.
  • 🔧 Uses Hugging Face’s extension points—no core code changes.
  • 🔄 Maintains extensibility while improving compressed model support.

Dependencies

This PR depends on neuralmagic/compressed-tensors#250, which introduces the necessary filtering mechanisms at the compression framework level.


Conclusion

With this fix:

  • ✅ Expected missing keys (e.g., .*weight) do not trigger warnings.
  • ✅ Known compression parameters do not raise unexpected key warnings.
  • ✅ Weights are reconstructed properly in postprocessing.
  • ✅ The Hugging Face from_pretrained API remains modular and extensible.

This significantly improves the developer experience when working with compressed models. Looking forward to feedback and merging this in! 🚀

@Rocketknight1

cc @SunMarc @MekkCyber for quantization!

@MekkCyber MekkCyber self-requested a review February 13, 2025 19:54

@brian-dellabetta brian-dellabetta left a comment

👍

@SunMarc SunMarc left a comment

Thanks for the detailed description of the PR! Left some minor comments.

Comment on lines +67 to +75
# We expect some keys to be missing for compressed models.
# This is fine as the weights are reconstructed by ModelCompressor
# in _process_model_after_weight_loading.

expected_missing_keys = self.compressor.get_missing_module_keys(model)
return [
    key for key in missing_keys if not any(re.match(f".*{pattern}", key) for pattern in expected_missing_keys)
]
A reviewer (Member) asked:

Could you explain why we can't do this step with update_missing_keys and it needs to be done after loading the weights? Also, I see that you do something similar with the unexpected_keys, but this is done prior to loading the weights.

@rahul-tuli (author) replied:

The key reason we update missing keys (e.g., weights) after loading is that compressed-tensors' decompression depends on the correct device placement from transformers. If we filter out the weight tensors too early (before loading), they will still be on the meta device during decompression, which breaks the pre-condition for reconstruction. This can lead to issues when trying to restore the weights later.

On the other hand, unexpected keys (compression metadata) are not actual model parameters but are only used for reconstruction, so they do not depend on device placement. This is why we can safely filter them before loading the weights.
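To illustrate the constraint (purely a sketch, not transformers internals): a parameter that has not been loaded yet still lives on the meta device, so there is no storage for decompression to write into.

import torch

# Before loading/device placement: no real storage behind the tensor.
w = torch.empty(4, 4, device="meta")
assert w.is_meta            # decompression has nowhere to write at this point

# After loading, the parameter is materialized on a real device,
# so the compressor's postprocessing can reconstruct its values.
w = torch.empty(4, 4, device="cpu")
assert not w.is_meta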

@SunMarc commented Feb 17, 2025

Also, make sure to rebase the PR; this should solve the issue with the tests in the CI.

* Introduce two new hooks in HfQuantizer lifecycle to allow updates to missing and unexpected keys
* Update missing and unexpected keys for stacked compressors
* Add tests
* Fix: run_compressed cases
* Fix: uncompressed cases
  Move RunCompressedTest to the same file
  Update tests to unittest
@SunMarc SunMarc left a comment

Thanks for iterating!

@SunMarc SunMarc merged commit 884a8ea into huggingface:main Feb 24, 2025
21 checks passed
kylesayrs added a commit to vllm-project/llm-compressor that referenced this pull request Mar 12, 2025

## Purpose ##
* Remove warning silencing code that was previously needed for loading quantized models but is now handled by huggingface/transformers#36152

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>