Fix: Unexpected Keys, Improve run_compressed, Rename Test Folder #37077
Conversation
brian-dellabetta left a comment
looks like the other changes in #36152 are all still on main, so this looks good to me!
SunMarc left a comment
Thanks, just a nit
```python
            logger.warn(
                "`run_compressed` is only supported for quantized_compressed models"
                " and not for sparsified models. Setting `run_compressed=False`"
            )
            self.run_compressed = False
        elif self.is_quantized and not self.is_quantization_compressed:
            logger.warn("`run_compressed` is only supported for compressed models.Setting `run_compressed=False`")
            self.run_compressed = False
```
Can you do these checks in the config `post_init` method? It will be better, I think.
Done!
friendly ping @rahul-tuli
…e models with run_compressed=True
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Force-pushed from 0ee5778 to 8dd325f
Apologies for the delayed response; I was investigating a weird warning that showed up after moving some logic to `post_init`.

The issue is that when we instantiate the model:

```python
model = AutoModelForCausalLM.from_pretrained(
    stub,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=CompressedTensorsConfig(run_compressed=run_compressed),
)
```

…the […]. The fix was to move the […].

The diff should be good to go now!
dsikka left a comment
We should verify serialization and deserialization with the condition change
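A round-trip check along these lines could exercise that. This is only a sketch: the model stub is a placeholder, not one used in this PR, and it assumes `CompressedTensorsConfig` is importable from the top-level `transformers` package.

```python
# Hypothetical round-trip check: load with run_compressed disabled, re-save,
# and confirm the quantization config deserializes cleanly on reload.
from transformers import AutoModelForCausalLM, CompressedTensorsConfig

stub = "org/some-compressed-model"  # placeholder model id, not from this PR

model = AutoModelForCausalLM.from_pretrained(
    stub, quantization_config=CompressedTensorsConfig(run_compressed=False)
)
model.save_pretrained("./roundtrip")

# Reloading should pick the saved quantization config back up without warnings.
reloaded = AutoModelForCausalLM.from_pretrained("./roundtrip")
print(reloaded.config.quantization_config)
```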
dsikka left a comment
LGTM given we've tested:
- Compressed
- Quantized, not saved compressed
- Sparse-only
- Sparse + Quantized
SunMarc left a comment
LGTM! Just a nit.
In the latest release, the removal of `unexpected_keys` filtering for compressed tensors models reintroduced warnings that were previously resolved in #36152. This PR addresses that regression, enhances the user experience for the `run_compressed` flag, and updates the test folder naming to avoid conflicts and align with conventions.

Changes and Objectives
This pull request accomplishes three key improvements:
Restores Filtering of Unexpected Keys for Compressed Tensors Models

The removal of `unexpected_keys` filtering caused warnings to reappear when loading compressed tensors models. This PR reintroduces the necessary logic by adding `unexpected_keys = hf_quantizer.update_unexpected_keys(model, unexpected_keys, prefix)` in `modeling_utils.py`. This ensures unexpected keys are properly managed during model loading, eliminating warnings and restoring the behavior from #36152.
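For context, `update_unexpected_keys` is a hook on the quantizer that gets a chance to prune checkpoint keys before the loader reports them as unexpected. A minimal sketch of the idea follows; the suffix list is illustrative, not the exact transformers implementation:

```python
# Illustrative sketch of the quantizer hook, not the exact transformers code.
def update_unexpected_keys(self, model, unexpected_keys, prefix):
    # Compressed-tensors checkpoints carry extra tensors (scales, zero points,
    # packed weights) that the decompressor consumes rather than loading them
    # directly into module parameters, so they should not be reported as
    # "unexpected keys" to the user.
    compression_suffixes = ("weight_scale", "weight_zero_point", "weight_packed")  # illustrative
    return [key for key in unexpected_keys if not key.endswith(compression_suffixes)]
```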
Enhances User Experience for `run_compressed` Misconfiguration

Previously, setting `run_compressed=True` in unsupported cases (e.g., sparsified models or non-compressed quantized models) triggered a `ValueError` and halted execution. This PR improves this by:
- Adding checks in `quantizer_compressed_tensors.py` to identify unsupported scenarios (`is_sparsification_compressed`, or `is_quantized` without `is_quantization_compressed`).
- Emitting a `logger.warn` message instead of raising an error, notifying users that `run_compressed` is unsupported for the given model type.
- Overriding `run_compressed=False` in these cases, allowing the process to proceed gracefully.

Renames Test Folder to Avoid Name Collisions
The folder `tests/quantization/compressed_tensors` has been renamed to `tests/quantization/compressed_tensors_integration`. This prevents potential name collisions when running `pytest`, ensuring smoother test execution. The new name also aligns with the naming conventions of other integration tests in the repository, improving consistency.

Impact
These changes eliminate spurious warnings and handle misconfigured `run_compressed`, making the library more robust and user-friendly.

Files Modified
- `src/transformers/modeling_utils.py`: Added `update_unexpected_keys` call to restore filtering.
- `src/transformers/quantizers/quantizer_compressed_tensors.py`: Updated `run_compressed` logic with warnings and overrides.
- `tests/quantization/compressed_tensors/*`: Renamed folder to `compressed_tensors_integration` (including `__init__.py`, `test_compressed_models.py`, and `test_compressed_tensors.py`).
Local Testing
This test verifies that compressed and uncompressed models can be loaded using `AutoModelForCausalLM` with various `run_compressed` settings. It also surfaces any warnings, decompression events, or fallbacks.

Loading Script
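The script itself did not survive the page capture; below is a minimal sketch in its spirit. The model stubs are placeholders, not the ones used in the PR, and `CompressedTensorsConfig` is assumed importable from the top-level `transformers` package:

```python
# Minimal sketch of the loading script described above; the original script
# was not preserved. Model stubs are placeholders.
from transformers import AutoModelForCausalLM, CompressedTensorsConfig

STUBS = [
    "org/model-quantized-compressed",    # placeholder: quantized + saved compressed
    "org/model-quantized-uncompressed",  # placeholder: quantized, not saved compressed
    "org/model-sparse-only",             # placeholder: sparsified only
    "org/model-sparse-quantized",        # placeholder: sparse + quantized
]

for stub in STUBS:
    for run_compressed in (True, False):
        print(f"\n=== {stub} (run_compressed={run_compressed}) ===")
        # Any override of run_compressed is emitted as a warning through the
        # transformers logger while the model loads.
        model = AutoModelForCausalLM.from_pretrained(
            stub,
            torch_dtype="auto",
            device_map="auto",
            quantization_config=CompressedTensorsConfig(run_compressed=run_compressed),
        )
        print(type(model).__name__)
```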
Test Output
Cases when `run_compressed=True` is not supported and overridden to `False`:
- […]: `run_compressed` set to `False`
- […]: `run_compressed` set to `False`
- […]: `run_compressed` set to `False`
- […]: `run_compressed` set to `False`
- […]: `run_compressed` set to `False`