
Conversation

@kylesayrs
Contributor

@kylesayrs kylesayrs commented Dec 15, 2025

Purpose

  • Support loading models with online transforms applied via Compressed Tensors (LLM Compressor)
  • Fix tests which referenced deleted models

Background

Transforms are extra weights added to a model that improve accuracy recovery from quantization. These extra weights must be shared across modules in order to keep the model's memory requirements down.
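
As a rough illustration of what sharing means here (a minimal sketch, not the CompressedTensors implementation; the rotation parameter is hypothetical):

import torch

# One Parameter object referenced from several layers: the tensor is
# stored (and serialized) once, no matter how many modules point at it.
shared_rotation = torch.nn.Parameter(torch.eye(64))

layers = [torch.nn.Linear(64, 64) for _ in range(4)]
for layer in layers:
    # Each layer references the same tensor rather than owning a copy.
    layer.register_parameter("rotation", shared_rotation)

# Sharing is by object identity, not by value equality.
assert all(layer.rotation is shared_rotation for layer in layers)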

Prerequisites

Changes

  • Require a minimum compressed-tensors version of 0.13.1 (to support transform features)
  • Apply transforms to the model before weight loading
    • apply_transform_config implements _update_transforms_tied_weights, which leverages @Cyrilvallez's refactored tie_weights functionality!

Implemented in CT's apply_transform_config

def _update_transforms_tied_weights(model: torch.nn.Module):
    """
    This function updates the `_tied_weights_keys` and `all_tied_weights_keys`
    attributes of the given model with transform weights.

    This function is needed because transformers only knows which weights are shared
    via the `_tied_weights_keys` attributes. These attributes are used to tie
    weights after the model has loaded.

    CompressedTensors does not enforce that any particular weight is the source weight.
    We rely on the correctness of the following mapping in PreTrainedModel.tie_weights():
    ```
    B -> A
    C -> A
    D -> A

    Where any of A,B,C,D might be the loaded source weight
    ```
    This property is tested by `test_modeling_utils::BaseModelWithMultipleTiedWeights`
    """

Example _tied_weights_keys:

"model.layers.1.self_attn_.q_proj.v_input.weight": "model.layers.0.self_attn_.q_proj.v_input.weight",
"model.layers.2.self_attn_.q_proj.v_input.weight": "model.layers.0.self_attn_.q_proj.v_input.weight",
"model.layers.3.self_attn_.q_proj.v_input.weight": "model.layers.0.self_attn_.q_proj.v_input.weight",
...
"model.layers.1.self_attn_.q_proj.u_output.weight": "model.layers.0.self_attn_.q_proj.u_output.weight",
"model.layers.2.self_attn_.q_proj.u_output.weight": "model.layers.0.self_attn_.q_proj.u_output.weight",
"model.layers.3.self_attn_.q_proj.u_output.weight": "model.layers.0.self_attn_.q_proj.u_output.weight",

Testing

  • Regression tested using CompressedTensorsTest; added an online QuIP-style transformed model for testing
    • Perplexity results match expectation

Suggested Reviewers

@SunMarc @Cyrilvallez @Rocketknight1

@Rocketknight1
Member

cc @MekkCyber for quantization

@kylesayrs
Contributor Author

make fix-copies does not fix the CI 🥲

@kylesayrs kylesayrs force-pushed the kylesayrs/transforms branch from a65d99e to d56f657 on December 16, 2025 at 15:46
Member

@SunMarc SunMarc left a comment


Thanks, overall LGTM! Just a few nits

@SunMarc SunMarc requested a review from MekkCyber December 17, 2025 13:23
Contributor

@MekkCyber MekkCyber left a comment


Thank you for the fix @kylesayrs

@kylesayrs kylesayrs force-pushed the kylesayrs/transforms branch from d56f657 to 77a0036 on December 17, 2025 at 17:58
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: compressed_tensors_integration

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42887&sha=3a71ff

@kylesayrs
Contributor Author

@SunMarc @MekkCyber

Recent changes from #42882 ended up breaking this PR as implemented. This is because calling preprocess within the init_empty_weights context creates meta tensors which are no longer identical. I actually think that this is fine and cleaner behavior, but it does mean that this PR needs to be refactored.
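
For illustration, a minimal sketch of the identity issue (assuming accelerate's init_empty_weights; the modules here are placeholders, not the actual code path):

import torch
from accelerate import init_empty_weights

with init_empty_weights():
    a = torch.nn.Linear(4, 4)
    b = torch.nn.Linear(4, 4)
    # Tying by assignment: under init_empty_weights, register_parameter
    # is patched to re-create parameters on the meta device, so this
    # stores a new meta tensor instead of sharing b's parameter.
    a.weight = b.weight

# Object identity is lost, so ties can no longer be discovered via `is`
# checks and must be recorded explicitly (e.g. in _tied_weights_keys).
print(a.weight is b.weight)  # False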

I'm moving _update_transforms_tied_weights into apply_transform_config and refactoring the function to work around init_empty_weights (that's done here). Once CT releases, I'll reopen this PR so we can start supporting transforms in transformers :)

Thanks

@kylesayrs kylesayrs marked this pull request as draft December 17, 2025 20:38
@Cyrilvallez
Member

Thanks @kylesayrs! As discussed offline, it indeed won't matter which weight is actually present in the checkpoint when tying them! Leaving you with @SunMarc and @MekkCyber for the quantization part!

@SunMarc
Member

SunMarc commented Dec 18, 2025

> Recent changes from #42882 ended up breaking this PR as implemented. This is because calling preprocess within the init_empty_weights context creates meta tensors which are no longer identical. I actually think that this is fine and cleaner behavior, but it does mean that this PR needs to be refactored.

Oh indeed, since for most quant methods calling preprocess actually calls init_empty_weights again, I didn't predict that it would create this kind of issue. Let us know if you manage to fix it! If there is really no way around it and it's too breaking, we can revert this. But as you said, I think this behavior is probably better.
