[Transform] [Quantization] Add transforms to compressed tensors #22486
Conversation
Code Review
This pull request introduces support for transforms within the compressed tensors quantization framework. The changes primarily involve updating CompressedTensorsConfig to manage transform configurations and modifying the linear method to apply these transforms during the forward pass. My review has identified two critical issues that need to be addressed. First, the logic for creating transform factories incorrectly overwrites the factories dictionary within a loop, which will lead to incorrect behavior when multiple transform schemes are present. Second, there is a signature mismatch in the call to the newly added is_match function, which will cause a TypeError at runtime.
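As an illustration of the first issue, here is a minimal sketch of the dictionary-overwrite anti-pattern and its fix; the names (`transform_schemes`, `scheme.name`, `scheme.make_factory`) are hypothetical stand-ins rather than the PR's actual code:

```python
# Minimal sketch of the anti-pattern described above (hypothetical names).
def build_factories_buggy(transform_schemes):
    for scheme in transform_schemes:
        # BUG: the dict is re-created on every iteration, so only the last
        # scheme's factory survives the loop.
        factories = {scheme.name: scheme.make_factory()}
    return factories


def build_factories_fixed(transform_schemes):
    factories = {}
    for scheme in transform_schemes:
        # Accumulate entries instead of overwriting the whole dict.
        factories[scheme.name] = scheme.make_factory()
    return factories
```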
Resolved review threads (outdated) on vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py
Force-pushed from d91886d to 43016eb
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 496527e to 5954ccc
yewentao256 left a comment:
Let's run CI and see if it is correct
This pull request has merge conflicts that must be resolved before it can be merged.
Resolved review thread on vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py
From a referenced change that lists this PR as a prerequisite:

> **Purpose:** Support R4 transforms before R3. R3 requires hooking into the attention module, whereas R4 does not.
>
> **Prerequisites:** vllm-project/vllm#22486
>
> **Testing:** Performed sanity checks with HF and vLLM
# Online Hadamard Rotations

## Purpose
* Support transforms (online Hadamard rotations) within the compressed-tensors quantization framework

## Changes
### Added Transforms Weight Loading
* Added `SharedWeightParameter`
* A `HadamardTransform` module is attached to linear layers to load transform weights; it utilizes `SharedWeightParameter` in order to load weight partitions as separate tensors (see the sketch following this list)
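To make the weight-loading idea concrete, here is a rough, hypothetical sketch of a transform module attached next to a linear layer. It uses a plain `torch.nn.Parameter` where the PR uses `SharedWeightParameter`, and the class and attribute names are illustrative only:

```python
import torch


class HadamardTransformSketch(torch.nn.Module):
    """Illustrative stand-in for a transform module that owns a square
    rotation weight loaded alongside the linear layer it is attached to."""

    def __init__(self, size: int):
        super().__init__()
        # The PR loads this weight via SharedWeightParameter so that
        # tensor-parallel partitions can be loaded as separate tensors; a plain
        # Parameter (initialized to identity as a placeholder) keeps the
        # sketch self-contained.
        self.weight = torch.nn.Parameter(torch.eye(size), requires_grad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rotate the last dimension of the activations.
        return x @ self.weight


# Hypothetical usage: attach the transform next to an existing linear layer.
linear = torch.nn.Linear(4096, 4096, bias=False)
linear.input_transform = HadamardTransformSketch(linear.in_features)
```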
### Added Transforms Apply
* Added `CompressedTensorsLinearTransformMethod`, which wraps `CompressedTensorsLinearMethod` and `UnquantizedLinearMethod` and adds input and output transforms to either side of the original `apply` method (see the sketch following this list)
* Because [Core] Support weight_loader_v2 for `UnquantizedLinearMethod` #23036 has not landed, we must use a hack to switch back to `weight_loader_v1` if the method being wrapped is the `UnquantizedLinearMethod`
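The wrapping can be pictured as follows; this is a simplified, hypothetical rendering of the idea, not the actual `CompressedTensorsLinearTransformMethod`:

```python
import torch


class LinearTransformMethodSketch:
    """Simplified wrapper: run optional transforms on either side of the
    wrapped linear method's apply()."""

    def __init__(self, wrapped, input_transform=None, output_transform=None):
        self.wrapped = wrapped                    # e.g. the original linear method
        self.input_transform = input_transform    # callable applied to activations
        self.output_transform = output_transform  # callable applied to outputs

    def apply(self, layer: torch.nn.Module, x: torch.Tensor, bias=None):
        if self.input_transform is not None:
            x = self.input_transform(x)
        out = self.wrapped.apply(layer, x, bias)
        if self.output_transform is not None:
            out = self.output_transform(out)
        return out
```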
### Misc Changes
* `_shard_id_as_int` now lives on `BasevLLMParameter` so that its implementation can be shared with `SharedWeightParameter`
* Added `calculate_prompt_perplexity` for checking model coherence (a generic perplexity sketch follows this list)
* Added `weight_loader_v1` to `QKVCrossParallelLinear` to support the hack described above (and [Core] Support weight_loader_v2 for `UnquantizedLinearMethod` #23036)
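As background for the coherence check, prompt perplexity is just the exponentiated mean negative log-likelihood of the prompt tokens. The helper below is a generic sketch under that definition, not the PR's `calculate_prompt_perplexity` utility:

```python
import math

import torch


def prompt_perplexity(logits: torch.Tensor, token_ids: torch.Tensor) -> float:
    """Perplexity of `token_ids` (shape [seq_len], dtype long) under next-token
    `logits` (shape [seq_len, vocab_size]); logits[i] predicts token_ids[i + 1]."""
    log_probs = torch.log_softmax(logits[:-1], dim=-1)
    target_log_probs = log_probs.gather(-1, token_ids[1:].unsqueeze(-1)).squeeze(-1)
    return math.exp(-target_log_probs.mean().item())
```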
## Testing
* Added `test_compressed_tensors_transforms_perplexity` tests for SpinQuantR1R2R4 and QuIP