@csy1204 csy1204 commented Oct 9, 2025

Purpose

Resolves #26500 and vllm-project/compressed-tensors#468.

Test Plan

python -m pytest tests/quantization/test_compressed_tensors.py -vvv

Test Result

tests/quantization/test_compressed_tensors.py result
 python -m pytest tests/quantization/test_compressed_tensors.py -vvv
/workspace/storage/cephrbd/git/study/csy-vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.12.10, pytest-8.3.5, pluggy-1.5.0 -- /workspace/storage/cephrbd/git/study/csy-vllm/.venv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/workspace/storage/cephrbd/git/study/csy-vllm/.hypothesis/examples'))
rootdir: /workspace/storage/cephrbd/git/study/csy-vllm
configfile: pyproject.toml
plugins: subtests-0.14.1, hypothesis-6.131.0, shard-0.1.2, buildkite-test-collector-0.1.9, mock-3.14.0, cov-6.3.0, schemathesis-3.39.15, rerunfailures-14.0, forked-1.6.0, timeout-2.3.1, hydra-core-1.3.2, asyncio-0.24.0, anyio-4.6.2.post1
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 34 items                                                                                                                                                                                       
Running 34 items in this shard: tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args1], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args2], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args1], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args2], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args3], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args1], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args2], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args3], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args4], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args5], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w4a16_marlin24, tests/quantization/test_compressed_tensors.py::test_compressed_tensors_kv_cache, tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of40], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of41], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of42], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of43], 
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of40], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of41], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of42], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of43], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of40], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of41], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of42], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse[nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse_compressed[nm-testing/llama2.c-stories42M-pruned2.4-compressed], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_nvfp4[args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_nvfp4[args1], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w4a8_fp8[args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_transforms_perplexity[nm-testing/Llama-3.2-1B-Instruct-spinquantR1R2R4-w4a16-Flat is better than nested.\nSparse is better than dense.-150.0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_transforms_perplexity[nm-testing/Llama-3.2-1B-Instruct-quip-w4a16-Flat is better than nested.\nSparse is better than dense.-150.0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled

tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args0] PASSED                                                                                       [  2%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args1] PASSED                                                                                       [  5%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args2] PASSED                                                                                       [  8%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args0] PASSED                                                                            [ 11%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args1] PASSED                                                                            [ 14%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args2] PASSED                                                                            [ 17%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args3] PASSED                                                                            [ 20%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args0] PASSED                                                                                                   [ 23%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args1] PASSED                                                                                                   [ 26%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args2] PASSED                                                                                                   [ 29%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args3] PASSED                                                                                                   [ 32%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args4] PASSED                                                                                                   [ 35%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args5] PASSED                                                                                                   [ 38%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w4a16_marlin24 PASSED                                                                                                       [ 41%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_kv_cache SKIPPED (FP8 KV cache is not supported on this device.)                                                            [ 44%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of40] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                           [ 47%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of41] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                           [ 50%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of42] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                           [ 52%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of43] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                           [ 55%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of40] SKIPPED (cutlass is not yet supported on this GPU type.)                             [ 58%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of41] SKIPPED (cutlass is not yet supported on this GPU type.)                             [ 61%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of42] SKIPPED (cutlass is not yet supported on this GPU type.)                             [ 64%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of43] SKIPPED (cutlass is not yet supported on this GPU type.)                             [ 67%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of40] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                                     [ 70%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of41] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                                     [ 73%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of42] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                                     [ 76%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse[nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor] SKIPPED (2of4 Sparse is not yet supported on this GPU type.) [ 79%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse_compressed[nm-testing/llama2.c-stories42M-pruned2.4-compressed] SKIPPED (Cutlass is not yet supported on this GPU type.) [ 82%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_nvfp4[args0] PASSED                                                                                                         [ 85%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_nvfp4[args1] PASSED                                                                                                         [ 88%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w4a8_fp8[args0] SKIPPED (W4A8 FP8 is not yet supported on this GPU type.)                                                   [ 91%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_transforms_perplexity[nm-testing/Llama-3.2-1B-Instruct-spinquantR1R2R4-w4a16-Flat is better than nested.\nSparse is better than dense.-150.0] PASSED [ 94%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_transforms_perplexity[nm-testing/Llama-3.2-1B-Instruct-quip-w4a16-Flat is better than nested.\nSparse is better than dense.-150.0] PASSED [ 97%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled PASSED                                                                                                    [100%]

============================================================================================ warnings summary ============================================================================================
.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /workspace/storage/cephrbd/git/study/csy-vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= 19 passed, 15 skipped, 1 warning in 993.61s (0:16:33) ==========================================================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the ci/build label Oct 9, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades the compressed-tensors dependency from version 0.11.0 to 0.12.2 to address a licensing concern. While this is a straightforward version bump in requirements/common.txt, it's important to ensure that there are no regressions or latent bugs in the integration. My review of the related code has uncovered two critical bugs in the compressed-tensors quantization logic that could lead to runtime errors or incorrect behavior. I've detailed these issues in a comment on the requirements file change. It is highly recommended to fix these issues as part of this PR to improve the robustness of the quantization functionality.

@csy1204 csy1204 changed the title chore: upgrade compressed-tensors to 0.11.1+ to address LGPLv3 chore: upgrade compressed-tensors to 0.12.2 to address LGPLv3 Oct 9, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades the compressed-tensors library from version 0.11.0 to 0.12.2, primarily to address a licensing concern. The associated code changes are minimal and correctly adapt the codebase to the new version of the dependency. The modifications in compressed_tensors.py and compressed_tensors_moe.py adjust enum comparisons, which is a common requirement during library upgrades and also fixes a likely pre-existing bug. The changes are sound and the pull request is ready for merging.
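The enum-comparison adjustments mentioned above can be sketched in isolation. This is illustrative only (the class and member names here are hypothetical, merely mirroring the str-mixin Enum style that compressed-tensors uses): comparing by value, or round-tripping a string through the enum constructor, stays stable across library versions, whereas identity checks against a re-declared enum class can silently fail.

```python
from enum import Enum

# Sketch only: a str-Enum in the style compressed-tensors uses for its
# quantization strategies. Names are illustrative, not the library's API.
class Strategy(str, Enum):
    TENSOR = "tensor"
    CHANNEL = "channel"

# A strategy string as it might arrive from a parsed model config.
loaded = "channel"

# Value-based comparison is robust across versions of the defining library;
# `Strategy(loaded)` looks the member up by value.
assert Strategy(loaded) == Strategy.CHANNEL
assert Strategy.CHANNEL.value == "channel"
print("enum comparison ok")
```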

requirements/common.txt:

 setuptools>=77.0.3,<80; python_version > '3.11' # Setuptools is used by triton, we need to ensure a modern version is installed for 3.12+ so that it does not try to import distutils, which was removed in 3.12
 einops # Required for Qwen2-VL.
-compressed-tensors == 0.11.0 # required for compressed-tensors
+compressed-tensors == 0.12.2 # required for compressed-tensors
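The pin implies a version floor of 0.12.2 (the first release without the LGPLv3 frozendict dependency). As a minimal sanity check, a downstream could compare an installed version string against that floor; this is a sketch using only the standard library, and `parse_version`/`meets_floor` are hypothetical helper names, not vLLM APIs. Note that tuple comparison handles multi-digit components correctly (0.12.10 > 0.12.2), where naive string comparison would not.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Convert a dotted version string like '0.12.2' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def meets_floor(installed: str, floor: str = "0.12.2") -> bool:
    """Return True if `installed` is at or above the required floor."""
    return parse_version(installed) >= parse_version(floor)

print(meets_floor("0.11.0"))   # the old pin fails the floor
print(meets_floor("0.12.2"))   # the new pin meets it
```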
Contributor Author


This version fixes the issue that occurred in the previous PR:

[2025-10-03T16:37:27Z] ERROR entrypoints/openai/test_response_api_with_harmony.py::test_basic[openai/gpt-oss-20b] - ImportError: cannot import name 'has_offloaded_params' from 'accelerate.utils' 
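A defensive pattern for an error like the one above is to guard the import and fall back to a stub; this is only a sketch (the fallback behavior is an assumption, not vLLM's actual handling), using the `accelerate.utils.has_offloaded_params` name taken from the log.

```python
# Sketch: guard an optional helper whose availability varies across
# accelerate releases, per the ImportError in the log above.
try:
    from accelerate.utils import has_offloaded_params
except ImportError:
    def has_offloaded_params(module) -> bool:
        # Fallback stub (assumption): report no offloaded parameters,
        # so callers degrade to the non-offloaded code path.
        return False
```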

@csy1204 csy1204 changed the title chore: upgrade compressed-tensors to 0.12.2 to address LGPLv3 [CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 Oct 9, 2025
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 9, 2025

csy1204 commented Oct 13, 2025

@mgoin All CI checks are green. When you get a chance, could you kindly review this PR? Thank you!

Member

@mgoin mgoin left a comment


Thank you @csy1204 !

@mgoin mgoin merged commit a1b2d65 into vllm-project:main Oct 13, 2025
84 checks passed
@csy1204 csy1204 deleted the patch-2 branch October 13, 2025 17:00
1994 pushed a commit to 1994/vllm that referenced this pull request Oct 14, 2025
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Development

Successfully merging this pull request may close these issues.

[Bug]: vLLM pulls LGPLv3 dependency (frozendict) via compressed-tensors 0.11.0, breaking license allowlists in downstreams (e.g., KServe)
