@csy1204 csy1204 commented Oct 9, 2025

Purpose

Resolves #26500 and vllm-project/compressed-tensors#468.

Test Plan

python -m pytest tests/quantization/test_compressed_tensors.py -vvv

Test Result

tests/quantization/test_compressed_tensors.py result
 python -m pytest tests/quantization/test_compressed_tensors.py -vvv
/workspace/storage/cephrbd/git/study/csy-vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.12.10, pytest-8.3.5, pluggy-1.5.0 -- /workspace/storage/cephrbd/git/study/csy-vllm/.venv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/workspace/storage/cephrbd/git/study/csy-vllm/.hypothesis/examples'))
rootdir: /workspace/storage/cephrbd/git/study/csy-vllm
configfile: pyproject.toml
plugins: subtests-0.14.1, hypothesis-6.131.0, shard-0.1.2, buildkite-test-collector-0.1.9, mock-3.14.0, cov-6.3.0, schemathesis-3.39.15, rerunfailures-14.0, forked-1.6.0, timeout-2.3.1, hydra-core-1.3.2, asyncio-0.24.0, anyio-4.6.2.post1
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 34 items                                                                                                                                                                                       
Running 34 items in this shard: tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args1], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args2], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args1], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args2], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args3], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args1], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args2], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args3], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args4], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args5], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w4a16_marlin24, tests/quantization/test_compressed_tensors.py::test_compressed_tensors_kv_cache, tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of40], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of41], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of42], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of43], 
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of40], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of41], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of42], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of43], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of40], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of41], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of42], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse[nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse_compressed[nm-testing/llama2.c-stories42M-pruned2.4-compressed], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_nvfp4[args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_nvfp4[args1], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w4a8_fp8[args0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_transforms_perplexity[nm-testing/Llama-3.2-1B-Instruct-spinquantR1R2R4-w4a16-Flat is better than nested.\nSparse is better than dense.-150.0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_transforms_perplexity[nm-testing/Llama-3.2-1B-Instruct-quip-w4a16-Flat is better than nested.\nSparse is better than dense.-150.0], tests/quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled

tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args0] PASSED                                                                                       [  2%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args1] PASSED                                                                                       [  5%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup[model_args2] PASSED                                                                                       [  8%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args0] PASSED                                                                            [ 11%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args1] PASSED                                                                            [ 14%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args2] PASSED                                                                            [ 17%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token[False-model_args3] PASSED                                                                            [ 20%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args0] PASSED                                                                                                   [ 23%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args1] PASSED                                                                                                   [ 26%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args2] PASSED                                                                                                   [ 29%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args3] PASSED                                                                                                   [ 32%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args4] PASSED                                                                                                   [ 35%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_wNa16[wNa16_args5] PASSED                                                                                                   [ 38%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w4a16_marlin24 PASSED                                                                                                       [ 41%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_kv_cache SKIPPED (FP8 KV cache is not supported on this device.)                                                            [ 44%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of40] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                           [ 47%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of41] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                           [ 50%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of42] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                           [ 52%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_fp8_compressed[args_2of43] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                           [ 55%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of40] SKIPPED (cutlass is not yet supported on this GPU type.)                             [ 58%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of41] SKIPPED (cutlass is not yet supported on this GPU type.)                             [ 61%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of42] SKIPPED (cutlass is not yet supported on this GPU type.)                             [ 64%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8_compressed[args_2of43] SKIPPED (cutlass is not yet supported on this GPU type.)                             [ 67%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of40] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                                     [ 70%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of41] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                                     [ 73%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_quant_int8[args_2of42] SKIPPED (Sparse FP8 is not yet supported on this GPU type.)                                     [ 76%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse[nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor] SKIPPED (2of4 Sparse is not yet supported on this GPU type.) [ 79%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_2of4_sparse_compressed[nm-testing/llama2.c-stories42M-pruned2.4-compressed] SKIPPED (Cutlass is not yet supported on this GPU type.) [ 82%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_nvfp4[args0] PASSED                                                                                                         [ 85%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_nvfp4[args1] PASSED                                                                                                         [ 88%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w4a8_fp8[args0] SKIPPED (W4A8 FP8 is not yet supported on this GPU type.)                                                   [ 91%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_transforms_perplexity[nm-testing/Llama-3.2-1B-Instruct-spinquantR1R2R4-w4a16-Flat is better than nested.\nSparse is better than dense.-150.0] PASSED [ 94%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_transforms_perplexity[nm-testing/Llama-3.2-1B-Instruct-quip-w4a16-Flat is better than nested.\nSparse is better than dense.-150.0] PASSED [ 97%]
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled PASSED                                                                                                    [100%]

============================================================================================ warnings summary ============================================================================================
.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /workspace/storage/cephrbd/git/study/csy-vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= 19 passed, 15 skipped, 1 warning in 993.61s (0:16:33) ==========================================================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the ci/build label Oct 9, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades the compressed-tensors dependency from version 0.11.0 to 0.12.2 to address a licensing concern. While this is a straightforward version bump in requirements/common.txt, it's important to ensure that there are no regressions or latent bugs in the integration. My review of the related code has uncovered two critical bugs in the compressed-tensors quantization logic that could lead to runtime errors or incorrect behavior. I've detailed these issues in a comment on the requirements file change. It is highly recommended to fix these issues as part of this PR to improve the robustness of the quantization functionality.

@csy1204 csy1204 changed the title chore: upgrade compressed-tensors to 0.11.1+ to address LGPLv3 chore: upgrade compressed-tensors to 0.12.2 to address LGPLv3 Oct 9, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades the compressed-tensors library from version 0.11.0 to 0.12.2, primarily to address a licensing concern. The associated code changes are minimal and correctly adapt the codebase to the new version of the dependency. The modifications in compressed_tensors.py and compressed_tensors_moe.py adjust enum comparisons, which is a common requirement during library upgrades and also fixes a likely pre-existing bug. The changes are sound and the pull request is ready for merging.
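The enum-comparison adjustments mentioned above can be sketched in isolation. This is illustrative only (the class and member names here are hypothetical, merely mirroring the str-mixin Enum style that compressed-tensors uses): comparing by value, or round-tripping a string through the enum constructor, stays stable across library versions, whereas identity checks against a re-declared enum class can silently fail.

```python
from enum import Enum

# Sketch only: a str-Enum in the style compressed-tensors uses for its
# quantization strategies. Names are illustrative, not the library's API.
class Strategy(str, Enum):
    TENSOR = "tensor"
    CHANNEL = "channel"

# A strategy string as it might arrive from a parsed model config.
loaded = "channel"

# Value-based comparison is robust across versions of the defining library;
# `Strategy(loaded)` looks the member up by value.
assert Strategy(loaded) == Strategy.CHANNEL
assert Strategy.CHANNEL.value == "channel"
print("enum comparison ok")
```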

requirements/common.txt:

 setuptools>=77.0.3,<80; python_version > '3.11' # Setuptools is used by triton, we need to ensure a modern version is installed for 3.12+ so that it does not try to import distutils, which was removed in 3.12
 einops # Required for Qwen2-VL.
-compressed-tensors == 0.11.0 # required for compressed-tensors
+compressed-tensors == 0.12.2 # required for compressed-tensors
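The pin implies a version floor of 0.12.2 (the first release without the LGPLv3 frozendict dependency). As a minimal sanity check, a downstream could compare an installed version string against that floor; this is a sketch using only the standard library, and `parse_version`/`meets_floor` are hypothetical helper names, not vLLM APIs. Note that tuple comparison handles multi-digit components correctly (0.12.10 > 0.12.2), where naive string comparison would not.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Convert a dotted version string like '0.12.2' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def meets_floor(installed: str, floor: str = "0.12.2") -> bool:
    """Return True if `installed` is at or above the required floor."""
    return parse_version(installed) >= parse_version(floor)

print(meets_floor("0.11.0"))   # the old pin fails the floor
print(meets_floor("0.12.2"))   # the new pin meets it
```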
Contributor Author


This version fixes the issue that occurred in the previous PR:

[2025-10-03T16:37:27Z] ERROR entrypoints/openai/test_response_api_with_harmony.py::test_basic[openai/gpt-oss-20b] - ImportError: cannot import name 'has_offloaded_params' from 'accelerate.utils' 
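A defensive pattern for an error like the one above is to guard the import and fall back to a stub; this is only a sketch (the fallback behavior is an assumption, not vLLM's actual handling), using the `accelerate.utils.has_offloaded_params` name taken from the log.

```python
# Sketch: guard an optional helper whose availability varies across
# accelerate releases, per the ImportError in the log above.
try:
    from accelerate.utils import has_offloaded_params
except ImportError:
    def has_offloaded_params(module) -> bool:
        # Fallback stub (assumption): report no offloaded parameters,
        # so callers degrade to the non-offloaded code path.
        return False
```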

@csy1204 csy1204 changed the title chore: upgrade compressed-tensors to 0.12.2 to address LGPLv3 [CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 Oct 9, 2025
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 9, 2025

csy1204 commented Oct 13, 2025

@mgoin All CI checks are green. When you get a chance, could you kindly review this PR? Thank you!

Member

@mgoin mgoin left a comment


Thank you @csy1204 !

@mgoin mgoin merged commit a1b2d65 into vllm-project:main Oct 13, 2025
84 checks passed
@csy1204 csy1204 deleted the patch-2 branch October 13, 2025 17:00
1994 pushed a commit to 1994/vllm that referenced this pull request Oct 14, 2025
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Development

Successfully merging this pull request may close these issues.

[Bug]: vLLM pulls LGPLv3 dependency (frozendict) via compressed-tensors 0.11.0, breaking license allowlists in downstreams (e.g., KServe)
