
Conversation

@zhewenl (Collaborator) commented Oct 23, 2025

Purpose

test_cpu_gpu.py was added in #21448. That change parametrizes the test over several attention backends, but some of those backends are not compatible with every platform - for example, FlashInferBackend is not currently supported on ROCm.

This PR refactors the test to conditionally import backends, following the same approach already used in test_attention_backends.py.
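
A minimal sketch of that conditional-import pattern, assuming vLLM's current_platform helper and illustrative backend import paths (the exact module paths in the PR may differ):

# Sketch only: the backend import paths below are assumptions and may not
# match vLLM's actual module layout.
from vllm.platforms import current_platform
from vllm.v1.attention.backends.flash_attn import FlashAttentionBackend

# Every platform under test supports FlashAttention.
BACKENDS_TO_TEST = [FlashAttentionBackend]

# Add optional backends only when the platform (and installed packages)
# can provide them; FlashInfer is not supported on ROCm.
try:
    if current_platform.is_cuda():
        from vllm.v1.attention.backends.flashinfer import FlashInferBackend

        BACKENDS_TO_TEST.append(FlashInferBackend)
except ImportError:
    pass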

Test Plan

pytest -v -s v1/kv_offload/test_cpu_gpu.py
INFO 10-22 21:15:25 [__init__.py:225] Automatically detected platform rocm.
================================================================= test session starts =================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /home/zhewenli/uv_env/vllm-fork/bin/python3
cachedir: .pytest_cache
rootdir: /data/users/zhewenli/gitrepos/vllm-fork
configfile: pyproject.toml
plugins: anyio-4.11.0, asyncio-1.2.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... WARNING 10-22 21:15:43 [interface.py:512] Current platform cuda does not have '__test__' attribute.
WARNING 10-22 21:15:43 [interface.py:512] Current platform cuda does not have '__bases__' attribute.
WARNING 10-22 21:15:43 [interface.py:512] Current platform cuda does not have '__test__' attribute.
collected 4 items                                                                                                                                     

v1/kv_offload/test_cpu_gpu.py::test_transfer[cuda:0-0-dtype0-4-256-64-1-16-8-64-3-True] INFO 10-22 21:15:43 [cpu_gpu.py:78] Allocating 4 CPU tensors...
PASSED
v1/kv_offload/test_cpu_gpu.py::test_transfer[cuda:0-0-dtype0-4-256-64-1-16-8-64-3-False] INFO 10-22 21:15:44 [cpu_gpu.py:78] Allocating 4 CPU tensors...
PASSED
v1/kv_offload/test_cpu_gpu.py::test_transfer[cuda:0-0-dtype0-4-256-64-3-16-8-64-3-True] INFO 10-22 21:15:44 [cpu_gpu.py:78] Allocating 4 CPU tensors...
PASSED
v1/kv_offload/test_cpu_gpu.py::test_transfer[cuda:0-0-dtype0-4-256-64-3-16-8-64-3-False] INFO 10-22 21:15:45 [cpu_gpu.py:78] Allocating 4 CPU tensors...
PASSED

================================================================== warnings summary ===================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================ 4 passed, 2 warnings in 2.86s ============================================================

Signed-off-by: zhewenli <zhewenli@meta.com>
@mergify mergify bot added the v1 label Oct 23, 2025
@zhewenl zhewenl marked this pull request as ready for review October 23, 2025 04:51
@zhewenl zhewenl changed the title from "update tests" to "[CI/Build] Fix AMD CI: test_cpu_gpu.py" Oct 23, 2025
@mergify mergify bot added the rocm label Oct 23, 2025
Signed-off-by: zhewenli <zhewenli@meta.com>
BACKENDS_TO_TEST = [FlashAttentionBackend]

try:
    if current_platform.is_cuda():
@DarkLight1337 (Member) commented:
Let's use the previous logic for all non-ROCm platforms, to avoid changing the logic prior to this PR
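
A rough sketch of that suggestion (illustrative only, not the literal diff in the PR): gate the optional backends on the platform not being ROCm, so every non-ROCm platform keeps the pre-existing behavior.

# Illustrative sketch of the suggested non-ROCm gate; import path is assumed.
try:
    if not current_platform.is_rocm():
        from vllm.v1.attention.backends.flashinfer import FlashInferBackend

        BACKENDS_TO_TEST.append(FlashInferBackend)
except ImportError:
    pass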

@zhewenl (Collaborator, Author) replied:
@DarkLight1337 sounds good, updated. Since I am working on the AMD CI in general, I wonder if we want to use current_platform to gate CUDA-specific kernels/features? For context, I found that many tests assume they will be running on a CUDA platform, and those tests are failing on AMD.

For example, ROCm uses TritonAttentionImpl by default, which doesn't support models with an encoder (like openai/whisper).
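
As an illustration of the kind of gating being discussed (hypothetical test name, not code from this PR), a test that relies on CUDA-only attention backends could be skipped on ROCm via current_platform:

import pytest

from vllm.platforms import current_platform

# Hypothetical example: skip a test that needs a CUDA-only attention backend
# when running on ROCm.
@pytest.mark.skipif(
    current_platform.is_rocm(),
    reason="ROCm defaults to TritonAttentionImpl, which lacks encoder support",
)
def test_whisper_encoder_decoder():
    ...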

Signed-off-by: zhewenli <zhewenli@meta.com>
@zhewenl zhewenl added the ci/build and ci-failure labels Oct 23, 2025
@yeqcharlotte yeqcharlotte enabled auto-merge (squash) October 23, 2025 05:54
@github-actions github-actions bot added the ready label Oct 23, 2025
@yeqcharlotte yeqcharlotte merged commit 50b788a into vllm-project:main Oct 23, 2025
24 checks passed
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: zhewenli <zhewenli@meta.com>
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
845473182 pushed a commit to raindaywhu/vllm that referenced this pull request Oct 24, 2025
…o step_forward

* 'step_forward' of https://github.com/raindaywhu/vllm: (148 commits)
  [Model] Add MoE support for NemotronH (vllm-project#25863)
  [Metrics] [KVConnector] Add connector prefix cache hit rate stats (vllm-project#26245)
  [CI] Reorganize entrypoints tests (vllm-project#27403)
  add SLA information into comparison graph for vLLM Benchmark Suite (vllm-project#25525)
  [CI/Build] Fix AMD CI: test_cpu_gpu.py (vllm-project#27388)
  [Bugfix] Fix args settings for guided decoding args (vllm-project#27375)
  [CI/Build] Fix Prithvi plugin test (vllm-project#27393)
  [Chore] Remove duplicate `has_` functions in vllm.utils (vllm-project#27372)
  [Model] Add num_cached_tokens for PoolingRequestOutput (vllm-project#27378)
  [V1][spec decode] return logprobs for spec decoding (vllm-project#26060)
  [CORE] Support Prefix Caching with Prompt Embeds (vllm-project#27219)
  [Bugfix][Core] running queue index leakage exception (vllm-project#26754)
  [Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project#27133)
  [Bugfix] Fix SLA tuner initialization (vllm-project#27355)
  [Bugfix] Fix deepseek-ocr multi-image inference and add `merge_by_field_config=True` with tensor schema support (vllm-project#27361)
  [MLA] Bump FlashMLA (vllm-project#27354)
  [Chore] Separate out system utilities from vllm.utils (vllm-project#27201)
  [BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (vllm-project#27128)
  [Feature] publisher default set zmq in kv_event config (vllm-project#26915)
  [Prefix Cache] Use LoRA name for consistent KV-cache block hashing (vllm-project#27211)
  ...
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
Signed-off-by: zhewenli <zhewenli@meta.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

Labels

ci/build, ci-failure (Issue about an unexpected test failure in CI), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants