Your current environment
The output of `python collect_env.py`:
Collecting environment information...
PyTorch version: 2.9.0.dev20250908+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A
OS: AlmaLinux 8.10 (Cerulean Leopard) (x86_64)
GCC version: (GCC) 13.3.1 20240611 (Red Hat 13.3.1-2)
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.28
Python version: 3.12.11 (main, Aug 26 2025, 23:21:03) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] (64-bit runtime)
Python platform: Linux-6.1.141-155.222.amzn2023.x86_64-x86_64-with-glibc2.28
Is CUDA available: False
CUDA runtime version: 13.0.48
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping: 7
CPU MHz: 3100.016
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
Versions of relevant libraries:
[pip3] numpy==2.3.2
[pip3] nvidia-cublas==13.0.0.19
[pip3] nvidia-cuda-cupti==13.0.48
[pip3] nvidia-cuda-nvrtc==13.0.48
[pip3] nvidia-cuda-runtime==13.0.48
[pip3] nvidia-cudnn-cu13==9.13.0.50
[pip3] nvidia-cufft==12.0.0.15
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.3.29
[pip3] nvidia-cusparse==12.6.2.49
[pip3] nvidia-cusparselt-cu13==0.8.0
[pip3] nvidia-nccl-cu13==2.27.7
[pip3] nvidia-nvjitlink==13.0.39
[pip3] nvidia-nvtx==13.0.39
[pip3] pytorch-triton==3.4.0+gitf7888497
[pip3] torch==2.9.0.dev20250908+cu130
[pip3] torchaudio==2.8.0.dev20250908+cu130
[pip3] torchvision==0.24.0.dev20250908+cu130
🐛 Describe the bug
Building vLLM with CUDA 13.0 currently fails because of breaking changes in CCCL 3.0 (see https://nvidia.github.io/cccl/cccl/3.0_migration_guide.html):
```
/workspace/csrc/layernorm_kernels.cu(33): error: namespace "cub" has no member "Sum"
  variance = BlockReduce(reduceStore).Reduce(variance, cub::Sum{}, blockDim.x);
```
Here is an example build failure from PyTorch CI: https://github.com/pytorch/pytorch/actions/runs/17510237984/job/49740804047. With the introduction of CUDA 13.0 support in PyTorch 2.9 (https://dev-discuss.pytorch.org/t/pytorch-release-2-9-0-key-dates/3178), vLLM needs to be updated to build against CUDA 13.0 to unblock the use of PyTorch 2.9.
For reference, the same breaking changes have been fixed in PyTorch in pytorch/pytorch#153373.
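For illustration, a minimal sketch of the kind of change involved (not the actual vLLM patch): CCCL 3.0 removed the CUB-provided functors such as `cub::Sum`, and the migration guide points to the libcu++ equivalents such as `cuda::std::plus<>` from `<cuda/std/functional>`. Applied to the reduction in the error above, the fix would look roughly like:

```cuda
// Sketch of the CCCL 3.0 migration for the failing reduction; the exact
// vLLM fix may differ.
#include <cuda/std/functional>

// Before (CCCL < 3.0): cub::Sum was a CUB-provided sum functor.
//   variance = BlockReduce(reduceStore).Reduce(variance, cub::Sum{}, blockDim.x);

// After (CCCL >= 3.0): cub::Sum has been removed; pass the libcu++
// transparent plus functor instead.
variance = BlockReduce(reduceStore).Reduce(variance, cuda::std::plus<>{}, blockDim.x);
```

The same substitution pattern applies to the other removed functors (e.g. `cub::Max` → `cuda::std::maximum<>`), so the change is mechanical across the affected kernels.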
cc @simon-mo @youkaichao @atalman @Aidyn-A @zou3519
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.