[Bug]: meta-llama/Llama-4-Scout-17B-16E-Instruct compatibility #16330

Reported by @alokkrsahu

Description

Your current environment

The error occurs while deploying the LLM.

🐛 Describe the bug

Loading safetensors checkpoint shards: 96% Completed | 48/50 [00:03<00:00, 25.21it/s]
Loading safetensors checkpoint shards: 100% Completed | 50/50 [00:03<00:00, 13.83it/s]
(VllmWorker rank=0 pid=2770071)
(VllmWorker rank=0 pid=2770071) INFO 04-08 16:26:46 [loader.py:447] Loading weights took 177.20 seconds
(VllmWorker rank=0 pid=2770071) INFO 04-08 16:26:47 [gpu_model_runner.py:1273] Model loading took 53.1198 GiB and 178.255213 seconds
(VllmWorker rank=1 pid=2770083) INFO 04-08 16:26:49 [loader.py:447] Loading weights took 180.12 seconds
(VllmWorker rank=2 pid=2770173) INFO 04-08 16:26:49 [loader.py:447] Loading weights took 180.84 seconds
(VllmWorker rank=1 pid=2770083) INFO 04-08 16:26:49 [gpu_model_runner.py:1273] Model loading took 53.1198 GiB and 181.444976 seconds
(VllmWorker rank=2 pid=2770173) INFO 04-08 16:26:49 [gpu_model_runner.py:1273] Model loading took 53.1198 GiB and 182.203734 seconds
(VllmWorker rank=3 pid=2770198) INFO 04-08 16:26:49 [loader.py:447] Loading weights took 181.73 seconds
(VllmWorker rank=3 pid=2770198) INFO 04-08 16:26:50 [gpu_model_runner.py:1273] Model loading took

what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f8f1ef6c1b6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f8f1ef15a76 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f8f1f355918 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x103ad78 (0x7f8ecd065d78 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0x10433c5 (0x7f8ecd06e3c5 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0x6417b2 (0x7f8f168d07b2 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #6: + 0x6f30f (0x7f8f1ef4d30f in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7f8f1ef4633b in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f8f1ef464e9 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #9: + 0x8fefb8 (0x7f8f16b8dfb8 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x2f6 (0x7f8f16b8e306 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
frame #11: + 0x13c035 (0x7f8e5f4a5035 in /usr/local/lib/python3.10/dist-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)

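As the trace itself suggests, the usual first step with a device-side assert is to re-run with `CUDA_LAUNCH_BLOCKING=1` so the error is reported at the kernel launch that actually failed, and with `TORCH_USE_CUDA_DSA=1` if the PyTorch build supports device-side assertions. A minimal sketch of such a relaunch follows; the exact serve command is not shown in the report, so the model name is taken from the issue title and `--tensor-parallel-size 4` is inferred from the four worker ranks in the log.

```shell
# Hypothetical debugging relaunch (command-line details assumed, see lead-in):
export CUDA_LAUNCH_BLOCKING=1   # report CUDA errors at the failing launch site
export TORCH_USE_CUDA_DSA=1     # only effective on builds compiled with DSA support
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct --tensor-parallel-size 4
```

With blocking launches enabled the Python-level stack trace should point at the operation that triggered the assert, rather than at an unrelated later call such as the `TensorImpl` destructor seen above.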
Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom-right corner of the documentation page, which can answer many frequently asked questions.

Metadata

Labels: bug (Something isn't working)
Status: Done
Milestone: none
Development: no branches or pull requests