Description
Your current environment
Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
🐛 Describe the bug
I am running Llama-3.3-70B-Instruct-FP8-Dynamic with vLLM 0.6.6, and the server crashes at random. Has anyone encountered the same problem? Here is the error log:
```text
(VllmWorkerProcess pid=210) WARNING 04-08 01:00:08 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
(VllmWorkerProcess pid=210) WARNING 04-08 01:00:08 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorkerProcess pid=210) WARNING 04-08 01:00:08 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorkerProcess pid=210) WARNING 04-08 01:00:08 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(VllmWorkerProcess pid=210) WARNING 04-08 01:00:08 model_runner_base.py:143]
INFO 04-08 01:00:08 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20250408-010008.pkl...
WARNING 04-08 01:00:08 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
WARNING 04-08 01:00:08 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 04-08 01:00:08 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 04-08 01:00:08 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
WARNING 04-08 01:00:08 model_runner_base.py:143]
ERROR 04-08 01:00:08 engine.py:135] RuntimeError('Error in model execution: Error Internal')
ERROR 04-08 01:00:08 engine.py:135] Traceback (most recent call last):
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
ERROR 04-08 01:00:08 engine.py:135] return func(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 2028, in execute_model
ERROR 04-08 01:00:08 engine.py:135] hidden_or_intermediate_states = model_executable(
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 04-08 01:00:08 engine.py:135] return self._call_impl(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 04-08 01:00:08 engine.py:135] return forward_call(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 578, in forward
ERROR 04-08 01:00:08 engine.py:135] model_output = self.model(input_ids, positions, kv_caches,
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 168, in __call__
ERROR 04-08 01:00:08 engine.py:135] return self.forward(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 366, in forward
ERROR 04-08 01:00:08 engine.py:135] hidden_states, residual = layer(positions, hidden_states,
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 04-08 01:00:08 engine.py:135] return self._call_impl(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 04-08 01:00:08 engine.py:135] return forward_call(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 279, in forward
ERROR 04-08 01:00:08 engine.py:135] hidden_states = self.self_attn(positions=positions,
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 04-08 01:00:08 engine.py:135] return self._call_impl(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 04-08 01:00:08 engine.py:135] return forward_call(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 204, in forward
ERROR 04-08 01:00:08 engine.py:135] qkv, _ = self.qkv_proj(hidden_states)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 04-08 01:00:08 engine.py:135] return self._call_impl(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 04-08 01:00:08 engine.py:135] return forward_call(*args, **kwargs)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 376, in forward
ERROR 04-08 01:00:08 engine.py:135] output_parallel = self.quant_method.apply(self, input, bias)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py", line 511, in apply
ERROR 04-08 01:00:08 engine.py:135] return scheme.apply_weights(layer, x, bias=bias)
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py", line 135, in apply_weights
ERROR 04-08 01:00:08 engine.py:135] return apply_fp8_linear(
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 120, in apply_fp8_linear
ERROR 04-08 01:00:08 engine.py:135] output = ops.cutlass_scaled_mm(qinput,
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/vllm/_custom_ops.py", line 493, in cutlass_scaled_mm
ERROR 04-08 01:00:08 engine.py:135] torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
ERROR 04-08 01:00:08 engine.py:135] File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1116, in __call__
ERROR 04-08 01:00:08 engine.py:135] return self._op(*args, **(kwargs or {}))
ERROR 04-08 01:00:08 engine.py:135] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 01:00:08 engine.py:135] RuntimeError: Error Internal
```
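The warnings recommend rerunning with `CUDA_LAUNCH_BLOCKING=1`. Because CUDA errors are reported asynchronously, the illegal memory access may not actually originate in `cutlass_scaled_mm`, where the traceback ends; with blocking launches the error is raised by the kernel that faulted. Below is a minimal sketch for relaunching the server that way, assuming the `vllm serve` entry point. The model path and tensor-parallel size are hypothetical placeholders. The `--enforce-eager` flag disables CUDA graph capture and is included only as an optional isolation step. Expect a large slowdown while debugging.

```python
import os
import shlex

# Hypothetical placeholders: adjust to the real checkpoint path and GPU count.
MODEL_PATH = "/models/Llama-3.3-70B-Instruct-FP8-Dynamic"
TENSOR_PARALLEL_SIZE = 4


def debug_env(base_env=None):
    """Copy the environment and force synchronous CUDA kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    # Each kernel launch becomes synchronous, so a CUDA error is raised
    # by the call that triggered it (much slower; for debugging only).
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_command(model_path, tp_size, enforce_eager=True):
    """Build the `vllm serve` argument list used for the debug run."""
    cmd = ["vllm", "serve", model_path, "--tensor-parallel-size", str(tp_size)]
    if enforce_eager:
        # Optional: run kernels eagerly instead of replaying CUDA graphs.
        cmd.append("--enforce-eager")
    return cmd


if __name__ == "__main__":
    cmd = build_command(MODEL_PATH, TENSOR_PARALLEL_SIZE)
    # Print the equivalent shell command instead of launching it here.
    print("CUDA_LAUNCH_BLOCKING=1 " + shlex.join(cmd))
```

To start the server from Python instead, pass the list to `subprocess.run(cmd, env=debug_env())`.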
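The traceback ends in `ops.cutlass_scaled_mm`, reached via `apply_fp8_linear` in the compressed-tensors W8A8 FP8 scheme. The pure-Python sketch below shows what that scaled matmul is expected to compute. It can help when sanity-checking shapes and scales of a failing input. It assumes dynamic per-token activation scales and per-output-channel weight scales, which is what "FP8-Dynamic" checkpoints typically use. It does not model FP8 rounding, and the function names are hypothetical, not vLLM's API.

```python
FP8_E4M3_MAX = 448.0  # largest finite value of the float8 e4m3fn format


def quantize_per_token(x):
    """Per-row dynamic scaling; FP8 rounding itself is not modeled."""
    q, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        # Clamp to the FP8 range, as a saturating cast would.
        q.append([max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row])
        scales.append(scale)
    return q, scales


def scaled_mm_reference(q_a, scale_a, q_w, scale_w, bias=None):
    """out[m][n] = scale_a[m] * scale_w[n] * dot(q_a[m], q_w[n]) + bias[n].

    q_a: M x K quantized activations; q_w: N x K quantized weights
    (one row per output channel); scale_a: M scales; scale_w: N scales.
    """
    k = len(q_a[0])
    if any(len(row) != k for row in q_a + q_w):
        raise ValueError("inner dimensions do not match")
    out = []
    for m, a_row in enumerate(q_a):
        out_row = []
        for n, w_row in enumerate(q_w):
            acc = sum(a * w for a, w in zip(a_row, w_row))
            val = scale_a[m] * scale_w[n] * acc
            out_row.append(val + (bias[n] if bias is not None else 0.0))
        out.append(out_row)
    return out


if __name__ == "__main__":
    x = [[1.0, -2.0, 3.0], [0.5, 0.25, -4.0]]   # 2 tokens, hidden size 3
    w = [[0.1, 0.2, 0.3], [-1.0, 0.0, 2.0]]     # 2 output channels
    qa, sa = quantize_per_token(x)
    qw, sw = quantize_per_token(w)              # rows of w are output channels
    print(scaled_mm_reference(qa, sa, qw, sw))
```

Because the sketch skips FP8 rounding, its result matches the unquantized `x @ w.T` up to floating-point error. A real kernel's output will differ slightly.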