Description
While serving Qwen2.5-72B-Instruct with tensor_parallel_size=4 on vLLM v0.9.1, the engine crashes with `RuntimeError: CUDA error: an illegal memory access was encountered`. The Python-side error surfaces in `gpu_model_runner.py` at `sampled_token_ids.tolist()`, and the process-level failure points at `csrc/custom_all_reduce.cuh:594`. The crash occurred while handling `/v1/chat/completions` requests that use xgrammar-guided JSON decoding; the EngineCore then dies and all in-flight requests fail with 500 Internal Server Error. Full logs are below.
Your current environment
vllm/vllm-openai:v0.9.1

The output of python collect_env.py
(Output of `python collect_env.py` not provided.)
🐛 Describe the bug
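For context, the failing traffic was guided-JSON chat completions. The sketch below reconstructs a representative client call from the `SchedulerOutput` dump in the logs (served model name `Qwen2.5-72B`, temperature 0.0, repetition_penalty 1.05, xgrammar JSON schema). The exact client code, endpoint, and prompt text are assumptions, and the schema is abbreviated:

```python
# Hypothetical reproduction sketch: a /v1/chat/completions request with
# xgrammar-guided JSON output, matching the guided_decoding parameters visible
# in the SchedulerOutput dump below. Endpoint, prompt, and schema details are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Abbreviated stand-in for the document-classification schema shown in the dump.
schema = {
    "title": "FileType6",
    "type": "object",
    "properties": {
        "partyA": {"type": "string"},
        "partyB": {"type": "string"},
        "projectName": {"type": "string"},
        "documentType": {"type": "string"},
        "isStandards": {"type": "boolean"},
    },
    "required": ["partyA", "partyB", "projectName", "documentType", "isStandards"],
    "additionalProperties": False,
}

resp = client.chat.completions.create(
    model="Qwen2.5-72B",  # served_model_name from the engine config in the dump
    messages=[{"role": "user", "content": "<long document text, ~5k prompt tokens>"}],
    temperature=0.0,
    extra_body={"guided_json": schema, "repetition_penalty": 1.05},
)
print(resp.choices[0].message.content)
```

The crash appears after several such concurrent requests; the full server log follows.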
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] WorkerProc hit an exception.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] WorkerProc hit an exception.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
[rank2]:[E626 23:10:17.921506443 ProcessGroupNCCL.cpp:1896] [PG ID 2 PG GUID 3 Rank 2] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x98 (0x7f482bb1e5e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xe0 (0x7f482bab34a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f489cba5422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f482c88b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f482c89b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f482c89d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f482c89ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdc253 (0x7f481cbb3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: + 0x94ac3 (0x7f489d793ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f489d824a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] WorkerProc hit an exception.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
ERROR 06-26 23:10:17 [dump_input.py:69] Dumping input data
what(): [PG ID 2 PG GUID 3 Rank 2] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x98 (0x7f482bb1e5e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xe0 (0x7f482bab34a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f489cba5422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f482c88b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f482c89b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f482c89d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f482c89ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdc253 (0x7f481cbb3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: + 0x94ac3 (0x7f489d793ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f489d824a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1902 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x98 (0x7f482bb1e5e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: + 0xcc7a4e (0x7f482c86da4e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0x9165ed (0x7f482c4bc5ed in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xdc253 (0x7f481cbb3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #4: + 0x94ac3 (0x7f489d793ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #5: clone + 0x44 (0x7f489d824a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
ERROR 06-26 23:10:17 [dump_input.py:71] V1 LLM engine (v0.9.1) with config: model='/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-72B-Instruct', speculative_config=None, tokenizer='/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-72B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen2.5-72B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null},
ERROR 06-26 23:10:17 [dump_input.py:79] Dumping scheduler output for model execution:
ERROR 06-26 23:10:17 [dump_input.py:80] SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-65e9066118be463f982c07f0b89379b8,prompt_token_ids_len=5304,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=125768, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json={'description': '文件按照客户维度分类信息', 'properties': {'partyA': {'description': '\n####发票/发票扫描件\n“甲方”:若发票扫描件中有 “甲方名称”“甲方账号””购买方信息“”购买方“ 等表述,可直接识别甲方。\n“付款方”:在结算场景下,付款方通常为甲方。\n内容中如果有2个”统一社会信用代码/纳税人识别号“,第一个包含”统一社会信用代码/纳税人识别号“的一般为甲方。\n\n####货运单据\n“甲方”:若提单中有明确标注。\n“托运人”:委托运输货物的一方,可能是甲方。\n“收货人”:在一些贸易关系中,收货人可能扮演甲方角色。\n“贸易委托方”:委托进行贸易运输相关事宜的一方可能是甲方。\n\n####对账单/结算单、对账单Excel版\n“甲方”:最直接表明甲方身份的词汇,单据中可能会有 “甲方名称”“甲方账号” 等表述。\n“付款方”:在结算场景下,付款方通常为甲方。\n“委托方”:如果是委托业务结算,委托方大概率是甲方。\n“业务发起方”:说明是该业务的发起主体,一般为甲方。\n\n####电子商业承兑汇票(正面+反面)\n“甲方”:若有明确标注,可直接识别甲方。\n“持票人”:某些情况下,持票人可能为甲方,代表拥有票据权利的一方。\n“票据权利人”:和持票人类似,指向拥有票据相关权益的甲方。\n“交易委托方”:汇票交易中的委托办理一方可能是甲方。\n\n####贸易合同/采购订单\n“甲方”:合同中会明确标识甲方相关信息,如 “甲方:[公司名称]”。\n“买方”:在采购场景中,买方通常是甲方。\n“需方”:需求货物或服务的一方,一般对应甲方。\n“发包方”:如果是工程类或项目类合同,发包方为甲方。\n“委托方”:委托对方执行合同任务的一方为甲方。\n\n####提单\n“甲方”:若提单中有明确标注。\n“托运人”:委托运输货物的一方,可能是甲方。\n“收货人”:在一些贸易关系中,收货人可能扮演甲方角色。\n“贸易委托方”:委托进行贸易运输相关事宜的一方可能是甲方。\n\n####验收单\n“甲方”:单据上可能会清晰列出甲方信息。\n“验收委托方”:委托进行验收工作的一方为甲方。\n“需求方”:对验收对象有需求的一方,通常是甲方。\n\n####出库单\n“甲方”:若有明确注明。\n“提货委托方”:委托进行提货出库操作的一方可能是甲方。\n“需求方”:需要货物出库的一方大概率是甲方。\n\n####中标通知书/中标记录截图\n“甲方”:通知书中会明确标识甲方。\n“招标方”:发起招标活动的一方为甲方。\n“项目发包方”:将项目发包出去的主体是甲方。\n\n####货物签收单-送货单/到货单/过磅单/入库单/验收单\n“甲方”:单据上可能直接标注甲方信息。\n“收货方”:接收货物的一方可能是甲方。\n“货物需求方”:对货物有需求的主体,一般为甲方。\n“委托收货方”:委托他人进行货物签收等操作的一方可能是甲方。\n\n####上游提货单/出库单/磅单\n“甲方”:若有清晰标注。\n“提货委托方”:委托进行提货操作的一方可能是甲方。\n“货物需求方”:需要货物的一方大概率是甲方。\n“业务委托方”:委托办理提货等业务的一方可能为甲方。\n\n####其他文件\n“甲方”:若文件中有相关标识。\n“主导方”:在业务中起主导作用的一方可能是甲方。\n“委托方”:委托他人开展业务的主体可能是甲方。\n“需求提出方”:提出业务需求的一方可能对应甲方。\n\n以上是在各种单据文本内容中判断出甲方的依据,如果不存在,不要强制推断,直接返回空字符串。\n如果从内容提取不到甲方信息,可以尝试从文件名称中提取甲方信息。\n甲方不是人名,只能是公司主体或者企业。\n', 'title': 'Partya', 'type': 'string'}, 'partyB': {'description': '\n####发票/发票扫描件\n“乙方”:若发票扫描件中有 “乙方名称”“乙方账号””销售方信息“”销售方“ 等表述,可直接识别乙方。\n“收款方”:在结算场景下,收款方通常为乙方。\n内容中如果有2个”统一社会信用代码/纳税人识别号“,第二个包含”统一社会信用代码/纳税人识别号“的一般为乙方。\n\n####货运单据\n“乙方”:若提单中有明确标注。\n“承运人”:负责运输货物的一方,可能是乙方。\n“货运服务提供方”:提供货运服务的主体,大概率是乙方。\n“贸易受托方”:接受贸易运输相关委托事宜的一方可能是乙方。\n\n####对账单/结算单、对账单Excel版\n“乙方”:直接表明乙方身份,单据里可能存在 “乙方名称”“乙方账号” 这类表述。\n“收款方”:结算时负责收款的一方通常为乙方。\n“受托方”:接受委托开展业务的一方,一般是乙方。\n“业务执行方”:具体执行相关业务操作的主体,大概率是乙方。\n\n####电子商业承兑汇票(正面+反面)\n“乙方”:若有明确标��,可直接识别乙方。\n“承兑人”:承担汇票承兑义务的一方,可能是乙方。\n“票据义务承担方”:负责履行票据相关义务的一方,可能对应乙方。\n“交易受托方”:接受汇票交易委托办理的一方可能是乙方。\n\n####贸易合同/采购订单\n“乙方”:合同中会清晰标识乙方相关信息,如 
“乙方:[公司名称]”。\n“卖方”:在采购业务里,提供货物或服务的一方通常是乙方。\n“供方”:供应货物或服务的主体,一般为乙方。\n“承包方”:针对工程类或项目类合同,承接项目的一方是乙方。\n“受托方”:接受甲方委托执行合同任务的一方为乙方。\n\n####提单\n“乙方”:若提单中有明确标注。\n“承运人”:负责运输货物的一方,可能是乙方。\n“货运服务提供方”:提供货运服务的主体,大概率是乙方。\n“贸易受托方”:接受贸易运输相关委托事宜的一方可能是乙方。\n\n####验收单\n“乙方”:单据上可能会清楚列出乙方信息。\n“被验收方”:接受验收的一方为乙方。\n“服务或货物提供方”:提供待验收的服务或货物的主体,通常是乙方。\n\n####出库单\n“乙方”:若有明确注明。\n“出库执行方”:实际执行出库操作的一方可能是乙方。\n“货物供应方”:提供出库货物的一方大概率是乙方。\n\n####中标通知书/中标记录截图\n“乙方”:通知书中会明确标识乙方。\n“中标方”:成功中标项目的一方为乙方。\n“项目承包方”:承接项目的主体是乙方。\n\n####货物签收单-送货单/到货单/过磅单/入库单/验收单\n“乙方”:单据上可能直接标注乙方信息。\n“送货方”:负责运送货物的一方可能是乙方。\n“货物供应方”:提供货物的主体,一般为乙方。\n“受托送货方”:接受委托进行货物运送的一方可能是乙方。\n\n####上游提货单/出库单/磅单\n“乙方”:若有清晰标注。\n“提货执行方”:实际执行提货操作的一方可能是乙方。\n“货物供应方”:提供被提取货物的一方大概率是乙方。\n“业务受托方”:接受提货等业务委托办理的一方可能为乙方。\n\n####其他文件\n“乙方”:若文件中有相关标识。\n“被委托方”:接受他人委托开展业务的一方可能是乙方。\n“服务或货物供应方”:提供服务或货物的主体,可能对应乙方。\n“业务执行配合方”:配合执行相关业务的一方可能是乙方。\n\n以上是在各种单据文本内容中判断出乙方的依据,如果不存在,不要强制推断,直接返回空字符串。\n如果从内容提取不到乙方信息,可以尝试从文件名称中提取乙方信息。\n乙方不是人名,只能是公司主体或者企业。\n', 'title': 'Partyb', 'type': 'string'}, 'projectName': {'description': '\n####发票/发票扫描件\n“项目名称”:直接表明该发票对应的项目名称。\n“货物或应税劳务、服务名称”:若发票是因特定项目产生,此栏会包含与项目相关的关键信息。\n“备注”:部分发票会在备注栏注明项目名称、项目地点等内容。\n\n####货运单据\n“运输货物所属项目”:明确指出所运输货物对应的项目,可能包含项目名称。\n“运输目的地项目”:如果货物运输是为特定项目服务且运往项目地点,这里会提及项目名称。\n“业务范围”:对货运业务所涉及范围的描述中,可能会体现项目名称。\n“业务内容”:货运业务内容的描述中或许包含项目名称相关信息。\n\n####对账单/结算单、对账单Excel版\n“结算项目”:通常会紧跟具体的项目名称,表明本次结算针对的项目。\n“业务项目”:用于说明对账单涉及的业务项目,可能就是项目名称。\n“项目名称”:一些规范的对账单会直接列���项目名称。\n“业务范围”:在描述业务涵盖范围时,可能会体现项目名称。\n\n####电子商业承兑汇票(正面+反面)\n“汇票用途项目”:若汇票是为特定项目开具,这里会体现项目名称。\n“业务项目”:可能会指出该汇票所关联的业务项目,进而判断项目名称。\n\n####贸易合同/采购订单\n“项目名称”:合同中一般会明确列出项目的具体名称。\n“工程名称”:如果是工程类相关合同,会用此表述指代项目名称。\n“合作项目”:引出双方合作的具体项目名称。\n“业务范围”:合同对业务涵盖范围的描述中,可能包含项目名称信息。\n\n####提单\n“货物所属项目”:提单上会注明货物对应的项目,可能就是项目名称。\n“运输业务项目”:用于说明提单所涉及的运输业务对应的项目,可能包含项目名称。\n“业务范围”:提单中对运输业务范围的描述,可能会体现项目名称。\n\n####验收单\n“验收项目”:明确本次验收操作所涉及的项目,会包含项目名称。\n“业务项目”:可能会表明该验收单关联的业务项目,即项目名称。\n“项目名称”:规范的验收单会直接写明项目名称。\n\n####出库单\n“出库项目”:显示本次出库操作对应的项目,可能包含项目名称。\n“业务项目”:可能会指出该出库单关联的业务项目,即项目名称。\n“项目名称”:部分出库单会直接列出项目名称。\n\n####中标通知书/中标记录截图\n“项目名称”:这是中标通知书中必然会明确列出的信息,表明中标项目的具体名称。\n“工程名称”:若为工程类项目招标,会用此表述指代项目名称。\n“招标项目”:会引出具体的招标项目名称。\n\n####货物签收单-送货单/到货单/过磅单/入库单/验收单\n“项目名称”:单据上可能会直接写明货物签收所对应的项目名称。\n“业务项目”:表明该签收单关联的业务项目,可能就是项目名称。\n“货物所属项目”:明确货物是为哪个项目准备的,可能包含项目名称。\n\n####上游提货单/出库单/磅单\n“提货项目”:显示本次提货操作对应的项目,可能包含项目名称。\n“出库项目”:若为出库单形式,会指出出库所关联的项目,可能是项目名称。\n“业务项目”:可能会指出该单据关联的业务项目,即项目名称。\n“项目名称”:部分单据会直接列出项目名称。\n\n####其他文件\n“主题”:文件的主题若与项目相关,可能包含项目名称。\n“核心业务”:文件描述的核心业务内容中,可能体现项目名称。\n“项目相关内容”:在文件中对项目相关情况的描述里,可能找到项目名称。\n\n以上是在各种单据文本内容中判断出项目名称的依据。\n如果从内容提取不到项目信息,可以尝试从文件名称中提取项目信息。\n', 'title': 'Projectname', 'type': 'string'}, 'documentType': {'description': '单据类型定义:发票/发票扫描件 对应的单据类型是:C01021\n货运单据 对应的单据类型是:C01003\n对账单/结算单 对应的单据类型是:C01004\n对账单Excel版 对应的单据类型是:C01020\n电子商业承兑汇票(正面+反面) 对应的单据类型是:C01013\n其他文件 对应的单据类型是:C01024\n贸易合同/采购订单 对应的单据类型是:C01025\n提单 对应的单据类型是:C01019\n验收单 对应的单据类型是:C01027\n出库单 对应的单据类型是:C01030\n中标通知书/中标记录截图 对应的单据类型是:C01026\n货物签收单-送货单/到货单/过磅单/入库单/验收单 对应的单据类型是:C01002\n上游提货单/出库单/磅单 对应的单据类型是:C01017,\n你是一个文本内容分类器,根据以下类别规则对内容进行分类。请仅输出对应的分类标签,无需解释或额外输出。\n\n#### 发票/发票扫描件\n- 关键特征包括但不限于:"电子发票", "增值税专用发票", "专用发票", "发票号码", "开票日期", "统一社会信用代码/纳税人识别号", "价税合计", "税额", "税率/征收率", "开票人", "收款人", "发票"\n- 输出:C01021\n\n#### 货运单据\n- 关键特征包括但不限于:"收领单", "物资名称", "规格型号", "单位", "订单数量"\n- 输出:C01003\n\n#### 对账单/结算单\n- 关键特征包括但不限于:"对账单", "结算单", "结算", "账期", "结算期", "验收单", "验收单号", "送货单号"\n- 输出:C01004\n\n#### 对账单Excel版\n- 关键特征包括但不限于:"对账单", "材料名称", "金额", "数量", "单位", "日期"\n- 输出:C01020\n\n#### 电子商业承兑汇票(正面+反面)\n- 关键特征包括但不限于:"电子商业承兑汇票", "票据号码", "出票日期", "票据状态", "出票到期日", "承兑保证信息", "承兑信息", "出票人", "承兑人"\n- 
输出:C01013\n\n#### 贸易合同/采购订单\n- 关键特征包括但不限于:"合同", "合同签订地点", "合同签订时间", "合同编号", "甲方", "乙方", "购买方", "销售方", "采购订单", "采购", "订单"\n- 输出:C01025\n\n#### 提单\n- 关键特征包括但不限于:"提单", "提单号", "船名", "航次", "提单日期", "收货人"\n- 输出:C01019\n\n#### 验收单\n- 关键特征包括但不限于:"验收单", "验收单号", "验收日期", "物资名称", "规格型号", "单位", "数量"\n- 输出:C01027\n\n#### 出库单\n- 关键特征包括但不限于:"出库单", "出库日期", "出库编号", "货品名称", "数量", "接收人"\n- 输出:C01030\n\n#### 中标通知书/中标记录截图\n- 关键特征包括但不限于:"中标通知书", "中标单位", "中标价格", "中标日期", "中标编号"\n- 输出:C01026\n\n#### 货物签收单-送货单/到货单/过磅单/入库单/验收单\n- 关���特征包括但不限于:"货物签收单", "送货单", "到货单", "过磅单", "入库单", "验收单", "签收单", "送货单号", "收货人", "收货地址", "收货日期", "收货数量", "收货单位", "收货人", "收货人地址", "收货人电话", "收货人手机", "收货人邮箱", "收货人联系人", "收货人联系人电话", "收货人联系人手机", "收货人联系人邮箱", "收货人联系人地址", "收货人联系人姓名", "收货"\n- 输出:C01002\n\n#### 上游提货单/出库单/磅单\n- 关键特征包括但不限于:"上游提货单", "出库单", "磅单", "提货单", "出库单", "磅单", "提货单号", "提货日期", "提货数量", "提货单位", "提货人", "提货人地址", "提货人电话", "提货人手机", "提货人邮箱", "提货人联系人", "提货人联系人电话", "提货人联系人手机", "提货人联系人邮箱", "提货人联系人地址", "提货人联系人姓名", "提货人联系人电话", "提货人联系人手机", "提货人联系人邮箱", "提货人联系人地址", "提货人联系人姓名", "提货人联系人电话", "提货人联系人手机"\n- 输出:C01017\n\n#### 其他文件和未识别类型\n- 输出:C01024\n\n输入文档后,请输出对应的分类编号,分类结果可能包含多个分类编号。\n如果从内容提取不到单据类型信息,可以尝试从文件名称中提取单据类型信息。', 'title': 'Documenttype', 'type': 'string'}, 'isStandards': {'description': '该属性不要赋值,只当做占位符,用于判断是否为标准单据。', 'title': 'Isstandards', 'type': 'boolean'}}, 'required': ['partyA', 'partyB', 'projectName', 'documentType', 'isStandards'], 'title': 'FileType6', 'type': 'object', 'additionalProperties': False}, regex=None, choice=None, grammar=None, json_object=None, backend='xgrammar', backend_was_auto=True, disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, whitespace_pattern=None, structural_tag=None), extra_args=None),block_ids=([12350, 1880, 2373, 83, 1067, 11592, 10567, 7161, 7401, 7470, 7469, 7468, 7264, 7256, 7248, 7147, 7294, 7467, 7466, 7465, 7464, 7463, 7462, 7461, 7460, 7459, 7458, 7457, 7456, 7455, 7454, 7453, 7452, 7451, 7450, 7449, 7573, 7517, 7484, 9010, 10252, 7259, 7251, 7150, 7296, 7407, 7349, 7348, 7347, 7692, 7683, 7650, 7546, 7514, 7481, 7448, 7391, 7346, 7338, 7335, 7334, 7333, 7332, 7331, 7330, 7329, 7328, 7327, 9067, 7262, 7254, 7152, 7144, 7291, 7418, 7376, 7370, 7369, 7368, 7367, 7366, 7365, 7364, 7363, 7362, 7361, 7360, 7359, 7358, 7357, 7356, 7355, 7354, 7353, 7352, 7351, 9112, 7263, 7255, 7153, 7146, 7293, 7427, 7426, 7425, 7424, 7423, 7422, 7421, 7420, 7574, 7542, 7485, 7476, 7419, 7417, 7416, 7415, 7414, 7413, 7412, 7411, 11258, 15151, 14928, 137, 14906, 738, 729, 453, 7816, 7815, 7970, 7938, 7930, 7898, 7841, 7814, 7813, 7812, 7811, 7810, 7809, 7808, 11576, 11568, 11532, 11531, 12051, 961, 1497, 8394, 11865, 11741, 11805, 8450, 11734, 12467, 3033, 3543, 1585, 14508, 14477, 15179, 1999, 5369],),num_computed_tokens=5280,lora_request=None)], scheduled_cached_reqs=[CachedRequestData(req_id='chatcmpl-f745bb35fc2b437c92035bf14a2a8cfa', resumed_from_preemption=false, new_token_ids=[4383], new_block_ids=[[]], num_computed_tokens=836), CachedRequestData(req_id='chatcmpl-63959a3c24594e6eb3571e4ae9f7285c', resumed_from_preemption=false, new_token_ids=[515], new_block_ids=[[]], num_computed_tokens=5294)], num_scheduled_tokens={chatcmpl-63959a3c24594e6eb3571e4ae9f7285c: 1, chatcmpl-65e9066118be463f982c07f0b89379b8: 24, chatcmpl-f745bb35fc2b437c92035bf14a2a8cfa: 1}, total_num_scheduled_tokens=26, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[0], finished_req_ids=[], free_encoder_input_ids=[], 
structured_output_request_ids={chatcmpl-63959a3c24594e6eb3571e4ae9f7285c: 1, chatcmpl-65e9066118be463f982c07f0b89379b8: 2}, grammar_bitmask=array([[ 2, 0, 0, ..., 0, 0, 0],
ERROR 06-26 23:10:17 [dump_input.py:80] [ 0, 0, 67108864, ..., 0, 0, 0]],
ERROR 06-26 23:10:17 [dump_input.py:80] shape=(2, 4752), dtype=int32), kv_connector_metadata=null)
ERROR 06-26 23:10:17 [dump_input.py:82] SchedulerStats(num_running_reqs=3, num_waiting_reqs=0, gpu_cache_usage=0.014045861370614698, prefix_cache_stats=PrefixCacheStats(reset=False, requests=1, queries=5304, hits=5280), spec_decoding_stats=None)
ERROR 06-26 23:10:17 [core.py:517] EngineCore encountered a fatal error.
ERROR 06-26 23:10:17 [core.py:517] Traceback (most recent call last):
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 508, in run_engine_core
ERROR 06-26 23:10:17 [core.py:517] engine_core.run_busy_loop()
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 535, in run_busy_loop
ERROR 06-26 23:10:17 [core.py:517] self._process_engine_step()
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 560, in _process_engine_step
ERROR 06-26 23:10:17 [core.py:517] outputs, model_executed = self.step_fn()
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 231, in step
ERROR 06-26 23:10:17 [core.py:517] model_output = self.execute_model(scheduler_output)
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 217, in execute_model
ERROR 06-26 23:10:17 [core.py:517] raise err
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 211, in execute_model
ERROR 06-26 23:10:17 [core.py:517] return self.model_executor.execute_model(scheduler_output)
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 163, in execute_model
ERROR 06-26 23:10:17 [core.py:517] (output, ) = self.collective_rpc("execute_model",
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 220, in collective_rpc
ERROR 06-26 23:10:17 [core.py:517] result = get_response(w, dequeue_timeout)
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 207, in get_response
ERROR 06-26 23:10:17 [core.py:517] raise RuntimeError(
ERROR 06-26 23:10:17 [core.py:517] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
ERROR 06-26 23:10:17 [core.py:517] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 06-26 23:10:17 [core.py:517] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 06-26 23:10:17 [core.py:517] Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
ERROR 06-26 23:10:17 [core.py:517] ', please check the stack trace above for the root cause
ERROR 06-26 23:10:17 [async_llm.py:420] AsyncLLM output_handler failed.
ERROR 06-26 23:10:17 [async_llm.py:420] Traceback (most recent call last):
ERROR 06-26 23:10:17 [async_llm.py:420] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 379, in output_handler
ERROR 06-26 23:10:17 [async_llm.py:420] outputs = await engine_core.get_output_async()
ERROR 06-26 23:10:17 [async_llm.py:420] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [async_llm.py:420] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 790, in get_output_async
ERROR 06-26 23:10:17 [async_llm.py:420] raise self._format_exception(outputs) from None
ERROR 06-26 23:10:17 [async_llm.py:420] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 06-26 23:10:17 [async_llm.py:346] Request chatcmpl-f745bb35fc2b437c92035bf14a2a8cfa failed (engine dead).
INFO 06-26 23:10:17 [async_llm.py:346] Request chatcmpl-63959a3c24594e6eb3571e4ae9f7285c failed (engine dead).
INFO 06-26 23:10:17 [async_llm.py:346] Request chatcmpl-65e9066118be463f982c07f0b89379b8 failed (engine dead).
INFO: 172.18.0.1:38700 - "POST /v1/chat/completions HTTP/1.0" 500 Internal Server Error
INFO: 172.18.0.1:38710 - "POST /v1/chat/completions HTTP/1.0" 500 Internal Server Error
INFO: 172.18.0.1:38714 - "POST /v1/chat/completions HTTP/1.0" 500 Internal Server Error
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:594 'an illegal memory access was encountered'
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1]
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:594 'an illegal memory access was encountered'
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:594 'an illegal memory access was encountered'
nanobind: leaked 4 instances!
- leaked instance 0x7f8cfcae3678 of type "xgrammar.xgrammar_bindings.GrammarMatcher"
- leaked instance 0x7f8cfcaebd98 of type "xgrammar.xgrammar_bindings.GrammarMatcher"
- leaked instance 0x7f8ccf8ec918 of type "xgrammar.xgrammar_bindings.CompiledGrammar"
- leaked instance 0x7f8ce5bf28c8 of type "xgrammar.xgrammar_bindings.CompiledGrammar"
nanobind: leaked 2 types! - leaked type "xgrammar.xgrammar_bindings.GrammarMatcher"
- leaked type "xgrammar.xgrammar_bindings.CompiledGrammar"
nanobind: leaked 13 functions! - leaked function "init"
- leaked function "find_jump_forward_string"
- leaked function ""
- leaked function "reset"
- leaked function ""
- leaked function "fill_next_token_bitmask"
- leaked function "is_terminated"
- leaked function "accept_token"
- leaked function "rollback"
- leaked function ""
- leaked function "_debug_accept_string"
- leaked function ""
- leaked function ""
nanobind: this is likely caused by a reference counting issue in the binding code.
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
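Since the process-level failures point at `csrc/custom_all_reduce.cuh:594` and the engine config shows `disable_custom_all_reduce=False`, one way to narrow this down (a debugging sketch under assumptions, not a confirmed fix) is to re-run the same workload with the custom all-reduce kernel disabled and with `CUDA_LAUNCH_BLOCKING=1`, as the log itself suggests, to get a synchronous stack trace:

```python
# Hedged debugging sketch, not a confirmed fix: re-run the same workload with the
# custom all-reduce kernel disabled to see whether the illegal memory access in
# custom_all_reduce.cuh:594 disappears. CUDA_LAUNCH_BLOCKING=1 follows the log's
# own suggestion to obtain a synchronous, more accurate stack trace.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(
    model="/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-72B-Instruct",
    tensor_parallel_size=4,
    disable_custom_all_reduce=True,  # bypass csrc/custom_all_reduce.cuh
)
params = SamplingParams(
    temperature=0.0,
    guided_decoding=GuidedDecodingParams(json={"type": "object"}),  # placeholder schema
)
print(llm.generate(["<long document text>"], params)[0].outputs[0].text)
```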