
[Bug]: RuntimeError: CUDA error: an illegal memory access was encountered #20170

Open

Description

@xiaocode337317439

Your current environment

Docker image: vllm/vllm-openai:v0.9.1

[Image attachment]

The output of `python collect_env.py`
(left as the issue-template placeholder; the output was not provided)
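For reference, a rough offline-engine equivalent of the serving configuration, reconstructed from the engine-config dump in the log below. The exact flags passed to the Docker image are not shown in this report, so treat these arguments as assumptions rather than the actual launch command:

```python
# Hedged sketch only: approximates the config visible in the dump_input log
# (Qwen2.5-72B-Instruct, tensor_parallel_size=4, xgrammar guided decoding,
# prefix caching enabled). The report used a local ModelScope path; the HF id
# below is a stand-in. CUDA_LAUNCH_BLOCKING=1 is set, as the error message
# suggests, to get a synchronous and therefore accurate stack trace on rerun.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    tensor_parallel_size=4,
    max_model_len=131072,
    enable_prefix_caching=True,
    guided_decoding_backend="xgrammar",
)
```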

🐛 Describe the bug

While serving Qwen2.5-72B-Instruct (tensor_parallel_size=4, xgrammar guided decoding) through the OpenAI-compatible server, the worker processes crash with `RuntimeError: CUDA error: an illegal memory access was encountered` and the engine dies. Full log:

(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] WorkerProc hit an exception.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=3 pid=226) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] WorkerProc hit an exception.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=2 pid=225) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
[rank2]:[E626 23:10:17.921506443 ProcessGroupNCCL.cpp:1896] [PG ID 2 PG GUID 3 Rank 2] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x98 (0x7f482bb1e5e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xe0 (0x7f482bab34a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f489cba5422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f482c88b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f482c89b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f482c89d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f482c89ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdc253 (0x7f481cbb3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: + 0x94ac3 (0x7f489d793ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f489d824a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] WorkerProc hit an exception.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Traceback (most recent call last):
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 522, in worker_busy_loop
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 293, in execute_model
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] return func(*args, **kwargs)
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1374, in execute_model
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] RuntimeError: CUDA error: an illegal memory access was encountered
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
(VllmWorker rank=0 pid=223) ERROR 06-26 23:10:17 [multiproc_executor.py:527]
ERROR 06-26 23:10:17 [dump_input.py:69] Dumping input data
what(): [PG ID 2 PG GUID 3 Rank 2] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x98 (0x7f482bb1e5e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xe0 (0x7f482bab34a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f489cba5422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f482c88b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f482c89b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f482c89d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f482c89ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdc253 (0x7f481cbb3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: + 0x94ac3 (0x7f489d793ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f489d824a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1902 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x98 (0x7f482bb1e5e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: + 0xcc7a4e (0x7f482c86da4e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0x9165ed (0x7f482c4bc5ed in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xdc253 (0x7f481cbb3253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #4: + 0x94ac3 (0x7f489d793ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #5: clone + 0x44 (0x7f489d824a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)

ERROR 06-26 23:10:17 [dump_input.py:71] V1 LLM engine (v0.9.1) with config: model='/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-72B-Instruct', speculative_config=None, tokenizer='/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-72B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen2.5-72B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null},
ERROR 06-26 23:10:17 [dump_input.py:79] Dumping scheduler output for model execution:
ERROR 06-26 23:10:17 [dump_input.py:80] SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-65e9066118be463f982c07f0b89379b8,prompt_token_ids_len=5304,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=125768, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json={'description': '文件按照客户维度分类信息', 'properties': {'partyA': {'description': '\n####发票/发票扫描件\n“甲方”:若发票扫描件中有 “甲方名称”“甲方账号””购买方信息“”购买方“ 等表述,可直接识别甲方。\n“付款方”:在结算场景下,付款方通常为甲方。\n内容中如果有2个”统一社会信用代码/纳税人识别号“,第一个包含”统一社会信用代码/纳税人识别号“的一般为甲方。\n\n####货运单据\n“甲方”:若提单中有明确标注。\n“托运人”:委托运输货物的一方,可能是甲方。\n“收货人”:在一些贸易关系中,收货人可能扮演甲方角色。\n“贸易委托方”:委托进行贸易运输相关事宜的一方可能是甲方。\n\n####对账单/结算单、对账单Excel版\n“甲方”:最直接表明甲方身份的词汇,单据中可能会有 “甲方名称”“甲方账号” 等表述。\n“付款方”:在结算场景下,付款方通常为甲方。\n“委托方”:如果是委托业务结算,委托方大概率是甲方。\n“业务发起方”:说明是该业务的发起主体,一般为甲方。\n\n####电子商业承兑汇票(正面+反面)\n“甲方”:若有明确标注,可直接识别甲方。\n“持票人”:某些情况下,持票人可能为甲方,代表拥有票据权利的一方。\n“票据权利人”:和持票人类似,指向拥有票据相关权益的甲方。\n“交易委托方”:汇票交易中的委托办理一方可能是甲方。\n\n####贸易合同/采购订单\n“甲方”:合同中会明确标识甲方相关信息,如 “甲方:[公司名称]”。\n“买方”:在采购场景中,买方通常是甲方。\n“需方”:需求货物或服务的一方,一般对应甲方。\n“发包方”:如果是工程类或项目类合同,发包方为甲方。\n“委托方”:委托对方执行合同任务的一方为甲方。\n\n####提单\n“甲方”:若提单中有明确标注。\n“托运人”:委托运输货物的一方,可能是甲方。\n“收货人”:在一些贸易关系中,收货人可能扮演甲方角色。\n“贸易委托方”:委托进行贸易运输相关事宜的一方可能是甲方。\n\n####验收单\n“甲方”:单据上可能会清晰列出甲方信息。\n“验收委托方”:委托进行验收工作的一方为甲方。\n“需求方”:对验收对象有需求的一方,通常是甲方。\n\n####出库单\n“甲方”:若有明确注明。\n“提货委托方”:委托进行提货出库操作的一方可能是甲方。\n“需求方”:需要货物出库的一方大概率是甲方。\n\n####中标通知书/中标记录截图\n“甲方”:通知书中会明确标识甲方。\n“招标方”:发起招标活动的一方为甲方。\n“项目发包方”:将项目发包出去的主体是甲方。\n\n####货物签收单-送货单/到货单/过磅单/入库单/验收单\n“甲方”:单据上可能直接标注甲方信息。\n“收货方”:接收货物的一方可能是甲方。\n“货物需求方”:对货物有需求的主体,一般为甲方。\n“委托收货方”:委托他人进行货物签收等操作的一方可能是甲方。\n\n####上游提货单/出库单/磅单\n“甲方”:若有清晰标注。\n“提货委托方”:委托进行提货操作的一方可能是甲方。\n“货物需求方”:需要货物的一方大概率是甲方。\n“业务委托方”:委托办理提货等业务的一方可能为甲方。\n\n####其他文件\n“甲方”:若文件中有相关标识。\n“主导方”:在业务中起主导作用的一方可能是甲方。\n“委托方”:委托他人开展业务的主体可能是甲方。\n“需求提出方”:提出业务需求的一方可能对应甲方。\n\n以上是在各种单据文本内容中判断出甲方的依据,如果不存在,不要强制推断,直接返回空字符串。\n如果从内容提取不到甲方信息,可以尝试从文件名称中提取甲方信息。\n甲方不是人名,只能是公司主体或者企业。\n', 'title': 'Partya', 'type': 'string'}, 'partyB': {'description': '\n####发票/发票扫描件\n“乙方”:若发票扫描件中有 “乙方名称”“乙方账号””销售方信息“”销售方“ 等表述,可直接识别乙方。\n“收款方”:在结算场景下,收款方通常为乙方。\n内容中如果有2个”统一社会信用代码/纳税人识别号“,第二个包含”统一社会信用代码/纳税人识别号“的一般为乙方。\n\n####货运单据\n“乙方”:若提单中有明确标注。\n“承运人”:负责运输货物的一方,可能是乙方。\n“货运服务提供方”:提供货运服务的主体,大概率是乙方。\n“贸易受托方”:接受贸易运输相关委托事宜的一方可能是乙方。\n\n####对账单/结算单、对账单Excel版\n“乙方”:直接表明乙方身份,单据里可能存在 “乙方名称”“乙方账号” 这类表述。\n“收款方”:结算时负责收款的一方通常为乙方。\n“受托方”:接受委托开展业务的一方,一般是乙方。\n“业务执行方”:具体执行相关业务操作的主体,大概率是乙方。\n\n####电子商业承兑汇票(正面+反面)\n“乙方”:若有明确标��,可直接识别乙方。\n“承兑人”:承担汇票承兑义务的一方,可能是乙方。\n“票据义务承担方”:负责履行票据相关义务的一方,可能对应乙方。\n“交易受托方”:接受汇票交易委托办理的一方可能是乙方。\n\n####贸易合同/采购订单\n“乙方”:合同中会清晰标识乙方相关信息,如 
“乙方:[公司名称]”。\n“卖方”:在采购业务里,提供货物或服务的一方通常是乙方。\n“供方”:供应货物或服务的主体,一般为乙方。\n“承包方”:针对工程类或项目类合同,承接项目的一方是乙方。\n“受托方”:接受甲方委托执行合同任务的一方为乙方。\n\n####提单\n“乙方”:若提单中有明确标注。\n“承运人”:负责运输货物的一方,可能是乙方。\n“货运服务提供方”:提供货运服务的主体,大概率是乙方。\n“贸易受托方”:接受贸易运输相关委托事宜的一方可能是乙方。\n\n####验收单\n“乙方”:单据上可能会清楚列出乙方信息。\n“被验收方”:接受验收的一方为乙方。\n“服务或货物提供方”:提供待验收的服务或货物的主体,通常是乙方。\n\n####出库单\n“乙方”:若有明确注明。\n“出库执行方”:实际执行出库操作的一方可能是乙方。\n“货物供应方”:提供出库货物的一方大概率是乙方。\n\n####中标通知书/中标记录截图\n“乙方”:通知书中会明确标识乙方。\n“中标方”:成功中标项目的一方为乙方。\n“项目承包方”:承接项目的主体是乙方。\n\n####货物签收单-送货单/到货单/过磅单/入库单/验收单\n“乙方”:单据上可能直接标注乙方信息。\n“送货方”:负责运送货物的一方可能是乙方。\n“货物供应方”:提供货物的主体,一般为乙方。\n“受托送货方”:接受委托进行货物运送的一方可能是乙方。\n\n####上游提货单/出库单/磅单\n“乙方”:若有清晰标注。\n“提货执行方”:实际执行提货操作的一方可能是乙方。\n“货物供应方”:提供被提取货物的一方大概率是乙方。\n“业务受托方”:接受提货等业务委托办理的一方可能为乙方。\n\n####其他文件\n“乙方”:若文件中有相关标识。\n“被委托方”:接受他人委托开展业务的一方可能是乙方。\n“服务或货物供应方”:提供服务或货物的主体,可能对应乙方。\n“业务执行配合方”:配合执行相关业务的一方可能是乙方。\n\n以上是在各种单据文本内容中判断出乙方的依据,如果不存在,不要强制推断,直接返回空字符串。\n如果从内容提取不到乙方信息,可以尝试从文件名称中提取乙方信息。\n乙方不是人名,只能是公司主体或者企业。\n', 'title': 'Partyb', 'type': 'string'}, 'projectName': {'description': '\n####发票/发票扫描件\n“项目名称”:直接表明该发票对应的项目名称。\n“货物或应税劳务、服务名称”:若发票是因特定项目产生,此栏会包含与项目相关的关键信息。\n“备注”:部分发票会在备注栏注明项目名称、项目地点等内容。\n\n####货运单据\n“运输货物所属项目”:明确指出所运输货物对应的项目,可能包含项目名称。\n“运输目的地项目”:如果货物运输是为特定项目服务且运往项目地点,这里会提及项目名称。\n“业务范围”:对货运业务所涉及范围的描述中,可能会体现项目名称。\n“业务内容”:货运业务内容的描述中或许包含项目名称相关信息。\n\n####对账单/结算单、对账单Excel版\n“结算项目”:通常会紧跟具体的项目名称,表明本次结算针对的项目。\n“业务项目”:用于说明对账单涉及的业务项目,可能就是项目名称。\n“项目名称”:一些规范的对账单会直接列���项目名称。\n“业务范围”:在描述业务涵盖范围时,可能会体现项目名称。\n\n####电子商业承兑汇票(正面+反面)\n“汇票用途项目”:若汇票是为特定项目开具,这里会体现项目名称。\n“业务项目”:可能会指出该汇票所关联的业务项目,进而判断项目名称。\n\n####贸易合同/采购订单\n“项目名称”:合同中一般会明确列出项目的具体名称。\n“工程名称”:如果是工程类相关合同,会用此表述指代项目名称。\n“合作项目”:引出双方合作的具体项目名称。\n“业务范围”:合同对业务涵盖范围的描述中,可能包含项目名称信息。\n\n####提单\n“货物所属项目”:提单上会注明货物对应的项目,可能就是项目名称。\n“运输业务项目”:用于说明提单所涉及的运输业务对应的项目,可能包含项目名称。\n“业务范围”:提单中对运输业务范围的描述,可能会体现项目名称。\n\n####验收单\n“验收项目”:明确本次验收操作所涉及的项目,会包含项目名称。\n“业务项目”:可能会表明该验收单关联的业务项目,即项目名称。\n“项目名称”:规范的验收单会直接写明项目名称。\n\n####出库单\n“出库项目”:显示本次出库操作对应的项目,可能包含项目名称。\n“业务项目”:可能会指出该出库单关联的业务项目,即项目名称。\n“项目名称”:部分出库单会直接列出项目名称。\n\n####中标通知书/中标记录截图\n“项目名称”:这是中标通知书中必然会明确列出的信息,表明中标项目的具体名称。\n“工程名称”:若为工程类项目招标,会用此表述指代项目名称。\n“招标项目”:会引出具体的招标项目名称。\n\n####货物签收单-送货单/到货单/过磅单/入库单/验收单\n“项目名称”:单据上可能会直接写明货物签收所对应的项目名称。\n“业务项目”:表明该签收单关联的业务项目,可能就是项目名称。\n“货物所属项目”:明确货物是为哪个项目准备的,可能包含项目名称。\n\n####上游提货单/出库单/磅单\n“提货项目”:显示本次提货操作对应的项目,可能包含项目名称。\n“出库项目”:若为出库单形式,会指出出库所关联的项目,可能是项目名称。\n“业务项目”:可能会指出该单据关联的业务项目,即项目名称。\n“项目名称”:部分单据会直接列出项目名称。\n\n####其他文件\n“主题”:文件的主题若与项目相关,可能包含项目名称。\n“核心业务”:文件描述的核心业务内容中,可能体现项目名称。\n“项目相关内容”:在文件中对项目相关情况的描述里,可能找到项目名称。\n\n以上是在各种单据文本内容中判断出项目名称的依据。\n如果从内容提取不到项目信息,可以尝试从文件名称中提取项目信息。\n', 'title': 'Projectname', 'type': 'string'}, 'documentType': {'description': '单据类型定义:发票/发票扫描件 对应的单据类型是:C01021\n货运单据 对应的单据类型是:C01003\n对账单/结算单 对应的单据类型是:C01004\n对账单Excel版 对应的单据类型是:C01020\n电子商业承兑汇票(正面+反面) 对应的单据类型是:C01013\n其他文件 对应的单据类型是:C01024\n贸易合同/采购订单 对应的单据类型是:C01025\n提单 对应的单据类型是:C01019\n验收单 对应的单据类型是:C01027\n出库单 对应的单据类型是:C01030\n中标通知书/中标记录截图 对应的单据类型是:C01026\n货物签收单-送货单/到货单/过磅单/入库单/验收单 对应的单据类型是:C01002\n上游提货单/出库单/磅单 对应的单据类型是:C01017,\n你是一个文本内容分类器,根据以下类别规则对内容进行分类。请仅输出对应的分类标签,无需解释或额外输出。\n\n#### 发票/发票扫描件\n- 关键特征包括但不限于:"电子发票", "增值税专用发票", "专用发票", "发票号码", "开票日期", "统一社会信用代码/纳税人识别号", "价税合计", "税额", "税率/征收率", "开票人", "收款人", "发票"\n- 输出:C01021\n\n#### 货运单据\n- 关键特征包括但不限于:"收领单", "物资名称", "规格型号", "单位", "订单数量"\n- 输出:C01003\n\n#### 对账单/结算单\n- 关键特征包括但不限于:"对账单", "结算单", "结算", "账期", "结算期", "验收单", "验收单号", "送货单号"\n- 输出:C01004\n\n#### 对账单Excel版\n- 关键特征包括但不限于:"对账单", "材料名称", "金额", "数量", "单位", "日期"\n- 输出:C01020\n\n#### 电子商业承兑汇票(正面+反面)\n- 关键特征包括但不限于:"电子商业承兑汇票", "票据号码", "出票日期", "票据状态", "出票到期日", "承兑保证信息", "承兑信息", "出票人", "承兑人"\n- 
输出:C01013\n\n#### 贸易合同/采购订单\n- 关键特征包括但不限于:"合同", "合同签订地点", "合同签订时间", "合同编号", "甲方", "乙方", "购买方", "销售方", "采购订单", "采购", "订单"\n- 输出:C01025\n\n#### 提单\n- 关键特征包括但不限于:"提单", "提单号", "船名", "航次", "提单日期", "收货人"\n- 输出:C01019\n\n#### 验收单\n- 关键特征包括但不限于:"验收单", "验收单号", "验收日期", "物资名称", "规格型号", "单位", "数量"\n- 输出:C01027\n\n#### 出库单\n- 关键特征包括但不限于:"出库单", "出库日期", "出库编号", "货品名称", "数量", "接收人"\n- 输出:C01030\n\n#### 中标通知书/中标记录截图\n- 关键特征包括但不限于:"中标通知书", "中标单位", "中标价格", "中标日期", "中标编号"\n- 输出:C01026\n\n#### 货物签收单-送货单/到货单/过磅单/入库单/验收单\n- 关���特征包括但不限于:"货物签收单", "送货单", "到货单", "过磅单", "入库单", "验收单", "签收单", "送货单号", "收货人", "收货地址", "收货日期", "收货数量", "收货单位", "收货人", "收货人地址", "收货人电话", "收货人手机", "收货人邮箱", "收货人联系人", "收货人联系人电话", "收货人联系人手机", "收货人联系人邮箱", "收货人联系人地址", "收货人联系人姓名", "收货"\n- 输出:C01002\n\n#### 上游提货单/出库单/磅单\n- 关键特征包括但不限于:"上游提货单", "出库单", "磅单", "提货单", "出库单", "磅单", "提货单号", "提货日期", "提货数量", "提货单位", "提货人", "提货人地址", "提货人电话", "提货人手机", "提货人邮箱", "提货人联系人", "提货人联系人电话", "提货人联系人手机", "提货人联系人邮箱", "提货人联系人地址", "提货人联系人姓名", "提货人联系人电话", "提货人联系人手机", "提货人联系人邮箱", "提货人联系人地址", "提货人联系人姓名", "提货人联系人电话", "提货人联系人手机"\n- 输出:C01017\n\n#### 其他文件和未识别类型\n- 输出:C01024\n\n输入文档后,请输出对应的分类编号,分类结果可能包含多个分类编号。\n如果从内容提取不到单据类型信息,可以尝试从文件名称中提取单据类型信息。', 'title': 'Documenttype', 'type': 'string'}, 'isStandards': {'description': '该属性不要赋值,只当做占位符,用于判断是否为标准单据。', 'title': 'Isstandards', 'type': 'boolean'}}, 'required': ['partyA', 'partyB', 'projectName', 'documentType', 'isStandards'], 'title': 'FileType6', 'type': 'object', 'additionalProperties': False}, regex=None, choice=None, grammar=None, json_object=None, backend='xgrammar', backend_was_auto=True, disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, whitespace_pattern=None, structural_tag=None), extra_args=None),block_ids=([12350, 1880, 2373, 83, 1067, 11592, 10567, 7161, 7401, 7470, 7469, 7468, 7264, 7256, 7248, 7147, 7294, 7467, 7466, 7465, 7464, 7463, 7462, 7461, 7460, 7459, 7458, 7457, 7456, 7455, 7454, 7453, 7452, 7451, 7450, 7449, 7573, 7517, 7484, 9010, 10252, 7259, 7251, 7150, 7296, 7407, 7349, 7348, 7347, 7692, 7683, 7650, 7546, 7514, 7481, 7448, 7391, 7346, 7338, 7335, 7334, 7333, 7332, 7331, 7330, 7329, 7328, 7327, 9067, 7262, 7254, 7152, 7144, 7291, 7418, 7376, 7370, 7369, 7368, 7367, 7366, 7365, 7364, 7363, 7362, 7361, 7360, 7359, 7358, 7357, 7356, 7355, 7354, 7353, 7352, 7351, 9112, 7263, 7255, 7153, 7146, 7293, 7427, 7426, 7425, 7424, 7423, 7422, 7421, 7420, 7574, 7542, 7485, 7476, 7419, 7417, 7416, 7415, 7414, 7413, 7412, 7411, 11258, 15151, 14928, 137, 14906, 738, 729, 453, 7816, 7815, 7970, 7938, 7930, 7898, 7841, 7814, 7813, 7812, 7811, 7810, 7809, 7808, 11576, 11568, 11532, 11531, 12051, 961, 1497, 8394, 11865, 11741, 11805, 8450, 11734, 12467, 3033, 3543, 1585, 14508, 14477, 15179, 1999, 5369],),num_computed_tokens=5280,lora_request=None)], scheduled_cached_reqs=[CachedRequestData(req_id='chatcmpl-f745bb35fc2b437c92035bf14a2a8cfa', resumed_from_preemption=false, new_token_ids=[4383], new_block_ids=[[]], num_computed_tokens=836), CachedRequestData(req_id='chatcmpl-63959a3c24594e6eb3571e4ae9f7285c', resumed_from_preemption=false, new_token_ids=[515], new_block_ids=[[]], num_computed_tokens=5294)], num_scheduled_tokens={chatcmpl-63959a3c24594e6eb3571e4ae9f7285c: 1, chatcmpl-65e9066118be463f982c07f0b89379b8: 24, chatcmpl-f745bb35fc2b437c92035bf14a2a8cfa: 1}, total_num_scheduled_tokens=26, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[0], finished_req_ids=[], free_encoder_input_ids=[], 
structured_output_request_ids={chatcmpl-63959a3c24594e6eb3571e4ae9f7285c: 1, chatcmpl-65e9066118be463f982c07f0b89379b8: 2}, grammar_bitmask=array([[ 2, 0, 0, ..., 0, 0, 0],
ERROR 06-26 23:10:17 [dump_input.py:80] [ 0, 0, 67108864, ..., 0, 0, 0]],
ERROR 06-26 23:10:17 [dump_input.py:80] shape=(2, 4752), dtype=int32), kv_connector_metadata=null)
ERROR 06-26 23:10:17 [dump_input.py:82] SchedulerStats(num_running_reqs=3, num_waiting_reqs=0, gpu_cache_usage=0.014045861370614698, prefix_cache_stats=PrefixCacheStats(reset=False, requests=1, queries=5304, hits=5280), spec_decoding_stats=None)
ERROR 06-26 23:10:17 [core.py:517] EngineCore encountered a fatal error.
ERROR 06-26 23:10:17 [core.py:517] Traceback (most recent call last):
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 508, in run_engine_core
ERROR 06-26 23:10:17 [core.py:517] engine_core.run_busy_loop()
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 535, in run_busy_loop
ERROR 06-26 23:10:17 [core.py:517] self._process_engine_step()
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 560, in _process_engine_step
ERROR 06-26 23:10:17 [core.py:517] outputs, model_executed = self.step_fn()
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 231, in step
ERROR 06-26 23:10:17 [core.py:517] model_output = self.execute_model(scheduler_output)
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 217, in execute_model
ERROR 06-26 23:10:17 [core.py:517] raise err
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 211, in execute_model
ERROR 06-26 23:10:17 [core.py:517] return self.model_executor.execute_model(scheduler_output)
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 163, in execute_model
ERROR 06-26 23:10:17 [core.py:517] (output, ) = self.collective_rpc("execute_model",
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 220, in collective_rpc
ERROR 06-26 23:10:17 [core.py:517] result = get_response(w, dequeue_timeout)
ERROR 06-26 23:10:17 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [core.py:517] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 207, in get_response
ERROR 06-26 23:10:17 [core.py:517] raise RuntimeError(
ERROR 06-26 23:10:17 [core.py:517] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
ERROR 06-26 23:10:17 [core.py:517] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 06-26 23:10:17 [core.py:517] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 06-26 23:10:17 [core.py:517] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
ERROR 06-26 23:10:17 [core.py:517] ', please check the stack trace above for the root cause
ERROR 06-26 23:10:17 [async_llm.py:420] AsyncLLM output_handler failed.
ERROR 06-26 23:10:17 [async_llm.py:420] Traceback (most recent call last):
ERROR 06-26 23:10:17 [async_llm.py:420] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 379, in output_handler
ERROR 06-26 23:10:17 [async_llm.py:420] outputs = await engine_core.get_output_async()
ERROR 06-26 23:10:17 [async_llm.py:420] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 23:10:17 [async_llm.py:420] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 790, in get_output_async
ERROR 06-26 23:10:17 [async_llm.py:420] raise self._format_exception(outputs) from None
ERROR 06-26 23:10:17 [async_llm.py:420] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 06-26 23:10:17 [async_llm.py:346] Request chatcmpl-f745bb35fc2b437c92035bf14a2a8cfa failed (engine dead).
INFO 06-26 23:10:17 [async_llm.py:346] Request chatcmpl-63959a3c24594e6eb3571e4ae9f7285c failed (engine dead).
INFO 06-26 23:10:17 [async_llm.py:346] Request chatcmpl-65e9066118be463f982c07f0b89379b8 failed (engine dead).
INFO: 172.18.0.1:38700 - "POST /v1/chat/completions HTTP/1.0" 500 Internal Server Error
INFO: 172.18.0.1:38710 - "POST /v1/chat/completions HTTP/1.0" 500 Internal Server Error
INFO: 172.18.0.1:38714 - "POST /v1/chat/completions HTTP/1.0" 500 Internal Server Error
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:594 'an illegal memory access was encountered'
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1]
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:594 'an illegal memory access was encountered'
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:594 'an illegal memory access was encountered'
nanobind: leaked 4 instances!

  • leaked instance 0x7f8cfcae3678 of type "xgrammar.xgrammar_bindings.GrammarMatcher"
  • leaked instance 0x7f8cfcaebd98 of type "xgrammar.xgrammar_bindings.GrammarMatcher"
  • leaked instance 0x7f8ccf8ec918 of type "xgrammar.xgrammar_bindings.CompiledGrammar"
  • leaked instance 0x7f8ce5bf28c8 of type "xgrammar.xgrammar_bindings.CompiledGrammar"

nanobind: leaked 2 types!

  • leaked type "xgrammar.xgrammar_bindings.GrammarMatcher"
  • leaked type "xgrammar.xgrammar_bindings.CompiledGrammar"

nanobind: leaked 13 functions!

  • leaked function "init"
  • leaked function "find_jump_forward_string"
  • leaked function ""
  • leaked function "reset"
  • leaked function ""
  • leaked function "fill_next_token_bitmask"
  • leaked function "is_terminated"
  • leaked function "accept_token"
  • leaked function "rollback"
  • leaked function ""
  • leaked function "_debug_accept_string"
  • leaked function ""
  • leaked function ""

nanobind: this is likely caused by a reference counting issue in the binding code.

/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
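
The dumped SchedulerOutput shows the in-flight requests were `/v1/chat/completions` calls with JSON-schema guided decoding via the xgrammar backend. A minimal sketch of that request shape, assuming the standard `openai` client pointed at this server; the real `FileType6` schema and prompts are much larger and are abbreviated here:

```python
# Hedged reproduction sketch: sends a chat completion with a JSON-schema-guided
# response, matching the request type visible in the scheduler dump. The schema
# below is a stand-in for the original FileType6 schema; field names are taken
# from the dump, but the full descriptions are omitted.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {
        "partyA": {"type": "string"},
        "partyB": {"type": "string"},
        "projectName": {"type": "string"},
        "documentType": {"type": "string"},
        "isStandards": {"type": "boolean"},
    },
    "required": ["partyA", "partyB", "projectName", "documentType", "isStandards"],
    "additionalProperties": False,
}

resp = client.chat.completions.create(
    model="Qwen2.5-72B",  # served_model_name from the engine config
    messages=[{"role": "user", "content": "Classify this document ..."}],
    temperature=0.0,
    extra_body={"guided_json": schema},  # vLLM's guided-decoding extension
)
print(resp.choices[0].message.content)
```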

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
