[Bug]: Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered #9306

Open
@Clint-chan

Description

Your current environment

The output of `python collect_env.py`
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.31

Python version: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-169-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB

Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      43 bits physical, 48 bits virtual
CPU(s):                             256
On-line CPU(s) list:                0-255
Thread(s) per core:                 2
Core(s) per socket:                 64
Socket(s):                          2
NUMA node(s):                       8
Vendor ID:                          AuthenticAMD
CPU family:                         23
Model:                              49
Model name:                         AMD EPYC 7742 64-Core Processor
Stepping:                           0
Frequency boost:                    enabled
CPU MHz:                            3389.731
CPU max MHz:                        2250.0000
CPU min MHz:                        1500.0000
BogoMIPS:                           4491.29
Virtualization:                     AMD-V
L1d cache:                          4 MiB
L1i cache:                          4 MiB
L2 cache:                           64 MiB
L3 cache:                           512 MiB
NUMA node0 CPU(s):                  0-15,128-143
NUMA node1 CPU(s):                  16-31,144-159
NUMA node2 CPU(s):                  32-47,160-175
NUMA node3 CPU(s):                  48-63,176-191
NUMA node4 CPU(s):                  64-79,192-207
NUMA node5 CPU(s):                  80-95,208-223
NUMA node6 CPU(s):                  96-111,224-239
NUMA node7 CPU(s):                  112-127,240-255
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Vulnerable
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca sme sev sev_es

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.8.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.5.0
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.555.43
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.20
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==25.1.2
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.0
[pip3] triton==3.0.0
[conda] _anaconda_depends         2024.02             py311_mkl_1  
[conda] blas                      1.0                         mkl  
[conda] mkl                       2023.1.0         h213fc3f_46344  
[conda] mkl-service               2.4.0           py311h5eee18b_1  
[conda] mkl_fft                   1.3.8           py311h5eee18b_0  
[conda] mkl_random                1.2.4           py311hdb19cb5_0  
[conda] numpy                     1.26.4          py311h08b1b3b_0  
[conda] numpy-base                1.26.4          py311hf175353_0  
[conda] numpydoc                  1.5.0           py311h06a4308_0  
[conda] nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
[conda] nvidia-ml-py              12.555.43                pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.6.20                  pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
[conda] pyzmq                     25.1.2          py311h6a678d5_0  
[conda] torch                     2.4.0                    pypi_0    pypi
[conda] torchvision               0.19.0                   pypi_0    pypi
[conda] transformers              4.44.0                   pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.4@4db5176d9758b720b05460c50ace3c01026eb158

Model Input Dumps

INFO 10-12 11:34:09 logger.py:36] Received request chat-44505254559d4a72ad36a008ebbfbbdf: prompt: '<|im_start|>system\n你是一个专业且精确的语言判断和翻译工具,你的任务是判断用户输入的字符串是什么语言,并将它翻译为英语,仅需要输出翻译后的结果,不需要描述你的思路或补充性说明等。保持简洁的描述。\n\n输入类型:字符串,可能是任何语言,也可能是几种语言的混合,也有可能为空 \n输出类型:用户输入的字符串转化为英文后的结果,并用一对连续的英文的大括号包裹。如果用户输入为空,那么输出空值。不需要加入任何前缀后缀或说明性语句,例如“以下是翻译结果”,“Below you are handling the string: ”等,直接输出用大括号包裹后的结果即可。如果你无法理解用户发送的内容,或者用户发送的内容是无意义的字符串,乱码等,你可以直接返回一个用一对大括号包裹的原始字符串。\n\n---\n\n示例输入1 \nThe Seitai Shinpo Acupuncture Foundation is a non-profit organization whose mission is to foster the development and training of expert teachers and proficient practitioners so that they may perpetuate the wisdom of Seitai Shinpo.\n\n示例输出1 \n{{The Seitai Shinpo Acupuncture Foundation is a non-profit organization whose mission is to foster the development and training of expert teachers and proficient practitioners so that they may perpetuate the wisdom of Seitai Shinpo.}}\n\n示例输入2 \n沙特阿拉伯,Mecca的酒店在线预订。良好的可用性和优惠。便宜和安全,在酒店支付,不收预订费。\n\n示例输出2 \n{{Online hotel booking in Mecca, Saudi Arabia. Good availability and discounts. 
Affordable and safe, pay at the hotel, no booking fees.}}\n\n---\n\n注意:不需要输出任何描述性语句或解释性说明,仅仅输出解析后的字符串即可。<|im_end|>\n<|im_start|>user\n下面你要处理的字符串:กิจกรรมการบริการอื่น ๆ ส่วนบุคคลซึ่งมิได้จัดประเภทไว้ในที่อื่น<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=7760, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [151644, 8948, 198, 56568, 101909, 99878, 100136, 108639, 109824, 104317, 33108, 105395, 102011, 3837, 103929, 88802, 20412, 104317, 20002, 31196, 9370, 66558, 102021, 102064, 90395, 44063, 99652, 105395, 17714, 104105, 3837, 99373, 85106, 66017, 105395, 104813, 59151, 3837, 104689, 53481, 103929, 104337, 57191, 104361, 33071, 66394, 49567, 1773, 100662, 110485, 9370, 53481, 3407, 334, 31196, 31905, 334, 5122, 66558, 3837, 104560, 99885, 102064, 3837, 74763, 104560, 108464, 102064, 9370, 105063, 3837, 74763, 102410, 50647, 2303, 334, 66017, 31905, 334, 5122, 20002, 31196, 9370, 66558, 106474, 105205, 104813, 59151, 90395, 11622, 103219, 104005, 9370, 105205, 104197, 100139, 17992, 108232, 1773, 62244, 20002, 31196, 50647, 3837, 100624, 66017, 34794, 25511, 1773, 104689, 101963, 99885, 24562, 103630, 33447, 103630, 57191, 66394, 33071, 72881, 99700, 3837, 77557, 2073, 114566, 105395, 59151, 33590, 2073, 38214, 498, 525, 11589, 279, 914, 25, 18987, 49567, 3837, 101041, 66017, 11622, 26288, 100139, 17992, 108232, 104813, 59151, 104180, 1773, 102056, 101068, 101128, 20002, 72017, 104597, 3837, 100631, 20002, 72017, 104597, 20412, 42192, 100240, 9370, 66558, 3837, 100397, 16476, 49567, 3837, 105048, 101041, 31526, 46944, 11622, 103219, 26288, 
100139, 17992, 108232, 9370, 105966, 66558, 3407, 44364, 334, 19793, 26355, 31196, 16, 334, 2303, 785, 96520, 2143, 34449, 5368, 6381, 68462, 5007, 374, 264, 2477, 27826, 7321, 6693, 8954, 374, 311, 29987, 279, 4401, 323, 4862, 315, 6203, 13336, 323, 68265, 42095, 773, 429, 807, 1231, 21585, 6292, 279, 23389, 315, 96520, 2143, 34449, 5368, 382, 334, 19793, 26355, 66017, 16, 334, 2303, 2979, 785, 96520, 2143, 34449, 5368, 6381, 68462, 5007, 374, 264, 2477, 27826, 7321, 6693, 8954, 374, 311, 29987, 279, 4401, 323, 4862, 315, 6203, 13336, 323, 68265, 42095, 773, 429, 807, 1231, 21585, 6292, 279, 23389, 315, 96520, 2143, 34449, 5368, 13, 47449, 334, 19793, 26355, 31196, 17, 334, 2303, 111662, 111946, 3837, 7823, 24441, 9370, 101078, 99107, 109545, 1773, 104205, 107769, 105178, 102289, 1773, 104698, 33108, 99464, 96050, 101078, 68262, 3837, 16530, 50009, 109545, 80268, 3407, 334, 19793, 26355, 66017, 17, 334, 2303, 2979, 19598, 9500, 21857, 304, 2157, 24441, 11, 17904, 23061, 13, 7684, 18048, 323, 31062, 13, 42506, 323, 6092, 11, 2291, 518, 279, 9500, 11, 902, 21857, 12436, 13, 47449, 44364, 60533, 5122, 104689, 66017, 99885, 53481, 33071, 72881, 99700, 57191, 104136, 33071, 66394, 3837, 102630, 66017, 106637, 104813, 66558, 104180, 1773, 151645, 198, 151644, 872, 198, 100431, 105182, 54542, 9370, 66558, 5122, 25200, 30785, 60416, 124701, 93874, 125331, 30785, 93874, 22929, 64684, 20184, 128630, 129328, 124659, 36142, 47642, 40327, 124358, 123885, 123883, 18625, 30434, 26283, 30785, 127196, 19841, 60416, 124090, 132814, 125497, 19841, 124202, 35884, 47171, 22929, 64684, 20184, 151645, 198, 151644, 77091, 198], lora_request: None, prompt_adapter_request: None.
INFO: 116.247.118.146:42270 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 10-12 11:34:10 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241012-113410.pkl...
WARNING 10-12 11:34:10 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
WARNING 10-12 11:34:10 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 10-12 11:34:10 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 10-12 11:34:10 model_runner_base.py:143] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
WARNING 10-12 11:34:10 model_runner_base.py:143]
[rank0]:[E1012 11:34:10.718322889 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f7fc4a4cf86 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f7fc49fbd10 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f7fc4b27f08 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f7fc5d443e6 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f7fc5d49600 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f7fc5d502ba in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f7fc5d526fc in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdbbf4 (0x7f8013500bf4 in /raid/demo/anaconda3/envs/vllm_latest/bin/../lib/libstdc++.so.6)
frame #8: + 0x8609 (0x7f8014f19609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7f8014ce4353 in /lib/x86_64-linux-gnu/libc.so.6)

INFO: 61.171.72.231:17915 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:43655 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:32518 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:54509 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:32519 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 257, in call
await wrap(partial(self.listen_for_disconnect, receive))
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
await func()
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
message = await receive()
^^^^^^^^^^^^^^^
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
await self.message_event.wait()
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/asyncio/locks.py", line 212, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f83c7304a40

During handling of the above exception, another exception occurred:

  • Exception Group Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    | result = await app( # type: ignore[func-returns-value]
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in call
    | return await self.app(scope, receive, send)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in call
    | await super().call(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/applications.py", line 113, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in call
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in call
    | await self.app(scope, receive, _send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in call
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in call
    | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 715, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    | await route.handle(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
    | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
    | await response(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 250, in call
    | async with anyio.create_task_group() as task_group:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in aexit
    | raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    | await func()
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
    | async for chunk in self.body_iterator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 309, in chat_completion_stream_generator
    | async for res in result_generator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/utils.py", line 452, in iterate_with_cancellation
    | item = await awaits[0]
    | ^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | [Previous line repeated 11 more times]
    | RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
    | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
    |
    +------------------------------------
    ERROR: Exception in ASGI application
    Traceback (most recent call last):
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 257, in call
    await wrap(partial(self.listen_for_disconnect, receive))
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    await func()
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
    message = await receive()
    ^^^^^^^^^^^^^^^
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/asyncio/locks.py", line 212, in wait
    await fut
    asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f841c12a6c0

During handling of the above exception, another exception occurred:

  • Exception Group Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    | result = await app( # type: ignore[func-returns-value]
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in call
    | return await self.app(scope, receive, send)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in call
    | await super().call(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/applications.py", line 113, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in call
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in call
    | await self.app(scope, receive, _send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in call
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in call
    | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 715, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    | await route.handle(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
    | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
    | await response(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 250, in call
    | async with anyio.create_task_group() as task_group:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in aexit
    | raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    | await func()
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
    | async for chunk in self.body_iterator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 309, in chat_completion_stream_generator
    | async for res in result_generator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/utils.py", line 452, in iterate_with_cancellation
    | item = await awaits[0]
    | ^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | [Previous line repeated 11 more times]
    | RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
    | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
    |
    +------------------------------------
    ERROR: Exception in ASGI application
    Traceback (most recent call last):
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 257, in call
    await wrap(partial(self.listen_for_disconnect, receive))
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    await func()
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
    message = await receive()
    ^^^^^^^^^^^^^^^
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/asyncio/locks.py", line 212, in wait
    await fut
    asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f83c6bb7770

During handling of the above exception, another exception occurred:

  • Exception Group Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    | result = await app( # type: ignore[func-returns-value]
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in call
    | return await self.app(scope, receive, send)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in call
    | await super().call(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/applications.py", line 113, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in call
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in call
    | await self.app(scope, receive, _send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in call
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in call
    | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 715, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    | await route.handle(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
    | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
    | await response(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 250, in call
    | async with anyio.create_task_group() as task_group:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in aexit
    | raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    | await func()
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
    | async for chunk in self.body_iterator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 309, in chat_completion_stream_generator
    | async for res in result_generator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/utils.py", line 452, in iterate_with_cancellation
    | item = await awaits[0]
    | ^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | [Previous line repeated 11 more times]
    | RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
    | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
    |
    +------------------------------------

🐛 Describe the bug

I deployed Qwen2-72B across 4 A800 GPUs. On vLLM 0.6.2, this error occurs intermittently under serving load.
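Following the log's own hints, relaunching with synchronous kernel launches should make the reported stack trace point at the kernel that actually faulted. The serve command and flags below are illustrative, not the exact ones from my deployment:

```shell
# Force synchronous CUDA kernel launches so the illegal-memory-access error
# surfaces at the faulting call instead of a later API call (per the hint
# in the log above). Much slower -- debugging only.
export CUDA_LAUNCH_BLOCKING=1
# Optional: verbose NCCL logging, useful for the ProcessGroupNCCL watchdog crash.
export NCCL_DEBUG=INFO

# Illustrative launch; substitute the actual model path and flags used.
vllm serve Qwen/Qwen2-72B-Instruct --tensor-parallel-size 4
```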

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
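For anyone triaging a similar failure: when the dump in the log above *does* get written (here the pickling itself died on the CUDA error, so it likely did not), it can be inspected offline. The helper name below is mine, and the path is copied from the log; this is a best-effort sketch, not a vLLM API:

```python
import pickle
from pathlib import Path


def load_failed_input(dump_path: str):
    """Best-effort load of a vLLM err_execute_model_input_*.pkl dump.

    Returns None if the file is missing -- which is what happens in this
    issue, since pickling the inputs itself failed on the CUDA error.
    """
    p = Path(dump_path)
    if not p.exists():
        return None
    with p.open("rb") as f:
        # Unpickling generally requires vllm (at a matching version)
        # to be importable in the current environment.
        return pickle.load(f)


# Path taken from the log above; on this run the file was never written.
print(load_failed_input("/tmp/err_execute_model_input_20241012-113410.pkl"))
```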

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), unstale (Received activity after being labelled stale)
Projects: No projects
Milestone: No milestone
Development: No branches or pull requests