[Bug]: Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered #9306

Open
@Clint-chan

Description

Your current environment

The output of `python collect_env.py`
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.31

Python version: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-169-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB

Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      43 bits physical, 48 bits virtual
CPU(s):                             256
On-line CPU(s) list:                0-255
Thread(s) per core:                 2
Core(s) per socket:                 64
Socket(s):                          2
NUMA node(s):                       8
Vendor ID:                          AuthenticAMD
CPU family:                         23
Model:                              49
Model name:                         AMD EPYC 7742 64-Core Processor
Stepping:                           0
Frequency boost:                    enabled
CPU MHz:                            3389.731
CPU max MHz:                        2250.0000
CPU min MHz:                        1500.0000
BogoMIPS:                           4491.29
Virtualization:                     AMD-V
L1d cache:                          4 MiB
L1i cache:                          4 MiB
L2 cache:                           64 MiB
L3 cache:                           512 MiB
NUMA node0 CPU(s):                  0-15,128-143
NUMA node1 CPU(s):                  16-31,144-159
NUMA node2 CPU(s):                  32-47,160-175
NUMA node3 CPU(s):                  48-63,176-191
NUMA node4 CPU(s):                  64-79,192-207
NUMA node5 CPU(s):                  80-95,208-223
NUMA node6 CPU(s):                  96-111,224-239
NUMA node7 CPU(s):                  112-127,240-255
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Vulnerable
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca sme sev sev_es

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.8.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.5.0
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.555.43
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.20
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==25.1.2
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.0
[pip3] triton==3.0.0
[conda] _anaconda_depends         2024.02             py311_mkl_1  
[conda] blas                      1.0                         mkl  
[conda] mkl                       2023.1.0         h213fc3f_46344  
[conda] mkl-service               2.4.0           py311h5eee18b_1  
[conda] mkl_fft                   1.3.8           py311h5eee18b_0  
[conda] mkl_random                1.2.4           py311hdb19cb5_0  
[conda] numpy                     1.26.4          py311h08b1b3b_0  
[conda] numpy-base                1.26.4          py311hf175353_0  
[conda] numpydoc                  1.5.0           py311h06a4308_0  
[conda] nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
[conda] nvidia-ml-py              12.555.43                pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.6.20                  pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
[conda] pyzmq                     25.1.2          py311h6a678d5_0  
[conda] torch                     2.4.0                    pypi_0    pypi
[conda] torchvision               0.19.0                   pypi_0    pypi
[conda] transformers              4.44.0                   pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.4@4db5176d9758b720b05460c50ace3c01026eb158

Model Input Dumps

INFO 10-12 11:34:09 logger.py:36] Received request chat-44505254559d4a72ad36a008ebbfbbdf: prompt: '<|im_start|>system\n你是一个专业且精确的语言判断和翻译工具,你的任务是判断用户输入的字符串是什么语言,并将它翻译为英语,仅需要输出翻译后的结果,不需要描述你的思路或补充性说明等。保持简洁的描述。\n\n输入类型:字符串,可能是任何语言,也可能是几种语言的混合,也有可能为空 \n输出类型:用户输入的字符串转化为英文后的结果,并用一对连续的英文的大括号包裹。如果用户输入为空,那么输出空值。不需要加入任何前缀后缀或说明性语句,例如“以下是翻译结果”,“Below you are handling the string: ”等,直接输出用大括号包裹后的结果即可。如果你无法理解用户发送的内容,或者用户发送的内容是无意义的字符串,乱码等,你可以直接返回一个用一对大括号包裹的原始字符串。\n\n---\n\n示例输入1 \nThe Seitai Shinpo Acupuncture Foundation is a non-profit organization whose mission is to foster the development and training of expert teachers and proficient practitioners so that they may perpetuate the wisdom of Seitai Shinpo.\n\n示例输出1 \n{{The Seitai Shinpo Acupuncture Foundation is a non-profit organization whose mission is to foster the development and training of expert teachers and proficient practitioners so that they may perpetuate the wisdom of Seitai Shinpo.}}\n\n示例输入2 \n沙特阿拉伯,Mecca的酒店在线预订。良好的可用性和优惠。便宜和安全,在酒店支付,不收预订费。\n\n示例输出2 \n{{Online hotel booking in Mecca, Saudi Arabia. Good availability and discounts. 
Affordable and safe, pay at the hotel, no booking fees.}}\n\n---\n\n注意:不需要输出任何描述性语句或解释性说明,仅仅输出解析后的字符串即可。<|im_end|>\n<|im_start|>user\n下面你要处理的字符串:กิจกรรมการบริการอื่น ๆ ส่วนบุคคลซึ่งมิได้จัดประเภทไว้ในที่อื่น<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=7760, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [151644, 8948, 198, 56568, 101909, 99878, 100136, 108639, 109824, 104317, 33108, 105395, 102011, 3837, 103929, 88802, 20412, 104317, 20002, 31196, 9370, 66558, 102021, 102064, 90395, 44063, 99652, 105395, 17714, 104105, 3837, 99373, 85106, 66017, 105395, 104813, 59151, 3837, 104689, 53481, 103929, 104337, 57191, 104361, 33071, 66394, 49567, 1773, 100662, 110485, 9370, 53481, 3407, 334, 31196, 31905, 334, 5122, 66558, 3837, 104560, 99885, 102064, 3837, 74763, 104560, 108464, 102064, 9370, 105063, 3837, 74763, 102410, 50647, 2303, 334, 66017, 31905, 334, 5122, 20002, 31196, 9370, 66558, 106474, 105205, 104813, 59151, 90395, 11622, 103219, 104005, 9370, 105205, 104197, 100139, 17992, 108232, 1773, 62244, 20002, 31196, 50647, 3837, 100624, 66017, 34794, 25511, 1773, 104689, 101963, 99885, 24562, 103630, 33447, 103630, 57191, 66394, 33071, 72881, 99700, 3837, 77557, 2073, 114566, 105395, 59151, 33590, 2073, 38214, 498, 525, 11589, 279, 914, 25, 18987, 49567, 3837, 101041, 66017, 11622, 26288, 100139, 17992, 108232, 104813, 59151, 104180, 1773, 102056, 101068, 101128, 20002, 72017, 104597, 3837, 100631, 20002, 72017, 104597, 20412, 42192, 100240, 9370, 66558, 3837, 100397, 16476, 49567, 3837, 105048, 101041, 31526, 46944, 11622, 103219, 26288, 
100139, 17992, 108232, 9370, 105966, 66558, 3407, 44364, 334, 19793, 26355, 31196, 16, 334, 2303, 785, 96520, 2143, 34449, 5368, 6381, 68462, 5007, 374, 264, 2477, 27826, 7321, 6693, 8954, 374, 311, 29987, 279, 4401, 323, 4862, 315, 6203, 13336, 323, 68265, 42095, 773, 429, 807, 1231, 21585, 6292, 279, 23389, 315, 96520, 2143, 34449, 5368, 382, 334, 19793, 26355, 66017, 16, 334, 2303, 2979, 785, 96520, 2143, 34449, 5368, 6381, 68462, 5007, 374, 264, 2477, 27826, 7321, 6693, 8954, 374, 311, 29987, 279, 4401, 323, 4862, 315, 6203, 13336, 323, 68265, 42095, 773, 429, 807, 1231, 21585, 6292, 279, 23389, 315, 96520, 2143, 34449, 5368, 13, 47449, 334, 19793, 26355, 31196, 17, 334, 2303, 111662, 111946, 3837, 7823, 24441, 9370, 101078, 99107, 109545, 1773, 104205, 107769, 105178, 102289, 1773, 104698, 33108, 99464, 96050, 101078, 68262, 3837, 16530, 50009, 109545, 80268, 3407, 334, 19793, 26355, 66017, 17, 334, 2303, 2979, 19598, 9500, 21857, 304, 2157, 24441, 11, 17904, 23061, 13, 7684, 18048, 323, 31062, 13, 42506, 323, 6092, 11, 2291, 518, 279, 9500, 11, 902, 21857, 12436, 13, 47449, 44364, 60533, 5122, 104689, 66017, 99885, 53481, 33071, 72881, 99700, 57191, 104136, 33071, 66394, 3837, 102630, 66017, 106637, 104813, 66558, 104180, 1773, 151645, 198, 151644, 872, 198, 100431, 105182, 54542, 9370, 66558, 5122, 25200, 30785, 60416, 124701, 93874, 125331, 30785, 93874, 22929, 64684, 20184, 128630, 129328, 124659, 36142, 47642, 40327, 124358, 123885, 123883, 18625, 30434, 26283, 30785, 127196, 19841, 60416, 124090, 132814, 125497, 19841, 124202, 35884, 47171, 22929, 64684, 20184, 151645, 198, 151644, 77091, 198], lora_request: None, prompt_adapter_request: None.
INFO: 116.247.118.146:42270 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 10-12 11:34:10 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241012-113410.pkl...
WARNING 10-12 11:34:10 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
WARNING 10-12 11:34:10 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 10-12 11:34:10 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 10-12 11:34:10 model_runner_base.py:143] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
WARNING 10-12 11:34:10 model_runner_base.py:143]
[rank0]:[E1012 11:34:10.718322889 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f7fc4a4cf86 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f7fc49fbd10 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f7fc4b27f08 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f7fc5d443e6 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f7fc5d49600 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f7fc5d502ba in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f7fc5d526fc in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdbbf4 (0x7f8013500bf4 in /raid/demo/anaconda3/envs/vllm_latest/bin/../lib/libstdc++.so.6)
frame #8: + 0x8609 (0x7f8014f19609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7f8014ce4353 in /lib/x86_64-linux-gnu/libc.so.6)

INFO: 61.171.72.231:17915 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:43655 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:32518 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:54509 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:32519 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 257, in call
await wrap(partial(self.listen_for_disconnect, receive))
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
await func()
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
message = await receive()
^^^^^^^^^^^^^^^
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
await self.message_event.wait()
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/asyncio/locks.py", line 212, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f83c7304a40

During handling of the above exception, another exception occurred:

  • Exception Group Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    | result = await app( # type: ignore[func-returns-value]
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in call
    | return await self.app(scope, receive, send)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in call
    | await super().call(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/applications.py", line 113, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in call
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in call
    | await self.app(scope, receive, _send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in call
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in call
    | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 715, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    | await route.handle(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
    | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
    | await response(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 250, in call
    | async with anyio.create_task_group() as task_group:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in aexit
    | raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    | await func()
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
    | async for chunk in self.body_iterator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 309, in chat_completion_stream_generator
    | async for res in result_generator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/utils.py", line 452, in iterate_with_cancellation
    | item = await awaits[0]
    | ^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | [Previous line repeated 11 more times]
    | RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
    | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
    |
    +------------------------------------
    ERROR: Exception in ASGI application
    Traceback (most recent call last):
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 257, in call
    await wrap(partial(self.listen_for_disconnect, receive))
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    await func()
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
    message = await receive()
    ^^^^^^^^^^^^^^^
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/asyncio/locks.py", line 212, in wait
    await fut
    asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f841c12a6c0

During handling of the above exception, another exception occurred:

  • Exception Group Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    | result = await app( # type: ignore[func-returns-value]
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in call
    | return await self.app(scope, receive, send)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in call
    | await super().call(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/applications.py", line 113, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in call
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in call
    | await self.app(scope, receive, _send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in call
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in call
    | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 715, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    | await route.handle(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
    | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
    | await response(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 250, in call
    | async with anyio.create_task_group() as task_group:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in aexit
    | raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    | await func()
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
    | async for chunk in self.body_iterator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 309, in chat_completion_stream_generator
    | async for res in result_generator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/utils.py", line 452, in iterate_with_cancellation
    | item = await awaits[0]
    | ^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | [Previous line repeated 11 more times]
    | RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
    | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
    |
    +------------------------------------
    ERROR: Exception in ASGI application
    Traceback (most recent call last):
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 257, in call
    await wrap(partial(self.listen_for_disconnect, receive))
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    await func()
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
    message = await receive()
    ^^^^^^^^^^^^^^^
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
    File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/asyncio/locks.py", line 212, in wait
    await fut
    asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f83c6bb7770

During handling of the above exception, another exception occurred:

  • Exception Group Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    | result = await app( # type: ignore[func-returns-value]
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in call
    | return await self.app(scope, receive, send)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in call
    | await super().call(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/applications.py", line 113, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in call
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in call
    | await self.app(scope, receive, _send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in call
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in call
    | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 715, in call
    | await self.middleware_stack(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    | await route.handle(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    | await self.app(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
    | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    | raise exc
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    | await app(scope, receive, sender)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
    | await response(scope, receive, send)
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 250, in call
    | async with anyio.create_task_group() as task_group:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in aexit
    | raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
    | await func()
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
    | async for chunk in self.body_iterator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 309, in chat_completion_stream_generator
    | async for res in result_generator:
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/utils.py", line 452, in iterate_with_cancellation
    | item = await awaits[0]
    | ^^^^^^^^^^^^^^^
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
    | raise request_output
    | [Previous line repeated 11 more times]
    | RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
    | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
    | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
    |
    +------------------------------------

🐛 Describe the bug

I deployed Qwen2-72B across 4 A800 GPUs. On vLLM 0.6.2, this error occurs intermittently under serving load.
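Following the log's own hints, relaunching with synchronous kernel launches should make the reported stack trace point at the kernel that actually faulted. The serve command and flags below are illustrative, not the exact ones from my deployment:

```shell
# Force synchronous CUDA kernel launches so the illegal-memory-access error
# surfaces at the faulting call instead of a later API call (per the hint
# in the log above). Much slower -- debugging only.
export CUDA_LAUNCH_BLOCKING=1
# Optional: verbose NCCL logging, useful for the ProcessGroupNCCL watchdog crash.
export NCCL_DEBUG=INFO

# Illustrative launch; substitute the actual model path and flags used.
vllm serve Qwen/Qwen2-72B-Instruct --tensor-parallel-size 4
```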

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
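For anyone triaging a similar failure: when the dump in the log above *does* get written (here the pickling itself died on the CUDA error, so it likely did not), it can be inspected offline. The helper name below is mine, and the path is copied from the log; this is a best-effort sketch, not a vLLM API:

```python
import pickle
from pathlib import Path


def load_failed_input(dump_path: str):
    """Best-effort load of a vLLM err_execute_model_input_*.pkl dump.

    Returns None if the file is missing -- which is what happens in this
    issue, since pickling the inputs itself failed on the CUDA error.
    """
    p = Path(dump_path)
    if not p.exists():
        return None
    with p.open("rb") as f:
        # Unpickling generally requires vllm (at a matching version)
        # to be importable in the current environment.
        return pickle.load(f)


# Path taken from the log above; on this run the file was never written.
print(load_failed_input("/tmp/err_execute_model_input_20241012-113410.pkl"))
```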

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), unstale (Received activity after being labelled stale)
Projects: No projects
Milestone: No milestone
Development: No branches or pull requests