## Description

### Your current environment

The output of `python collect_env.py`:
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.31
Python version: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-169-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB
Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7742 64-Core Processor
Stepping: 0
Frequency boost: enabled
CPU MHz: 3389.731
CPU max MHz: 2250.0000
CPU min MHz: 1500.0000
BogoMIPS: 4491.29
Virtualization: AMD-V
L1d cache: 4 MiB
L1i cache: 4 MiB
L2 cache: 64 MiB
L3 cache: 512 MiB
NUMA node0 CPU(s): 0-15,128-143
NUMA node1 CPU(s): 16-31,144-159
NUMA node2 CPU(s): 32-47,160-175
NUMA node3 CPU(s): 48-63,176-191
NUMA node4 CPU(s): 64-79,192-207
NUMA node5 CPU(s): 80-95,208-223
NUMA node6 CPU(s): 96-111,224-239
NUMA node7 CPU(s): 112-127,240-255
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Vulnerable
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca sme sev sev_es
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.8.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.5.0
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.555.43
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.20
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==25.1.2
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.0
[pip3] triton==3.0.0
[conda] _anaconda_depends 2024.02 py311_mkl_1
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py311h5eee18b_1
[conda] mkl_fft 1.3.8 py311h5eee18b_0
[conda] mkl_random 1.2.4 py311hdb19cb5_0
[conda] numpy 1.26.4 py311h08b1b3b_0
[conda] numpy-base 1.26.4 py311hf175353_0
[conda] numpydoc 1.5.0 py311h06a4308_0
[conda] nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
[conda] nvidia-ml-py 12.555.43 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.20 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
[conda] pyzmq 25.1.2 py311h6a678d5_0
[conda] torch 2.4.0 pypi_0 pypi
[conda] torchvision 0.19.0 pypi_0 pypi
[conda] transformers 4.44.0 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.4@4db5176d9758b720b05460c50ace3c01026eb158
### Model Input Dumps
INFO 10-12 11:34:09 logger.py:36] Received request chat-44505254559d4a72ad36a008ebbfbbdf: prompt: '<|im_start|>system\n你是一个专业且精确的语言判断和翻译工具,你的任务是判断用户输入的字符串是什么语言,并将它翻译为英语,仅需要输出翻译后的结果,不需要描述你的思路或补充性说明等。保持简洁的描述。\n\n输入类型:字符串,可能是任何语言,也可能是几种语言的混合,也有可能为空 \n输出类型:用户输入的字符串转化为英文后的结果,并用一对连续的英文的大括号包裹。如果用户输入为空,那么输出空值。不需要加入任何前缀后缀或说明性语句,例如“以下是翻译结果”,“Below you are handling the string: ”等,直接输出用大括号包裹后的结果即可。如果你无法理解用户发送的内容,或者用户发送的内容是无意义的字符串,乱码等,你可以直接返回一个用一对大括号包裹的原始字符串。\n\n---\n\n示例输入1 \nThe Seitai Shinpo Acupuncture Foundation is a non-profit organization whose mission is to foster the development and training of expert teachers and proficient practitioners so that they may perpetuate the wisdom of Seitai Shinpo.\n\n示例输出1 \n{{The Seitai Shinpo Acupuncture Foundation is a non-profit organization whose mission is to foster the development and training of expert teachers and proficient practitioners so that they may perpetuate the wisdom of Seitai Shinpo.}}\n\n示例输入2 \n沙特阿拉伯,Mecca的酒店在线预订。良好的可用性和优惠。便宜和安全,在酒店支付,不收预订费。\n\n示例输出2 \n{{Online hotel booking in Mecca, Saudi Arabia. Good availability and discounts. 
Affordable and safe, pay at the hotel, no booking fees.}}\n\n---\n\n注意:不需要输出任何描述性语句或解释性说明,仅仅输出解析后的字符串即可。<|im_end|>\n<|im_start|>user\n下面你要处理的字符串:กิจกรรมการบริการอื่น ๆ ส่วนบุคคลซึ่งมิได้จัดประเภทไว้ในที่อื่น<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=7760, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [151644, 8948, 198, 56568, 101909, 99878, 100136, 108639, 109824, 104317, 33108, 105395, 102011, 3837, 103929, 88802, 20412, 104317, 20002, 31196, 9370, 66558, 102021, 102064, 90395, 44063, 99652, 105395, 17714, 104105, 3837, 99373, 85106, 66017, 105395, 104813, 59151, 3837, 104689, 53481, 103929, 104337, 57191, 104361, 33071, 66394, 49567, 1773, 100662, 110485, 9370, 53481, 3407, 334, 31196, 31905, 334, 5122, 66558, 3837, 104560, 99885, 102064, 3837, 74763, 104560, 108464, 102064, 9370, 105063, 3837, 74763, 102410, 50647, 2303, 334, 66017, 31905, 334, 5122, 20002, 31196, 9370, 66558, 106474, 105205, 104813, 59151, 90395, 11622, 103219, 104005, 9370, 105205, 104197, 100139, 17992, 108232, 1773, 62244, 20002, 31196, 50647, 3837, 100624, 66017, 34794, 25511, 1773, 104689, 101963, 99885, 24562, 103630, 33447, 103630, 57191, 66394, 33071, 72881, 99700, 3837, 77557, 2073, 114566, 105395, 59151, 33590, 2073, 38214, 498, 525, 11589, 279, 914, 25, 18987, 49567, 3837, 101041, 66017, 11622, 26288, 100139, 17992, 108232, 104813, 59151, 104180, 1773, 102056, 101068, 101128, 20002, 72017, 104597, 3837, 100631, 20002, 72017, 104597, 20412, 42192, 100240, 9370, 66558, 3837, 100397, 16476, 49567, 3837, 105048, 101041, 31526, 46944, 11622, 103219, 26288, 
100139, 17992, 108232, 9370, 105966, 66558, 3407, 44364, 334, 19793, 26355, 31196, 16, 334, 2303, 785, 96520, 2143, 34449, 5368, 6381, 68462, 5007, 374, 264, 2477, 27826, 7321, 6693, 8954, 374, 311, 29987, 279, 4401, 323, 4862, 315, 6203, 13336, 323, 68265, 42095, 773, 429, 807, 1231, 21585, 6292, 279, 23389, 315, 96520, 2143, 34449, 5368, 382, 334, 19793, 26355, 66017, 16, 334, 2303, 2979, 785, 96520, 2143, 34449, 5368, 6381, 68462, 5007, 374, 264, 2477, 27826, 7321, 6693, 8954, 374, 311, 29987, 279, 4401, 323, 4862, 315, 6203, 13336, 323, 68265, 42095, 773, 429, 807, 1231, 21585, 6292, 279, 23389, 315, 96520, 2143, 34449, 5368, 13, 47449, 334, 19793, 26355, 31196, 17, 334, 2303, 111662, 111946, 3837, 7823, 24441, 9370, 101078, 99107, 109545, 1773, 104205, 107769, 105178, 102289, 1773, 104698, 33108, 99464, 96050, 101078, 68262, 3837, 16530, 50009, 109545, 80268, 3407, 334, 19793, 26355, 66017, 17, 334, 2303, 2979, 19598, 9500, 21857, 304, 2157, 24441, 11, 17904, 23061, 13, 7684, 18048, 323, 31062, 13, 42506, 323, 6092, 11, 2291, 518, 279, 9500, 11, 902, 21857, 12436, 13, 47449, 44364, 60533, 5122, 104689, 66017, 99885, 53481, 33071, 72881, 99700, 57191, 104136, 33071, 66394, 3837, 102630, 66017, 106637, 104813, 66558, 104180, 1773, 151645, 198, 151644, 872, 198, 100431, 105182, 54542, 9370, 66558, 5122, 25200, 30785, 60416, 124701, 93874, 125331, 30785, 93874, 22929, 64684, 20184, 128630, 129328, 124659, 36142, 47642, 40327, 124358, 123885, 123883, 18625, 30434, 26283, 30785, 127196, 19841, 60416, 124090, 132814, 125497, 19841, 124202, 35884, 47171, 22929, 64684, 20184, 151645, 198, 151644, 77091, 198], lora_request: None, prompt_adapter_request: None.
INFO: 116.247.118.146:42270 - "POST /v1/chat/completions HTTP/1.1" 200 OK
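For reference, the generation settings of the failing request can be reconstructed from the `SamplingParams` line above into an equivalent `/v1/chat/completions` payload. This is only a sketch: the model name is an assumption (the log does not record it), the message contents are placeholders for the logged prompt, and only the non-default fields are shown.

```python
import json

# Reconstruction of the logged request as an OpenAI-compatible chat payload.
# Numeric values are copied verbatim from the SamplingParams in the log above.
payload = {
    "model": "Qwen2-72B-Instruct",  # assumption: actual served model name is not in the log
    "messages": [
        {"role": "system", "content": "<translation system prompt from the log>"},
        {"role": "user", "content": "<Thai input string from the log>"},
    ],
    "n": 1,
    "temperature": 0.7,
    "top_p": 1.0,
    "max_tokens": 7760,
    "stream": True,  # the traceback below goes through chat_completion_stream_generator
}
print(json.dumps(payload, ensure_ascii=False, indent=2))
```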
INFO 10-12 11:34:10 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241012-113410.pkl...
WARNING 10-12 11:34:10 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
WARNING 10-12 11:34:10 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 10-12 11:34:10 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 10-12 11:34:10 model_runner_base.py:143] Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
WARNING 10-12 11:34:10 model_runner_base.py:143]
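When vLLM does manage to write the `err_execute_model_input` dump (here the pickling itself failed because of the CUDA error), the file can be inspected offline. A minimal sketch, assuming vLLM is importable in the inspecting environment, since the dump pickles vLLM objects:

```python
import glob
import pickle

# Look for input dumps written by model_runner_base on failed executions
# (path pattern taken from the log line above).
dump_paths = sorted(glob.glob("/tmp/err_execute_model_input_*.pkl"))
for path in dump_paths:
    with open(path, "rb") as f:
        dumped = pickle.load(f)  # may raise if vLLM classes are not importable
    print(path, type(dumped).__name__)

if not dump_paths:
    print("no dumps found")  # expected here, since pickling the inputs also failed
```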
[rank0]:[E1012 11:34:10.718322889 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA
to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f7fc4a4cf86 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f7fc49fbd10 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f7fc4b27f08 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f7fc5d443e6 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f7fc5d49600 in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f7fc5d502ba in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f7fc5d526fc in /raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdbbf4 (0x7f8013500bf4 in /raid/demo/anaconda3/envs/vllm_latest/bin/../lib/libstdc++.so.6)
frame #8: + 0x8609 (0x7f8014f19609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x43 (0x7f8014ce4353 in /lib/x86_64-linux-gnu/libc.so.6)
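As the warnings suggest, relaunching with synchronous kernel launches usually yields a stack trace that points at the actual faulting kernel rather than a later API call. A sketch of the debugging environment; the serve command is shown as a comment because the exact flags depend on how the server was originally started:

```shell
# Serialize kernel launches so the reported stack trace matches the faulting kernel.
export CUDA_LAUNCH_BLOCKING=1
# Verbose NCCL logging, useful for the ProcessGroupNCCL watchdog crash above.
export NCCL_DEBUG=INFO

echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING NCCL_DEBUG=$NCCL_DEBUG"

# Then restart the server with the original flags, e.g. (hypothetical invocation):
#   vllm serve Qwen2-72B-Instruct --tensor-parallel-size 4
```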
INFO: 61.171.72.231:17915 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:43655 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:32518 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:54509 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 61.171.72.231:32519 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 257, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
await func()
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 230, in listen_for_disconnect
message = await receive()
^^^^^^^^^^^^^^^
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
await self.message_event.wait()
File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/asyncio/locks.py", line 212, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f83c7304a40
During handling of the above exception, another exception occurred:
- Exception Group Traceback (most recent call last):
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
| return await self.app(scope, receive, send)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/applications.py", line 113, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in __call__
| raise exc
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in __call__
| await self.app(scope, receive, _send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in __call__
| await self.app(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
| raise exc
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
| await app(scope, receive, sender)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 715, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
| await route.handle(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
| await self.app(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
| raise exc
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
| await app(scope, receive, sender)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 250, in __call__
| async with anyio.create_task_group() as task_group:
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 736, in __aexit__
| raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 253, in wrap
| await func()
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/starlette/responses.py", line 242, in stream_response
| async for chunk in self.body_iterator:
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 309, in chat_completion_stream_generator
| async for res in result_generator:
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/utils.py", line 452, in iterate_with_cancellation
| item = await awaits[0]
| ^^^^^^^^^^^^^^^
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
| raise request_output
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
| raise request_output
| File "/raid/demo/anaconda3/envs/vllm_latest/lib/python3.12/site-packages/vllm/engine/multiprocessing/client.py", line 486, in _process_request
| raise request_output
| [Previous line repeated 11 more times]
| RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
| CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
| For debugging consider passing CUDA_LAUNCH_BLOCKING=1
| Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
|
+------------------------------------
(The same ASGI exception traceback, differing only in the cancel-scope id, repeats for each subsequent failed request.)
### 🐛 Describe the bug
I deployed Qwen2-72B on 4 A800 GPUs. This `CUDA error: an illegal memory access was encountered` failure occurs intermittently, and it still appears in version 0.6.2.
### Before submitting a new issue...

- Make sure you have already searched for relevant issues and asked the chatbot at the bottom-right corner of the documentation page, which can answer many frequently asked questions.