Closed
Description
Your current environment
The output of `python collect_env.py`
Collecting environment information...
WARNING 09-13 07:28:00 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead, and make sure to uninstall `pynvml`. When both of them are installed, `pynvml` will take precedence and cause errors. See https://pypi.org/project/pynvml for more information.
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A800 80GB PCIe
GPU 1: NVIDIA A800 80GB PCIe
GPU 2: NVIDIA A800 80GB PCIe
GPU 3: NVIDIA A800 80GB PCIe
GPU 4: NVIDIA A800 80GB PCIe
GPU 5: NVIDIA A800 80GB PCIe
Nvidia driver version: 535.129.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 104
On-line CPU(s) list: 0-103
Thread(s) per core: 2
Core(s) per socket: 26
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz
Stepping: 6
Frequency boost: enabled
CPU MHz: 800.000
CPU max MHz: 3400.0000
CPU min MHz: 800.0000
BogoMIPS: 4400.00
Virtualization: VT-x
L1d cache: 2.4 MiB
L1i cache: 1.6 MiB
L2 cache: 65 MiB
L3 cache: 78 MiB
NUMA node0 CPU(s): 0-25,52-77
NUMA node1 CPU(s): 26-51,78-103
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; Load fences, usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 invpcid_single intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq md_clear pconfig spec_ctrl intel_stibp flush_l1d arch_capabilities
Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-dali-cuda110==1.20.0
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] nvidia-pyindex==1.0.9
[pip3] onnx==1.12.0
[pip3] pynvml==11.4.1
[pip3] pytorch-quantization==2.1.2
[pip3] pyzmq==24.0.1
[pip3] torch==2.4.0
[pip3] torch-tensorrt==1.3.0a0
[pip3] torchtext==0.13.0a0+fae8e8c
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.1@3fd2b0d21cd9ec78de410fdf8aa1de840e9ad77a
vLLM Build Flags:
CUDA Archs: 5.2 6.0 6.1 7.0 7.5 8.0 8.6 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PIX PIX SYS SYS SYS 0-25,52-77 0 N/A
GPU1 PIX X PIX SYS SYS SYS 0-25,52-77 0 N/A
GPU2 PIX PIX X SYS SYS SYS 0-25,52-77 0 N/A
GPU3 SYS SYS SYS X PIX PIX 26-51,78-103 1 N/A
GPU4 SYS SYS SYS PIX X PIX 26-51,78-103 1 N/A
GPU5 SYS SYS SYS PIX PIX X 26-51,78-103 1 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
Model Input Dumps
No response
🐛 Describe the bug
I installed vllm according to the documentation with python 3.8 and cuda 11.8
# Install vLLM with CUDA 11.8.
export VLLM_VERSION=0.6.1
export PYTHON_VERSION=38
pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
Then I accessed the completions interface and everything was normal, but chat/completions reported an error
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/models/Qwen1.5-32B-Chat-GPTQ-Int4",
"prompt": "你是谁",
"max_tokens": 512,
"temperature": 0
}'
ERROR
INFO: 172.17.0.1:47950 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 270, in _init_core_attrs
self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 112, in _getattr_no_parents
raise AttributeError(attribute)
AttributeError: __pydantic_core_schema__
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 303, in _eval_type_backport
return _eval_type(value, globalns, localns, type_params)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 332, in _eval_type
return typing._eval_type( # type: ignore
File "/usr/lib/python3.8/typing.py", line 270, in _eval_type
return t._evaluate(globalns, localns)
File "/usr/lib/python3.8/typing.py", line 518, in _evaluate
eval(self.__forward_code__, globalns, localns),
File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.PydanticUndefinedType' object is not subscriptable
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.8/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/usr/local/lib/python3.8/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/usr/local/lib/python3.8/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 73, in app
response = await f(request)
File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 291, in app
solved_result = await solve_dependencies(
File "/usr/local/lib/python3.8/dist-packages/fastapi/dependencies/utils.py", line 639, in solve_dependencies
) = await request_body_to_args( # body_params checked above
File "/usr/local/lib/python3.8/dist-packages/fastapi/dependencies/utils.py", line 813, in request_body_to_args
fields_to_extract = get_cached_model_fields(first_field.type_)
File "/usr/local/lib/python3.8/dist-packages/fastapi/_compat.py", line 657, in get_cached_model_fields
return get_model_fields(model)
File "/usr/local/lib/python3.8/dist-packages/fastapi/_compat.py", line 284, in get_model_fields
return [
File "/usr/local/lib/python3.8/dist-packages/fastapi/_compat.py", line 285, in <listcomp>
ModelField(field_info=field_info, name=name)
File "<string>", line 6, in __init__
File "/usr/local/lib/python3.8/dist-packages/fastapi/_compat.py", line 110, in __post_init__
self._type_adapter: TypeAdapter[Any] = TypeAdapter(
File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 257, in __init__
self._init_core_attrs(rebuild_mocks=False)
File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 135, in wrapped
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 277, in _init_core_attrs
self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 95, in _get_schema
schema = gen.generate_schema(type_)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 655, in generate_schema
schema = self._generate_schema_inner(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 908, in _generate_schema_inner
return self._annotated_schema(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2028, in _annotated_schema
schema = self._apply_annotations(source_type, annotations)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2107, in _apply_annotations
schema = get_inner_schema(source_type)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2189, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2185, in <lambda>
lambda source, handler: handler(source)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2088, in inner_handler
schema = self._generate_schema_inner(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 929, in _generate_schema_inner
return self.match_type(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1029, in match_type
return self._match_generic_type(obj, origin)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1062, in _match_generic_type
return self._list_schema(self._get_first_arg_or_any(obj))
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 431, in _list_schema
return core_schema.list_schema(self.generate_schema(items_type))
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 655, in generate_schema
schema = self._generate_schema_inner(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 929, in _generate_schema_inner
return self.match_type(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1029, in match_type
return self._match_generic_type(obj, origin)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1058, in _match_generic_type
return self._union_schema(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1378, in _union_schema
choices.append(self.generate_schema(arg))
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 655, in generate_schema
schema = self._generate_schema_inner(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 929, in _generate_schema_inner
return self.match_type(obj)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 999, in match_type
return self._typed_dict_schema(obj, None)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1487, in _typed_dict_schema
for field_name, annotation in get_cls_type_hints_lenient(typed_dict_cls, self._types_namespace).items():
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 245, in get_cls_type_hints_lenient
hints[name] = eval_type_lenient(value, globalns, localns)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 257, in eval_type_lenient
return eval_type_backport(value, globalns, localns)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 279, in eval_type_backport
return _eval_type_backport(value, globalns, localns, type_params)
File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 311, in _eval_type_backport
raise TypeError(
TypeError: Unable to evaluate type annotation 'Required[Union[str, Iterable[ChatCompletionContentPartTextParam]]]'. If you are making use of the new typing syntax (unions using `|` since Python 3.10 or builtins subscripting since Python 3.9), you should either replace the use of new syntax with the existing `typing` constructs or install the `eval_type_backport` package.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Activity