[Bug]: when curl /chat/completions, TypeError: Unable to evaluate type annotation 'Required[Union[str, Iterable[ChatCompletionContentPartTextParam]]]'.

### Your current environment

<details>
<summary>The output of `python collect_env.py`</summary>

```text
Collecting environment information...
WARNING 09-13 07:28:00 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead, and make sure to uninstall `pynvml`. When both of them are installed, `pynvml` will take precedence and cause errors. See https://pypi.org/project/pynvml for more information.
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 14 2022, 12:59:47)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A800 80GB PCIe
GPU 1: NVIDIA A800 80GB PCIe
GPU 2: NVIDIA A800 80GB PCIe
GPU 3: NVIDIA A800 80GB PCIe
GPU 4: NVIDIA A800 80GB PCIe
GPU 5: NVIDIA A800 80GB PCIe

Nvidia driver version: 535.129.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 57 bits virtual
CPU(s):                          104
On-line CPU(s) list:             0-103
Thread(s) per core:              2
Core(s) per socket:              26
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           106
Model name:                      Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz
Stepping:                        6
Frequency boost:                 enabled
CPU MHz:                         800.000
CPU max MHz:                     3400.0000
CPU min MHz:                     800.0000
BogoMIPS:                        4400.00
Virtualization:                  VT-x
L1d cache:                       2.4 MiB
L1i cache:                       1.6 MiB
L2 cache:                        65 MiB
L3 cache:                        78 MiB
NUMA node0 CPU(s):               0-25,52-77
NUMA node1 CPU(s):               26-51,78-103
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; Load fences, usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 invpcid_single intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq md_clear pconfig spec_ctrl intel_stibp flush_l1d arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-dali-cuda110==1.20.0
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] nvidia-pyindex==1.0.9
[pip3] onnx==1.12.0
[pip3] pynvml==11.4.1
[pip3] pytorch-quantization==2.1.2
[pip3] pyzmq==24.0.1
[pip3] torch==2.4.0
[pip3] torch-tensorrt==1.3.0a0
[pip3] torchtext==0.13.0a0+fae8e8c
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.1@3fd2b0d21cd9ec78de410fdf8aa1de840e9ad77a
vLLM Build Flags:
CUDA Archs: 5.2 6.0 6.1 7.0 7.5 8.0 8.6 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PIX     PIX     SYS     SYS     SYS     0-25,52-77      0               N/A
GPU1    PIX      X      PIX     SYS     SYS     SYS     0-25,52-77      0               N/A
GPU2    PIX     PIX      X      SYS     SYS     SYS     0-25,52-77      0               N/A
GPU3    SYS     SYS     SYS      X      PIX     PIX     26-51,78-103    1               N/A
GPU4    SYS     SYS     SYS     PIX      X      PIX     26-51,78-103    1               N/A
GPU5    SYS     SYS     SYS     PIX     PIX      X      26-51,78-103    1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
```

</details>


### Model Input Dumps

_No response_

### 🐛 Describe the bug

I installed vllm according to the documentation with python 3.8 and cuda 11.8
```
# Install vLLM with CUDA 11.8.
export VLLM_VERSION=0.6.1
export PYTHON_VERSION=38
pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
Then I accessed the completions interface and everything was normal, but chat/completions reported an error

```
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/models/Qwen1.5-32B-Chat-GPTQ-Int4",
        "prompt": "你是谁",
        "max_tokens": 512,
        "temperature": 0
    }'
 ```
 
<details>
<summary>ERROR</summary>

``` text
INFO:     172.17.0.1:47950 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 270, in _init_core_attrs
    self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
  File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 112, in _getattr_no_parents
    raise AttributeError(attribute)
AttributeError: __pydantic_core_schema__

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 303, in _eval_type_backport
    return _eval_type(value, globalns, localns, type_params)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 332, in _eval_type
    return typing._eval_type(  # type: ignore
  File "/usr/lib/python3.8/typing.py", line 270, in _eval_type
    return t._evaluate(globalns, localns)
  File "/usr/lib/python3.8/typing.py", line 518, in _evaluate
    eval(self.__forward_code__, globalns, localns),
  File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.PydanticUndefinedType' object is not subscriptable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.8/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.8/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 73, in app
    response = await f(request)
  File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 291, in app
    solved_result = await solve_dependencies(
  File "/usr/local/lib/python3.8/dist-packages/fastapi/dependencies/utils.py", line 639, in solve_dependencies
    ) = await request_body_to_args(  # body_params checked above
  File "/usr/local/lib/python3.8/dist-packages/fastapi/dependencies/utils.py", line 813, in request_body_to_args
    fields_to_extract = get_cached_model_fields(first_field.type_)
  File "/usr/local/lib/python3.8/dist-packages/fastapi/_compat.py", line 657, in get_cached_model_fields
    return get_model_fields(model)
  File "/usr/local/lib/python3.8/dist-packages/fastapi/_compat.py", line 284, in get_model_fields
    return [
  File "/usr/local/lib/python3.8/dist-packages/fastapi/_compat.py", line 285, in <listcomp>
    ModelField(field_info=field_info, name=name)
  File "<string>", line 6, in __init__
  File "/usr/local/lib/python3.8/dist-packages/fastapi/_compat.py", line 110, in __post_init__
    self._type_adapter: TypeAdapter[Any] = TypeAdapter(
  File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 257, in __init__
    self._init_core_attrs(rebuild_mocks=False)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 135, in wrapped
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 277, in _init_core_attrs
    self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/type_adapter.py", line 95, in _get_schema
    schema = gen.generate_schema(type_)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 655, in generate_schema
    schema = self._generate_schema_inner(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 908, in _generate_schema_inner
    return self._annotated_schema(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2028, in _annotated_schema
    schema = self._apply_annotations(source_type, annotations)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2107, in _apply_annotations
    schema = get_inner_schema(source_type)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
    schema = self._handler(source_type)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2189, in new_handler
    schema = metadata_get_schema(source, get_inner_schema)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2185, in <lambda>
    lambda source, handler: handler(source)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
    schema = self._handler(source_type)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 2088, in inner_handler
    schema = self._generate_schema_inner(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 929, in _generate_schema_inner
    return self.match_type(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1029, in match_type
    return self._match_generic_type(obj, origin)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1062, in _match_generic_type
    return self._list_schema(self._get_first_arg_or_any(obj))
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 431, in _list_schema
    return core_schema.list_schema(self.generate_schema(items_type))
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 655, in generate_schema
    schema = self._generate_schema_inner(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 929, in _generate_schema_inner
    return self.match_type(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1029, in match_type
    return self._match_generic_type(obj, origin)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1058, in _match_generic_type
    return self._union_schema(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1378, in _union_schema
    choices.append(self.generate_schema(arg))
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 655, in generate_schema
    schema = self._generate_schema_inner(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 929, in _generate_schema_inner
    return self.match_type(obj)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 999, in match_type
    return self._typed_dict_schema(obj, None)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_generate_schema.py", line 1487, in _typed_dict_schema
    for field_name, annotation in get_cls_type_hints_lenient(typed_dict_cls, self._types_namespace).items():
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 245, in get_cls_type_hints_lenient
    hints[name] = eval_type_lenient(value, globalns, localns)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 257, in eval_type_lenient
    return eval_type_backport(value, globalns, localns)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 279, in eval_type_backport
    return _eval_type_backport(value, globalns, localns, type_params)
  File "/usr/local/lib/python3.8/dist-packages/pydantic/_internal/_typing_extra.py", line 311, in _eval_type_backport
    raise TypeError(
TypeError: Unable to evaluate type annotation 'Required[Union[str, Iterable[ChatCompletionContentPartTextParam]]]'. If you are making use of the new typing syntax (unions using `|` since Python 3.10 or builtins subscripting since Python 3.9), you should either replace the use of new syntax with the existing `typing` constructs or install the `eval_type_backport` package.
```
</details>


### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: when curl /chat/completions, TypeError: Unable to evaluate type annotation 'Required[Union[str, Iterable[ChatCompletionContentPartTextParam]]]'. #8450

Your current environment

Model Input Dumps

🐛 Describe the bug

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development