Your current environment
vLLM Version: 0.6.3.post2.dev256+g4be3a451
The output of `python collect_env.py`
Collecting environment information...
INFO 11-06 09:39:21 importing.py:15] Triton not installed or not compatible; certain GPU-related functions will not be available.
PyTorch version: 2.4.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.30.5
Libc version: glibc-2.35
Python version: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.25-nvidia-gpu-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 17
On-line CPU(s) list: 0-16
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
CPU family: 6
Model: 106
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 17
Stepping: 6
BogoMIPS: 4000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid md_clear arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 544 KiB (17 instances)
L1i cache: 544 KiB (17 instances)
L2 cache: 68 MiB (17 instances)
L3 cache: 272 MiB (17 instances)
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] intel_extension_for_pytorch==2.4.0
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0+cpu
[pip3] torchvision==0.19.0+cpu
[pip3] transformers==4.46.2
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.3.post2.dev256+g4be3a451
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect
🐛 Describe the bug
Reproduce using curl:
curl -X POST "http://39.105.21.95:12481/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "tell me a common saying"
      },
      {
        "role": "assistant",
        "content": "Here is a common saying about apple. An apple a day, keeps"
      }
    ],
    "add_generation_prompt": false,
    "chat_template_kwargs": {"continue_final_message": true}
  }'
The request fails with:

Internal Server Error
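For reference, the same failing request can be sent from Python. The snippet below is a minimal sketch using the requests library against the endpoint above; the payload mirrors the curl call exactly, and the status/body comments reflect the behavior observed on this build.

import requests

# Reproduce the failing chat completion request against the same
# vLLM OpenAI-compatible server used in the curl example above.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "user", "content": "tell me a common saying"},
        {
            "role": "assistant",
            "content": "Here is a common saying about apple. An apple a day, keeps",
        },
    ],
    "add_generation_prompt": False,
    "chat_template_kwargs": {"continue_final_message": True},
}

resp = requests.post(
    "http://39.105.21.95:12481/v1/chat/completions",
    json=payload,
    timeout=30,
)
print(resp.status_code)  # 500 on this build
print(resp.text)         # Internal Server Error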
Error message on the server:
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 338, in create_chat_completion
generator = await handler.create_chat_completion(request, raw_request)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 140, in create_chat_completion
) = await self._preprocess_chat(
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_engine.py", line 460, in _preprocess_chat
request_prompt = apply_hf_chat_template(
TypeError: vllm.entrypoints.chat_utils.apply_hf_chat_template() got multiple values for keyword argument 'continue_final_message'
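The traceback points to a keyword collision: _preprocess_chat appears to pass continue_final_message to apply_hf_chat_template as an explicit keyword argument and also forwards the request's chat_template_kwargs, so the same name arrives twice. The following is a minimal sketch of that failure mode; the function signature and call are illustrative stand-ins, not the actual vLLM internals.

# Illustrative stand-in for vllm.entrypoints.chat_utils.apply_hf_chat_template;
# the real signature may differ.
def apply_hf_chat_template(conversation, continue_final_message=False, **kwargs):
    return conversation

# chat_template_kwargs as supplied in the request body above.
chat_template_kwargs = {"continue_final_message": True}

# Passing the flag explicitly AND via **chat_template_kwargs reproduces:
# TypeError: apply_hf_chat_template() got multiple values for keyword
# argument 'continue_final_message'
apply_hf_chat_template(
    [],
    continue_final_message=False,
    **chat_template_kwargs,
)

If the intent is only to continue the final assistant message, sending "continue_final_message": true as a top-level field of the request body (instead of inside chat_template_kwargs) may sidestep the collision, since the explicit keyword in the traceback suggests the server already reads that field directly; this is an assumption, not a confirmed fix.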