Closed
### Your current environment
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
GCC version: (conda-forge gcc 14.2.0-1) 14.2.0
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.17
Python version: 3.10.15 (main, Oct 3 2024, 07:27:34) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.105.1.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 12.0.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB
Nvidia driver version: 545.23.08
cuDNN version: Probably one of the following:
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Stepping: 7
CPU MHz: 2499.976
BogoMIPS: 4999.95
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single rsb_ctxsw fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni
Versions of relevant libraries:
[pip3] facenet-pytorch==2.6.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] onnxruntime-gpu==1.16.3
[pip3] open_clip_torch==2.29.0
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1+cu124
[pip3] torchao==0.8.0.dev20241203+cu124
[pip3] torchaudio==2.4.0
[pip3] torchdiffeq==0.2.5
[pip3] torchmetrics==1.6.0
[pip3] torchsde==0.2.6
[pip3] torchtyping==0.1.5
[pip3] torchvision==0.20.1+cu124
[pip3] transformers==4.46.3
[pip3] triton==3.1.0
[conda] blas 1.0 mkl
[conda] cuda-cudart 11.8.89 0 nvidia
[conda] cuda-cupti 11.8.87 0 nvidia
[conda] cuda-libraries 11.8.0 0 nvidia
[conda] cuda-nvrtc 11.8.89 0 nvidia
[conda] cuda-nvtx 11.8.86 0 nvidia
[conda] cuda-runtime 11.8.0 0 nvidia
[conda] cuda-version 12.6 3 nvidia
[conda] facenet-pytorch 2.6.0 pypi_0 pypi
[conda] libcublas 11.11.3.6 0 nvidia
[conda] libcufft 10.9.0.58 0 nvidia
[conda] libcufile 1.11.1.6 0 nvidia
[conda] libcurand 10.3.7.77 0 nvidia
[conda] libcusolver 11.4.1.48 0 nvidia
[conda] libcusparse 11.7.5.86 0 nvidia
[conda] libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
[conda] libnpp 11.8.0.86 0 nvidia
[conda] libnvjpeg 11.9.0.86 0 nvidia
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.11 py310h5eee18b_0
[conda] mkl_random 1.2.8 py310h1128e8f_0
[conda] numpy 1.26.4 py310h5f9d8c6_0
[conda] numpy-base 1.26.4 py310hb5e798b_0
[conda] nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
[conda] nvidia-ml-py 12.560.30 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.21.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
[conda] open-clip-torch 2.29.0 pypi_0 pypi
[conda] pytorch-cuda 11.8 h7e8668a_6 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pyzmq 26.2.0 pypi_0 pypi
[conda] torch 2.5.1+cu124 pypi_0 pypi
[conda] torchao 0.8.0.dev20241203+cu124 pypi_0 pypi
[conda] torchaudio 2.5.1+cu124 pypi_0 pypi
[conda] torchdiffeq 0.2.5 pypi_0 pypi
[conda] torchmetrics 1.6.0 pypi_0 pypi
[conda] torchsde 0.2.6 pypi_0 pypi
[conda] torchtyping 0.1.5 pypi_0 pypi
[conda] torchvision 0.20.1 pypi_0 pypi
[conda] transformers 4.46.3 pypi_0 pypi
[conda] triton 3.1.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-11 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
### Model Input Dumps
No response
### 🐛 Describe the bug
import requests
import torch
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# English: "Please describe each of these images."
question = "请分别描述这几张图片。"
image_url1 = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
image_url2 = "https://pics1.baidu.com/feed/7acb0a46f21fbe094f59a18fb5ffe03d8644ad50.jpeg@f_auto?token=6ad879f34822f7617ef2834c83a2e017"
model_id = "/home/work/forrest/github/MiniCPM-V/saved_models/base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
llm = LLM(
    model=model_id,
    trust_remote_code=True,
    max_model_len=2048,
    gpu_memory_utilization=0.9,
    max_num_seqs=5,
    dtype="auto",
)

messages = [{"role": "user", "content": f"(<image>./</image>)\n{question}"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

stop_tokens = ["<|im_end|>", "<|endoftext|>"]
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
    max_tokens=1000,
    temperature=0,
    best_of=1,
)


def test_mix_image_input():
    # The two requests carry precomputed image embeddings with different
    # leading dimensions (9 slices vs. 3 slices).
    mm_data1 = {
        "image_embeds": torch.randn(9, 64, 3584, dtype=torch.bfloat16),
        "image_size_list": [
            Image.open(requests.get(image_url1, stream=True).raw).convert("RGB").size
        ],
    }
    mm_data2 = {
        "image_embeds": torch.randn(3, 64, 3584, dtype=torch.bfloat16),
        "image_size_list": [
            Image.open(requests.get(image_url2, stream=True).raw).convert("RGB").size
        ],
    }
    llm_inputs1 = {"prompt": prompt, "multi_modal_data": {"image": mm_data1}}
    llm_inputs2 = {"prompt": prompt, "multi_modal_data": {"image": mm_data2}}

    # Both requests are submitted in a single generate() call.
    outputs = llm.generate(
        [llm_inputs1, llm_inputs2],
        sampling_params=sampling_params,
    )
    print(outputs[0].outputs[0].text)
    print(outputs[1].outputs[0].text)
    return outputs


if __name__ == "__main__":
    test_mix_image_input()
Running this code raises the following error:
INFO 12-30 21:36:20 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20241230-213620.pkl.
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1654, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/model_executor/models/minicpmv.py", line 571, in forward
[rank0]: vlm_embeddings, _ = self.get_embedding(input_ids, image_inputs)
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/model_executor/models/minicpmv.py", line 439, in get_embedding
[rank0]: vision_hidden_states = (image_inputs["data"].type(
[rank0]: AttributeError: 'list' object has no attribute 'type'
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/work/forrest/github/dl_exp/mllm/scripts/minicpmv/example_minicpmv_vllm_demo_multi_pr.py", line 67, in <module>
[rank0]: test_mix_image_input()
[rank0]: File "/home/work/forrest/github/dl_exp/mllm/scripts/minicpmv/example_minicpmv_vllm_demo_multi_pr.py", line 57, in test_mix_image_input
[rank0]: outputs = llm.generate(
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/utils.py", line 1063, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 406, in generate
[rank0]: outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 942, in _run_engine
[rank0]: step_outputs = self.llm_engine.step()
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1454, in step
[rank0]: outputs = self.model_executor.execute_model(
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 125, in execute_model
[rank0]: output = self.driver_worker.execute_model(execute_model_req)
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 343, in execute_model
[rank0]: output = self.model_runner.execute_model(
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/work/installFile/miniconda3/envs/echomimic/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
[rank0]: raise type(err)(
[rank0]: AttributeError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241230-213620.pkl): 'list' object has no attribute 'type'
Processed prompts: 0%| | 0/2 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
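The traceback suggests that `image_inputs["data"]` reaches `get_embedding` as a Python list rather than a tensor. One plausible explanation, which I have not confirmed in the vLLM source, is that per-request embeddings are stacked into one tensor only when their shapes match and are otherwise left as a list. The dependency-free sketch below illustrates that hypothesis. `FakeTensor`, `try_stack`, and `describe` are hypothetical names made up for this illustration, not vLLM or PyTorch APIs.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only its shape matters here."""
    shape: tuple

    def type(self, dtype):
        # A real tensor would cast to dtype; the stand-in just returns itself.
        return self


def try_stack(tensors):
    """Assumed batching rule: stack if all shapes match, else keep the list."""
    shapes = {t.shape for t in tensors}
    if len(shapes) == 1:
        return FakeTensor((len(tensors), *tensors[0].shape))
    return list(tensors)


def describe(batch):
    """Call .type() the way the failing line does and report the outcome."""
    try:
        batch.type("bfloat16")
        return "ok"
    except AttributeError as err:
        return str(err)


# Two requests with identical embedding shapes: stacking succeeds.
same = try_stack([FakeTensor((9, 64, 3584)), FakeTensor((9, 64, 3584))])
# Shapes from the reproduction script (9 vs. 3 slices): stacking is skipped.
mixed = try_stack([FakeTensor((9, 64, 3584)), FakeTensor((3, 64, 3584))])

print(describe(same))   # ok
print(describe(mixed))  # 'list' object has no attribute 'type'
```

If this hypothesis holds, the failure depends on the two requests having different numbers of slices, which matches the `torch.randn(9, 64, 3584)` and `torch.randn(3, 64, 3584)` inputs in the script above.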
### Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.