
[Usage]: Qwen2-VL keyword argument max_pixels is not a valid argument for this processor and will be ignored. #13143

Closed
@gaoshangle

Description

### Your current environment

branch: qwen25vl

### How would you like to use vllm

VLLM_ARGS="--limit-mm-per-prompt image=2 \
--tensor-parallel-size 1 \
--max-model-len 16384 \
--served-model-name Qwen2.5-VL-7B-Instruct/ \
--mm_processor_kwargs {\"max_pixels\":1000000} \
--gpu_memory_utilization 0.9 \
--model Qwen/Qwen2.5-VL-7B-Instruct/"
CUDA_VISIBLE_DEVICES=0 python3 -m vllm.entrypoints.openai.api_server ${VLLM_ARGS} --port 8000
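
In case the escaped double quotes are altered when the unquoted `${VLLM_ARGS}` is expanded, an equivalent launch with the JSON single-quoted would look roughly like this (a sketch only; the dashed flag spellings are assumed to be interchangeable with the underscore forms used above):

```bash
# Sketch only: same launch, with the mm_processor_kwargs JSON wrapped in single
# quotes so the shell passes it through literally.
# Assumption: dashed flag names behave the same as the underscore forms above.
CUDA_VISIBLE_DEVICES=0 python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --served-model-name Qwen2.5-VL-7B-Instruct \
    --limit-mm-per-prompt image=2 \
    --tensor-parallel-size 1 \
    --max-model-len 16384 \
    --gpu-memory-utilization 0.9 \
    --mm-processor-kwargs '{"max_pixels": 1000000}' \
    --port 8000
```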

When the server is launched with the original command above, the configured max_pixels is ignored, as shown in the following log:


INFO 02-12 03:01:35 cuda.py:230] Using Flash Attention backend.
[W212 03:01:36.265895118 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
INFO 02-12 03:01:36 model_runner.py:1110] Starting to load model Qwen/Qwen2.5-VL-7B-autoglm-android-wechat-test-250211/...
INFO 02-12 03:01:36 config.py:2930] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248]
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:01<00:03,  1.15s/it]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:02<00:01,  1.00it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:02<00:00,  1.08it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.42it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.23it/s]
INFO 02-12 03:01:39 model_runner.py:1115] Loading model weights took 16.7361 GB
WARNING 02-12 03:01:41 model_runner.py:1288] Computed max_num_seqs (min(256, 10384 // 11025)) to be less than 1. Setting it to the minimum value of 1.


Keyword argument `max_pixels` is not a valid argument for this processor and will be ignored.


It looks like you are trying to rescale already rescaled images. If the input images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again.
WARNING 02-12 03:01:43 profiling.py:184] The context length (10384) of the model is too short to hold the multi-modal embeddings in the worst case (11025 tokens in total, out of which {'image': 2450, 'video': 8575} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`.
INFO 02-12 03:01:45 worker.py:266] Memory profiling takes 5.11 seconds
INFO 02-12 03:01:45 worker.py:266] the current vLLM instance can use total_gpu_memory (23.65GiB) x gpu_memory_utilization (0.90) = 21.28GiB
INFO 02-12 03:01:45 worker.py:266] model weights take 16.74GiB; non_torch_memory takes 0.08GiB; PyTorch activation peak memory takes 1.38GiB; the rest of the memory reserved for KV Cache is 3.09GiB.
INFO 02-12 03:01:45 executor_base.py:108] # CUDA blocks: 3613, # CPU blocks: 4681
INFO 02-12 03:01:45 executor_base.py:113] Maximum concurrency for 10384 tokens per request: 5.57x
INFO 02-12 03:01:47 model_runner.py:1430] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:14<00:00,  2.44it/s]
INFO 02-12 03:02:02 model_runner.py:1558] Graph capturing finished in 14 secs, took 1.89 GiB
INFO 02-12 03:02:02 llm_engine.py:429] init engine (profile, create kv cache, warmup model) took 22.17 seconds
INFO 02-12 03:02:02 api_server.py:754] Using supplied chat template:
INFO 02-12 03:02:02 api_server.py:754] None
INFO 02-12 03:02:02 launcher.py:19] Available routes are:
INFO 02-12 03:02:02 launcher.py:27] Route: /openapi.json, Methods: HEAD, GET
INFO 02-12 03:02:02 launcher.py:27] Route: /docs, Methods: HEAD, GET
INFO 02-12 03:02:02 launcher.py:27] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 02-12 03:02:02 launcher.py:27] Route: /redoc, Methods: HEAD, GET
INFO 02-12 03:02:02 launcher.py:27] Route: /health, Methods: GET
INFO 02-12 03:02:02 launcher.py:27] Route: /ping, Methods: POST, GET
INFO 02-12 03:02:02 launcher.py:27] Route: /tokenize, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /detokenize, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /v1/models, Methods: GET
INFO 02-12 03:02:02 launcher.py:27] Route: /version, Methods: GET
INFO 02-12 03:02:02 launcher.py:27] Route: /v1/chat/completions, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /v1/completions, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /v1/embeddings, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /pooling, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /score, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /v1/score, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /rerank, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /v1/rerank, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /v2/rerank, Methods: POST
INFO 02-12 03:02:02 launcher.py:27] Route: /invocations, Methods: POST
INFO:     Started server process [6960]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
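
The profiling warning above also suggests increasing `max_model_len` and/or reducing `mm_counts`. A rough sketch of launch arguments along those lines, assuming the CLI flags are actually applied (which the ignored `max_pixels` puts in doubt) and that a `video=0` limit disables video inputs:

```bash
# Sketch only: follow the profiling warning by raising the context length and
# removing the video budget from the worst-case multi-modal reservation.
# Assumptions: these flags take effect as documented, and video=0 is a valid
# way to disable video inputs for this model.
CUDA_VISIBLE_DEVICES=0 python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --max-model-len 32768 \
    --limit-mm-per-prompt image=2,video=0 \
    --mm-processor-kwargs '{"max_pixels": 1000000}' \
    --gpu-memory-utilization 0.9 \
    --port 8000
```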

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
