
[New Model]: Qwen2-VL #8139

Closed · 1 task done

krevas opened this issue Sep 4, 2024 · 12 comments · Fixed by #7905

Labels
new model (Requests to new models)

Comments

krevas commented Sep 4, 2024

The model to consider.

https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct

The closest model vllm already supports.

https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/qwen2.py

What's your difficulty of supporting the model you want?

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
krevas added the new model (Requests to new models) label on Sep 4, 2024

DarkLight1337 (Member) commented Sep 4, 2024

It will be released once transformers support it.

devonthomas35 commented Sep 4, 2024

@DarkLight1337 it is supported in transformers: https://huggingface.co/docs/transformers/main/en/model_doc/qwen2_vl#qwen2vl

DarkLight1337 (Member)

I mean we need to wait until they release a new version with the change. It is not in v4.44.2.
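A quick way to tell whether an installed transformers build already includes Qwen2-VL is a sketch like the following; v4.44.2 and older will hit the ImportError branch:

import transformers

print(transformers.__version__)
try:
    # Qwen2-VL classes only exist in releases newer than v4.44.2.
    from transformers import Qwen2VLForConditionalGeneration  # noqa: F401
    print("Qwen2-VL is available in this transformers version")
except ImportError:
    print("Qwen2-VL is not available in this transformers version")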

Andcircle commented Oct 4, 2024

@DarkLight1337
I saw a demo for qwen2-vl like:

# A trial run of Qwen2-VL with the new input format: the model needs extra
# parameters (image_grid_thw) for calculating its positional encoding.
import torch

image_embeds = torch.load(...)  # torch.Tensor of shape (1, image_feature_size, hidden_size of LM)
image_grid_thw = torch.load(...)  # torch.Tensor of shape (1, 3)
mm_data["image"] = {
    "image_embeds": image_embeds,
    "image_grid_thw": image_grid_thw,
}
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": mm_data,
})

But where should I get the image_embeds dynamically?

Thanks for the help

DarkLight1337 (Member) commented Oct 4, 2024

> but where should I get the image_embeds dynamically?

This code is designed for precomputed embedding inputs. You can get the embeddings by running just the Qwen2-VL visual encoder + projection on images/videos (outside of vLLM) to get their visual token embeddings. If you have a mechanism to cache the embeddings of particular input images/videos, this can speed up inference as you don't need to run the visual encoder again. Most users won't be using this though.
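For reference, here is a minimal sketch of precomputing those embeddings with the Hugging Face transformers implementation of Qwen2-VL. This is not an official recipe; the file name and dtype are placeholders, and it assumes a transformers release that already ships the model:

# Sketch: run only the Qwen2-VL vision encoder + projection, outside of vLLM.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

# The image processor returns both pixel_values and image_grid_thw.
image_inputs = processor.image_processor(
    images=[Image.open("example.jpg")], return_tensors="pt"
)
pixel_values = image_inputs["pixel_values"].to(model.device, dtype=model.dtype)
image_grid_thw = image_inputs["image_grid_thw"].to(model.device)

with torch.no_grad():
    # model.visual is the vision transformer plus the projection into the LM hidden size.
    image_embeds = model.visual(pixel_values, grid_thw=image_grid_thw)

# image_embeds has shape (num_image_tokens, hidden_size); cache it (e.g. with
# torch.save) and reshape/unsqueeze as needed to match the multi_modal_data
# format shown in the snippet above.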

Andcircle

> This code is designed for precomputed embedding inputs. You can get the embeddings by running just the Qwen2-VL visual encoder + projection on images/videos (outside of vLLM).

Thanks for the detailed explanation

yonghenglh6
Process SpawnProcess-1:
Traceback (most recent call last):
File "/home/fangzhou/miniconda3/envs/vllm/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/fangzhou/miniconda3/envs/vllm/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/fangzhou/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 388, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fangzhou/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 134, in from_engine_args
engine_config = engine_args.create_engine_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fangzhou/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 874, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fangzhou/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 811, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/home/fangzhou/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/config.py", line 207, in init
self.max_model_len = _get_and_verify_max_len(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fangzhou/miniconda3/envs/vllm/lib/python3.11/site-packages/vllm/config.py", line 1746, in _get_and_verify_max_len
assert "factor" in rope_scaling
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

DarkLight1337 (Member)

> assert "factor" in rope_scaling
> AssertionError

Please install vLLM from source as mentioned in #7905.

marathon110

> assert "factor" in rope_scaling

The source code branch being used is: https://github.com/fyabc/vllm/tree/add_qwen2_vl_new

The model being loaded is: Qwen/Qwen2-VL-7B-Instruct

There is still an error:
vllm-1 | File "/workspace/vllm/entrypoints/openai/api_server.py", line 132, in build_async_engine_client_from_engine_args
vllm-1 | if (model_is_embedding(engine_args.model, engine_args.trust_remote_code,
vllm-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | File "/workspace/vllm/entrypoints/openai/api_server.py", line 73, in model_is_embedding
vllm-1 | return ModelConfig(model=model_name,
vllm-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | File "/workspace/vllm/config.py", line 227, in init
vllm-1 | self.max_model_len = _get_and_verify_max_len(
vllm-1 | ^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | File "/workspace/vllm/config.py", line 1739, in _get_and_verify_max_len
vllm-1 | assert "factor" in rope_scaling
vllm-1 | ^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | AssertionError

DarkLight1337 (Member)

> assert "factor" in rope_scaling
> AssertionError

Please use the latest version of vLLM. It supports Qwen2-VL without this error.
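For anyone hitting this later, a minimal sketch of image inference with Qwen2-VL on a recent vLLM release, passing a PIL image directly instead of precomputed embeddings. The prompt string follows the Qwen2-VL chat format, and "example.jpg" is a placeholder:

from PIL import Image
from vllm import LLM

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")

prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image in detail.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": Image.open("example.jpg")},
})
print(outputs[0].outputs[0].text)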

umie0128
@DarkLight1337
I tested with a snapshot from a highway camera and asked: "Describe the information in the image in detail." The model's output is noticeably truncated. How can I deal with this?

DarkLight1337 (Member) commented Oct 23, 2024

> The model's output is noticeably truncated. How can I deal with this?

You may increase max_model_len to extend the context length of the model, and SamplingParams.max_tokens to raise the maximum number of tokens the model is allowed to output.
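As a sketch of where those two knobs go with the offline API (the values below are placeholders, not recommendations; pick ones that fit your GPU memory and use case):

from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    max_model_len=8192,  # total context length (prompt + generated tokens)
)
sampling_params = SamplingParams(max_tokens=1024)  # cap on generated tokens per request

prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "Describe the information in this image in detail.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": Image.open("example.jpg")}},
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)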
