[Model] add colqwen2_vl code & inference #14291
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Signed-off-by: BloomBerry <jyjang1090@gmail.com>
Thanks for implementing this! Can you update the following files as well?
Hey @BloomBerry, I'm working on reviving this PR since it has drifted away from the refactors on main and needs some more testing. Would you like me to push to this PR myself, or should I start a new one? It seems to require this Transformers PR: huggingface/transformers#35778
This pull request has merge conflicts that must be resolved before it can be merged.
Hi, is there an estimate of when this PR will be merged?
Is anyone working on this?
I was able to serve ColQwen 2.5 VL 3B (https://huggingface.co/Metric-AI/ColQwen2.5-3b-multilingual-v1.0) with vLLM by making some modifications to the source code. The idea is to use Qwen 2.5 VL with the ALL pooling type so it outputs all embedding vectors for late interaction. Here is a git patch you can apply to the vLLM source code (tested with v0.11.0). I am using it with the local weights of the model. You just need to change the architecture name in the config.json and add a modules.json file containing:

```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  }
]
```

I am running the OpenAI-compatible server in Docker Compose as follows:

```yaml
entrypoint: ["vllm", "serve"]
command:
  - "/root/.cache/huggingface/hub/models--Metric-AI--ColQwen2.5-3b-multilingual-v1.0/snapshots/e2a1c05d053dcf4ad6e39b6c48ced9d6a81071f0"
  - "--host"
  - "0.0.0.0"
  - "--port"
  - "8000"
  - "--runner"
  - "pooling"
  - "--convert"
  - "embed"
  - "--dtype"
  - "bfloat16"
  - "--max-model-len"
  - "1024"
  - "--gpu-memory-utilization"
  - "0.8"
  - "--trust-remote-code"
  - "--quantization"
  - "bitsandbytes"
  - "--override-pooler-config"
  - '{"pooling_type":"ALL","normalize":true}'
  - "--served-model-name"
  - "anyname"
```

It is working well with high throughput on an 8GB GPU. Hope it helps.
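The "all embedding vectors for late interaction" setup above is scored ColBERT/ColPali-style on the client side: each query token embedding is matched against its best document token embedding, and the maxima are summed. A minimal NumPy sketch (the shapes and values here are illustrative, not taken from the patch):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score: for each query token embedding,
    take the similarity to its best-matching document token, then sum.
    Rows are assumed L2-normalized, so dot products are cosine sims."""
    sim = query_emb @ doc_emb.T          # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query tokens

# Toy example with unit-norm rows
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.6, 0.8]])
print(maxsim_score(q, d))  # 1.0 + 0.8 = 1.8
```

With `normalize: true` in the pooler config, the embeddings the server returns are already unit-norm, so this dot-product form is all the ranking step needs.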
Does your patch support multimodal (image) embedding?
@HoangTung-Vu Yes indeed. You should follow the same query structure as colpali-engine. However, you cannot use the OpenAI client code because it does not support multimodal embedding.
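Since the OpenAI client has no multimodal embeddings path, you can build and POST the JSON yourself. A hedged sketch of constructing such a request follows; the endpoint path, the chat-style `messages` payload shape, and the model name are assumptions to check against your vLLM version's docs, not a verified schema:

```python
import base64

def build_image_embedding_payload(image_bytes: bytes, model: str = "anyname") -> dict:
    """Chat-style multimodal embedding payload carrying a base64 data URL.
    Field names mirror the OpenAI-compatible chat schema; verify that your
    vLLM version accepts them on its embeddings/pooling endpoint."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            }],
        }],
        "encoding_format": "float",
    }

payload = build_image_embedding_payload(b"\x89PNG")  # use real PNG bytes in practice
# import requests
# r = requests.post("http://localhost:8000/v1/embeddings", json=payload)  # hypothetical URL
# r.raise_for_status()
print(payload["messages"][0]["content"][0]["image_url"]["url"][:30])  # data:image/png;base64,iVBORw==
```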
I already used requests directly instead of the OpenAI client code, but I encountered a 400 Bad Request error. If I comment out the image part, it works.
@HoangTung-Vu I need more context to understand why this happened to you. Could you tell me the exact steps you took and the full error message?
I applied your patch using Git commands, but it raised some errors, so I manually integrated the changes instead. For the model, I cloned OpenGVLab/colqwen2_5-3b-base, added the modules.json file as in your implementation, and updated the model class in config.json. However, when sending a request to the model, I still receive a 400 Bad Request response.
@HoangTung-Vu Make sure that vLLM is loading the correct model. It happened to me that it loaded a default model because it could not load the local one. I installed the Docker version for my specific hardware, so it was faster. Here is my docker compose for an RTX 3070:
I ran my tests on a cloud instance from Vast.ai. Since it is a virtual container environment, I was not able to use Docker Compose as in your setup. For the model (ColQwen), I cloned it directly from Hugging Face. I chose the base model so that I could edit the model_class field in config.json. The fine-tuned variants only include adapter configurations, so they were not suitable for this purpose. When running vLLM, I pointed directly to the local model directory, so I assume it correctly loaded the intended model. Regarding vLLM itself, I installed it from source using: I suspect that the 400 Bad Request error might be caused by an incorrect configuration of the ColQwen model on my side. I'll review the model setup again to ensure it matches your patch specifications.
@HoangTung-Vu
I have rechecked the configuration and reinstalled everything.
What does your base64 URL look like? Make sure it is in the correct format.
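A malformed data URL is a frequent cause of 400 responses here. The required shape is `data:<mime>;base64,<payload>`; a quick sanity check (the helper name is illustrative):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    # Data URLs must look like "data:<mime>;base64,<payload>"; a missing
    # "base64," marker or a wrong MIME type commonly triggers 400 errors.
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")

print(to_data_url(b"\xff\xd8\xff"))  # JPEG magic bytes -> data:image/jpeg;base64,/9j/
```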
@HoangTung-Vu |
Add support for ColQwen2VL model
Description
This PR adds support for the ColQwen2VL model to vLLM. ColQwen2VL is an efficient document retrieval vision language model based on Qwen2VL, as described in the paper "ColPali: Efficient Document Retrieval with Vision Language Models". The model is designed to generate embeddings rather than text outputs, making it suitable for document retrieval applications.
Key implementation details:
Extended the existing Qwen2VL implementation for ColQwen2VL compatibility
Implemented custom text projection layer and L2 normalization for embedding generation
Added appropriate processing utilities for image and video inputs
Overrode the forward, compute_logits, and sample methods to optimize for embedding output
This implementation enables users to leverage ColQwen2VL's multimodal document retrieval capabilities through vLLM's efficient serving infrastructure.
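The projection-plus-normalization step described above can be sketched as follows. This is a hedged illustration: the 128-dimensional output follows the ColPali paper's convention, and the function and variable names are illustrative, not the PR's actual identifiers:

```python
import numpy as np

def project_and_normalize(hidden_states: np.ndarray, proj_weight: np.ndarray) -> np.ndarray:
    """Map last hidden states to the retrieval dimension, then L2-normalize
    each token embedding so dot products become cosine similarities."""
    # hidden_states: (seq_len, hidden_dim); proj_weight: (hidden_dim, embed_dim)
    proj = hidden_states @ proj_weight
    norms = np.linalg.norm(proj, axis=-1, keepdims=True)
    return proj / np.clip(norms, 1e-12, None)  # guard against zero-norm rows

rng = np.random.default_rng(0)
hidden = rng.standard_normal((5, 16))    # 5 tokens, toy hidden size 16
weight = rng.standard_normal((16, 128))  # ColPali-style 128-d embeddings
emb = project_and_normalize(hidden, weight)
print(emb.shape, bool(np.allclose(np.linalg.norm(emb, axis=-1), 1.0)))  # (5, 128) True
```

Keeping one normalized vector per token (rather than pooling to a single vector) is what makes the late-interaction retrieval scoring possible downstream.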
Testing
Tested with sample image inputs
Verified embedding output format and dimensions
Confirmed compatibility with HuggingFace ColQwen2VL models
FIX #19381