Your current environment
vLLM version: 0.10.1.1
Model: https://huggingface.co/ibm-granite/granite-docling-258M
Discussion: https://huggingface.co/ibm-granite/granite-docling-258M/discussions/20
How would you like to use vllm
I want to serve granite-docling (https://huggingface.co/ibm-granite/granite-docling-258M) with vLLM. We're running into some issues trying to do this, specifically:
- When `tie_word_embeddings` is set in the model's config.json, it is not applied correctly at inference/model-loading time for granite-docling (which is an Idefics3 model). Loading fails with errors about missing `wte.weight` parameters.
- Current placeholder fix: the Docling team has provided a separate set of weights for vLLM, on a different branch, with the tied parameters replicated explicitly (see the sketch below).
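
For reference, a minimal sketch of what we are trying to run and the current workaround, using vLLM's offline `LLM` API. The `revision` value is a placeholder, since the exact branch name isn't pinned down here:

```python
# Minimal sketch against vLLM 0.10.1.1; the revision string below is a
# placeholder for the branch the Docling team published, not a confirmed name.
from vllm import LLM

# Fails today: with `tie_word_embeddings: true` in config.json, loading
# reports missing `wte.weight` parameters.
# llm = LLM(model="ibm-granite/granite-docling-258M")

# Workaround: load the alternate weights in which the tied parameters
# are replicated explicitly.
llm = LLM(
    model="ibm-granite/granite-docling-258M",
    revision="<branch-with-replicated-weights>",  # placeholder branch name
)
```

Ideally vLLM would honor `tie_word_embeddings` for Idefics3-based models directly, so the stock weights on the main branch work without duplication.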
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.