Description
The model to consider.
https://huggingface.co/deepseek-ai/deepseek-vl2
https://huggingface.co/deepseek-ai/deepseek-vl2-small
https://huggingface.co/deepseek-ai/deepseek-vl2-tiny
The closest model vllm already supports.
DeepSeek-V2 is the base language model, so that should already be supported. From what I can tell, the new vision support is simply SigLIP with an MLP projector. https://huggingface.co/deepseek-ai/deepseek-vl2/blob/e6adb2bce35b94ecc84fbb46d130ce60a7bb4d43/config.json#L129-L144
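For illustration, a SigLIP-to-LM MLP projector in the LLaVA style might look like the sketch below. This is a hedged guess at the shape of the component, not the actual deepseek-vl2 implementation; the class name and the dimensions are placeholders, not values read from the linked config.

```python
import torch
import torch.nn as nn


class VisionProjector(nn.Module):
    """LLaVA-style 2-layer MLP mapping vision features into the LM embedding space.

    Hypothetical sketch -- names and sizes are illustrative, not deepseek-vl2's.
    """

    def __init__(self, vision_hidden_size: int, lm_hidden_size: int):
        super().__init__()
        # Linear -> GELU -> Linear, as in LLaVA-1.5's mlp2x_gelu projector
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, lm_hidden_size),
            nn.GELU(),
            nn.Linear(lm_hidden_size, lm_hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_hidden_size)
        return self.proj(image_features)


# Placeholder sizes, chosen only to show the shapes flowing through
projector = VisionProjector(vision_hidden_size=1152, lm_hidden_size=2048)
feats = torch.randn(1, 576, 1152)
print(projector(feats).shape)  # torch.Size([1, 576, 2048])
```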
What's your difficulty of supporting the model you want?
I think this should be similar to supporting any other LLaVA-style model: we should have all the pieces implemented and just need a new model definition. However, support has not landed in transformers yet.
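The "new model definition" above mostly amounts to wiring the pieces together: encode the image, project the features, and splice them into the text embedding sequence at the image-placeholder positions. A minimal sketch of that splice step follows; everything here (function name, token id, shapes) is illustrative, and a real vLLM model definition would go through vLLM's multimodal interfaces rather than this bare helper.

```python
import torch


def merge_multimodal_embeddings(
    text_embeds: torch.Tensor,   # (seq_len, hidden) token embeddings from the LM
    image_embeds: torch.Tensor,  # (num_patches, hidden) projected vision features
    token_ids: torch.Tensor,     # (seq_len,) input token ids
    image_token_id: int,         # hypothetical placeholder-token id
) -> torch.Tensor:
    # Replace each image-placeholder embedding with a projected image patch.
    out = text_embeds.clone()
    mask = token_ids == image_token_id
    out[mask] = image_embeds.to(out.dtype)
    return out


# Toy example: two placeholder tokens (id 99) receive the two image patches
hidden = 8
text = torch.zeros(5, hidden)
patches = torch.ones(2, hidden)
ids = torch.tensor([1, 99, 99, 2, 3])
merged = merge_multimodal_embeddings(text, patches, ids, image_token_id=99)
print(merged[1].sum().item())  # 8.0 -- the placeholder slot now holds a patch
```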
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.