Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Model] PP support for embedding models and update docs #9090

Merged
merged 23 commits into from
Oct 6, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add missing models to docs
  • Loading branch information
DarkLight1337 committed Oct 6, 2024
commit 519f69500f77b2366a2ba83ca6b60708d04fdd40
88 changes: 79 additions & 9 deletions docs/source/models/supported_models.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,12 @@ vLLM supports a variety of generative Transformer models in `HuggingFace Transfo
The following is the list of model architectures that are currently supported by vLLM.
Alongside each architecture, we include some popular models that use it.

----
Language-only Models
DarkLight1337 marked this conversation as resolved.
Show resolved Hide resolved
^^^^^^^^^^^^^^^^^^^^

Text Generation
---------------

Decoder-only Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. list-table::
:widths: 25 25 50 5 5
:header-rows: 1
Expand Down Expand Up @@ -40,6 +42,11 @@ Decoder-only Language Models
- :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
-
- ✅︎
* - :code:`BartForConditionalGeneration`
- ChatGLM
- :code:`facebook/bart-base`, :code:`facebook/bart-large-cnn`, etc.
-
-
* - :code:`ChatGLMModel`
- ChatGLM
- :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
Expand Down Expand Up @@ -259,11 +266,58 @@ Decoder-only Language Models
.. note::
Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.

.. _supported_vlms:
Text Embedding
--------------

.. list-table::
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
- Models
- Example HuggingFace Models
- :ref:`LoRA <lora>`
- :ref:`PP <distributed_serving>`
* - :code:`Gemma2Model`
- Gemma2-based
- :code:`BAAI/bge-multilingual-gemma2`, etc.
-
- ✅︎
* - :code:`MistralModel`
- Mistral-based
- :code:`e5-mistral-7b-instruct`, etc.
-
- ✅︎

Reward Modelling
----------------

.. list-table::
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
- Models
- Example HuggingFace Models
- :ref:`LoRA <lora>`
- :ref:`PP <distributed_serving>`
* - :code:`Qwen2ForRewardModel`
- Qwen2-based
- :code:`Qwen/Qwen2.5-Math-RM-72B`, etc.
-
- ✅︎

.. note::
As an interim measure, these models are supported via Embeddings API. See `this RFC <https://github.com/vllm-project/vllm/issues/8967>`_ for upcoming changes.

Multimodal Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _supported_vlms:

Vision-to-Text Generation
-------------------------

.. list-table::
:widths: 25 25 25 25 5 5
:header-rows: 1
Expand Down Expand Up @@ -364,6 +418,27 @@ Multimodal Language Models
- :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
-
- ✅︎

| :sup:`E` Pre-computed embeddings can be inputted for this modality.
| :sup:`+` Multiple items can be inputted per text prompt for this modality.

.. note::
For :code:`openbmb/MiniCPM-V-2`, the official repo doesn't work yet, so we need to use a fork (:code:`HwwwH/MiniCPM-V-2`) for now.
For more details, please see: https://github.com/vllm-project/vllm/pull/4087#issuecomment-2250397630

Audio-to-Text Generation
------------------------

.. list-table::
:widths: 25 25 25 25 5 5
:header-rows: 1

* - Architecture
- Models
- Modalities
- Example HuggingFace Models
- :ref:`LoRA <lora>`
- :ref:`PP <distributed_serving>`
* - :code:`UltravoxModel`
- Ultravox
- Audio\ :sup:`E+`
Expand All @@ -374,11 +449,6 @@ Multimodal Language Models
| :sup:`E` Pre-computed embeddings can be inputted for this modality.
| :sup:`+` Multiple items can be inputted per text prompt for this modality.

.. note::
For :code:`openbmb/MiniCPM-V-2`, the official repo doesn't work yet, so we need to use a fork (:code:`HwwwH/MiniCPM-V-2`) for now.
For more details, please see: https://github.com/vllm-project/vllm/pull/4087#issuecomment-2250397630


If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Enabling Multimodal Inputs <enabling_multimodal_inputs>`
for instructions on how to implement support for your model.
Expand Down
2 changes: 1 addition & 1 deletion docs/source/models/vlm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ vLLM provides experimental support for Vision Language Models (VLMs). See the :r
This document shows you how to run and serve these models using vLLM.

.. important::
We are actively iterating on VLM support. Expect breaking changes to VLM usage and development in upcoming releases without prior deprecation.
We are actively iterating on VLM support. See `this RFC <https://github.com/vllm-project/vllm/issues/4194>`_ for upcoming changes.

We are continuously improving user & developer experience for VLMs. Please `open an issue on GitHub <https://github.com/vllm-project/vllm/issues/new/choose>`_ if you have any feedback or feature requests.

Expand Down