Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Models] Add remaining model PP support #7168

Merged
merged 62 commits into from
Oct 4, 2024
Merged
Changes from 1 commit
Commits
Show all changes
62 commits
Select commit Hold shift + click to select a range
4d358bb
Model PP support
andoorve Aug 5, 2024
6685431
Format
andoorve Aug 5, 2024
44f3537
Format
andoorve Aug 5, 2024
a4aeba1
Merge branch 'main' of github.com:andoorve/vllm into qwen-pp
andoorve Sep 3, 2024
aff35f9
Format
andoorve Sep 3, 2024
3632dc6
Merge branch 'main' of github.com:andoorve/vllm into qwen-pp
andoorve Sep 24, 2024
6c16347
Merge branch 'main' of github.com:andoorve/vllm into qwen-pp
andoorve Sep 24, 2024
cfcba73
Format
andoorve Sep 24, 2024
674e28f
Merge branch 'main' of github.com:andoorve/vllm into qwen-pp
andoorve Sep 27, 2024
bc7385e
Merge branch 'main' into qwen-pp
DarkLight1337 Sep 30, 2024
332d66c
fix wrong type
DarkLight1337 Sep 30, 2024
18841f3
fix typo
DarkLight1337 Sep 30, 2024
313d04c
Merge branch 'main' into qwen-pp
DarkLight1337 Oct 1, 2024
eea3fc5
Add SupportsPP interface and stateless protocol check
DarkLight1337 Oct 1, 2024
b4ce5f7
Subclass SupportsPP in relevant models
DarkLight1337 Oct 1, 2024
30e454a
Remove hardcoded list
DarkLight1337 Oct 1, 2024
e9ea5b7
Remove unused import
DarkLight1337 Oct 1, 2024
8b40176
Check using function
DarkLight1337 Oct 1, 2024
ec4c6b3
Update docstring
DarkLight1337 Oct 1, 2024
cdc4dbe
Simplify
DarkLight1337 Oct 1, 2024
dcc2a49
Add tests
DarkLight1337 Oct 1, 2024
7280766
Test CUDA initialization
DarkLight1337 Oct 1, 2024
37cc51b
Add platform guard
DarkLight1337 Oct 1, 2024
3814246
Trigger CI
DarkLight1337 Oct 1, 2024
cf91f7b
Fix OOT registration
DarkLight1337 Oct 1, 2024
38b090a
Update docstring
DarkLight1337 Oct 1, 2024
d394985
Remove unnecessary global
DarkLight1337 Oct 1, 2024
1404e92
Merge branch 'main' into qwen-pp
DarkLight1337 Oct 2, 2024
6a4287a
Update interfaces
DarkLight1337 Oct 3, 2024
1e010c7
format
DarkLight1337 Oct 3, 2024
a6b99c3
Merge branch 'supports-pp' into qwen-pp
DarkLight1337 Oct 3, 2024
1e0baba
Fix error check
DarkLight1337 Oct 3, 2024
9ef69de
Make `prefix` required
DarkLight1337 Oct 3, 2024
76355d9
Inherit from `SupportsPP`
DarkLight1337 Oct 3, 2024
7be7ac2
Merge branch 'main' into qwen-pp
DarkLight1337 Oct 3, 2024
c3f3d4a
Inherit from `SupportsPP`
DarkLight1337 Oct 3, 2024
9cc78ae
Merge branch 'supports-pp' into qwen-pp
DarkLight1337 Oct 3, 2024
7c2a922
Fix PP for language models
DarkLight1337 Oct 3, 2024
a36f7ed
Fix environment variables not being copied over
DarkLight1337 Oct 3, 2024
5b960bc
Merge branch 'main' into supports-pp
DarkLight1337 Oct 3, 2024
66a634e
Merge branch 'supports-pp' into qwen-pp
DarkLight1337 Oct 3, 2024
f9cae12
Use inferred type
DarkLight1337 Oct 3, 2024
591bf85
Add missing type annotations
DarkLight1337 Oct 3, 2024
addc8cd
Add PP support for more multimodal models
DarkLight1337 Oct 3, 2024
ed669a5
Fix the real problem, which is that modelscope is not installed
DarkLight1337 Oct 3, 2024
9ac8a99
Merge branch 'supports-pp' into qwen-pp
DarkLight1337 Oct 3, 2024
d211003
Update tests
DarkLight1337 Oct 3, 2024
beb609c
Update docs
DarkLight1337 Oct 3, 2024
4cb66d5
Fix missing `SupportsPP`; support PP for olmoe
DarkLight1337 Oct 3, 2024
e01d59f
Fix type annotations
DarkLight1337 Oct 3, 2024
b8958a9
Move modelscope installation into regression test
DarkLight1337 Oct 3, 2024
ec0f4e0
Merge branch 'supports-pp' into qwen-pp
DarkLight1337 Oct 3, 2024
8fdcaa0
format
DarkLight1337 Oct 3, 2024
3ed8a8b
Update LoRA support in docs
DarkLight1337 Oct 3, 2024
ba174b6
PP support for phimoe
DarkLight1337 Oct 3, 2024
5da7ff1
Fix capitalization
DarkLight1337 Oct 3, 2024
e9f0601
Fix `LLMWrapper`
DarkLight1337 Oct 3, 2024
fda3b66
Merge branch 'supports-pp' into qwen-pp
DarkLight1337 Oct 3, 2024
b65813c
Update test configs
DarkLight1337 Oct 3, 2024
99e653e
Add more tp to pixtral
andoorve Oct 3, 2024
62f1980
Update test_pipeline_parallel.py
andoorve Oct 3, 2024
7c7251e
Fix gpt_j.py
andoorve Oct 4, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Update docs
  • Loading branch information
DarkLight1337 committed Oct 3, 2024
commit beb609cb4d4296d1d00ab0a5cd7e168ddbedbaf1
73 changes: 69 additions & 4 deletions docs/source/models/supported_models.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,201 +12,249 @@ Alongside each architecture, we include some popular models that use it.
Decoder-only Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. list-table::
:widths: 25 25 50 5
:widths: 25 25 50 5 5
:header-rows: 1

* - Architecture
- Models
- Example HuggingFace Models
- :ref:`LoRA <lora>`
- :ref:`PP <distributed_serving>`
* - :code:`AquilaForCausalLM`
- Aquila, Aquila2
- :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
- ✅︎
- ✅︎
* - :code:`ArcticForCausalLM`
- Arctic
- :code:`Snowflake/snowflake-arctic-base`, :code:`Snowflake/snowflake-arctic-instruct`, etc.
-
- ✅︎
* - :code:`BaiChuanForCausalLM`
- Baichuan2, Baichuan
- :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
- ✅︎
- ✅︎
* - :code:`BloomForCausalLM`
- BLOOM, BLOOMZ, BLOOMChat
- :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
-
- ✅︎
* - :code:`ChatGLMModel`
- ChatGLM
- :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
- ✅︎
- ✅︎
* - :code:`CohereForCausalLM`
- Command-R
- :code:`CohereForAI/c4ai-command-r-v01`, etc.
-
- ✅︎
* - :code:`DbrxForCausalLM`
- DBRX
- :code:`databricks/dbrx-base`, :code:`databricks/dbrx-instruct`, etc.
-
- ✅︎
* - :code:`DeciLMForCausalLM`
- DeciLM
- :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
-
- ✅︎
* - :code:`DeepseekForCausalLM`
- DeepSeek
- :code:`deepseek-ai/deepseek-llm-67b-base`, :code:`deepseek-ai/deepseek-llm-7b-chat` etc.
-
- ✅︎
* - :code:`DeepseekV2ForCausalLM`
- DeepSeek-V2
- :code:`deepseek-ai/DeepSeek-V2`, :code:`deepseek-ai/DeepSeek-V2-Chat` etc.
-
- ✅︎
* - :code:`ExaoneForCausalLM`
- EXAONE-3
- :code:`LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
- ✅︎
-
* - :code:`FalconForCausalLM`
- Falcon
- :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
-
- ✅︎
* - :code:`GemmaForCausalLM`
- Gemma
- :code:`google/gemma-2b`, :code:`google/gemma-7b`, etc.
- ✅︎
- ✅︎
* - :code:`Gemma2ForCausalLM`
- Gemma2
- :code:`google/gemma-2-9b`, :code:`google/gemma-2-27b`, etc.
- ✅︎
- ✅︎
* - :code:`GPT2LMHeadModel`
- GPT-2
- :code:`gpt2`, :code:`gpt2-xl`, etc.
-
- ✅︎
* - :code:`GPTBigCodeForCausalLM`
- StarCoder, SantaCoder, WizardCoder
- :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
- ✅︎
- ✅︎
* - :code:`GPTJForCausalLM`
- GPT-J
- :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
-
- ✅︎
* - :code:`GPTNeoXForCausalLM`
- GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
- :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
-
- ✅︎
* - :code:`GraniteForCausalLM`
- PowerLM
- :code:`ibm/PowerLM-3b` etc.
- ✅︎
- ✅︎
* - :code:`GraniteMoeForCausalLM`
- PowerMoE
- :code:`ibm/PowerMoE-3b` etc.
- ✅︎
- ✅︎
* - :code:`InternLMForCausalLM`
- InternLM
- :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
- ✅︎
- ✅︎
* - :code:`InternLM2ForCausalLM`
- InternLM2
- :code:`internlm/internlm2-7b`, :code:`internlm/internlm2-chat-7b`, etc.
-
- ✅︎
* - :code:`JAISLMHeadModel`
- Jais
- :code:`core42/jais-13b`, :code:`core42/jais-13b-chat`, :code:`core42/jais-30b-v3`, :code:`core42/jais-30b-chat-v3`, etc.
-
- ✅︎
* - :code:`JambaForCausalLM`
- Jamba
- :code:`ai21labs/AI21-Jamba-1.5-Large`, :code:`ai21labs/AI21-Jamba-1.5-Mini`, :code:`ai21labs/Jamba-v0.1`, etc.
- ✅︎
-
* - :code:`LlamaForCausalLM`
- Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
- :code:`meta-llama/Meta-Llama-3.1-405B-Instruct`, :code:`meta-llama/Meta-Llama-3.1-70B`, :code:`meta-llama/Meta-Llama-3-70B-Instruct`, :code:`meta-llama/Llama-2-70b-hf`, :code:`01-ai/Yi-34B`, etc.
- ✅︎
- ✅︎
* - :code:`MiniCPMForCausalLM`
- MiniCPM
- :code:`openbmb/MiniCPM-2B-sft-bf16`, :code:`openbmb/MiniCPM-2B-dpo-bf16`, etc.
-
- ✅︎
* - :code:`MiniCPM3ForCausalLM`
- MiniCPM3
- :code:`openbmb/MiniCPM3-4B`, etc.
-
- ✅︎
* - :code:`MistralForCausalLM`
- Mistral, Mistral-Instruct
- :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
- ✅︎
- ✅︎
* - :code:`MixtralForCausalLM`
- Mixtral-8x7B, Mixtral-8x7B-Instruct
- :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, :code:`mistral-community/Mixtral-8x22B-v0.1`, etc.
- ✅︎
- ✅︎
* - :code:`MPTForCausalLM`
- MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
- :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
-
- ✅︎
* - :code:`NemotronForCausalLM`
- Nemotron-3, Nemotron-4, Minitron
- :code:`nvidia/Minitron-8B-Base`, :code:`mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
- ✅︎
- ✅︎
* - :code:`OLMoEForCausalLM`
- OLMoE
- :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
-
-
* - :code:`OLMoForCausalLM`
- OLMo
- :code:`allenai/OLMo-1B-hf`, :code:`allenai/OLMo-7B-hf`, etc.
-
- ✅︎
* - :code:`OPTForCausalLM`
- OPT, OPT-IML
- :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
-
- ✅︎
* - :code:`OrionForCausalLM`
- Orion
- :code:`OrionStarAI/Orion-14B-Base`, :code:`OrionStarAI/Orion-14B-Chat`, etc.
-
- ✅︎
* - :code:`PhiForCausalLM`
- Phi
- :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
- ✅︎
- ✅︎
* - :code:`Phi3ForCausalLM`
- Phi-3
- :code:`microsoft/Phi-3-mini-4k-instruct`, :code:`microsoft/Phi-3-mini-128k-instruct`, :code:`microsoft/Phi-3-medium-128k-instruct`, etc.
-
- ✅︎
* - :code:`Phi3SmallForCausalLM`
- Phi-3-Small
- :code:`microsoft/Phi-3-small-8k-instruct`, :code:`microsoft/Phi-3-small-128k-instruct`, etc.
-
- ✅︎
* - :code:`PhiMoEForCausalLM`
- Phi-3.5-MoE
- :code:`microsoft/Phi-3.5-MoE-instruct`, etc.
-
-
* - :code:`PersimmonForCausalLM`
- Persimmon
- :code:`adept/persimmon-8b-base`, :code:`adept/persimmon-8b-chat`, etc.
-
- ✅︎
* - :code:`QWenLMHeadModel`
- Qwen
- :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
-
- ✅︎
* - :code:`Qwen2ForCausalLM`
- Qwen2
- :code:`Qwen/Qwen2-beta-7B`, :code:`Qwen/Qwen2-beta-7B-Chat`, etc.
- ✅︎
- ✅︎
* - :code:`Qwen2MoeForCausalLM`
- Qwen2MoE
- :code:`Qwen/Qwen1.5-MoE-A2.7B`, :code:`Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
-
- ✅︎
* - :code:`StableLmForCausalLM`
- StableLM
- :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
-
- ✅︎
* - :code:`Starcoder2ForCausalLM`
- Starcoder2
- :code:`bigcode/starcoder2-3b`, :code:`bigcode/starcoder2-7b`, :code:`bigcode/starcoder2-15b`, etc.
-
- ✅︎
* - :code:`SolarForCausalLM`
- EXAONE-3
- Solar Pro
- :code:`upstage/solar-pro-preview-instruct`, etc.
-
-
* - :code:`XverseForCausalLM`
- Xverse
- :code:`xverse/XVERSE-7B-Chat`, :code:`xverse/XVERSE-13B-Chat`, :code:`xverse/XVERSE-65B-Chat`, etc.
-
- ✅︎

.. note::
Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
Expand All @@ -217,94 +265,111 @@ Multimodal Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
:widths: 25 25 25 25 5
:widths: 25 25 25 25 5 5
:header-rows: 1

* - Architecture
- Models
- Modalities
- Example HuggingFace Models
- :ref:`LoRA <lora>`
- :ref:`PP <distributed_serving>`
* - :code:`Blip2ForConditionalGeneration`
- BLIP-2
- Image\ :sup:`E`
- :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
-
- ✅︎
* - :code:`ChameleonForConditionalGeneration`
- Chameleon
- Image
- :code:`facebook/chameleon-7b` etc.
-
- ✅︎
* - :code:`FuyuForCausalLM`
- Fuyu
- Image
- :code:`adept/fuyu-8b` etc.
-
- ✅︎
* - :code:`InternVLChatModel`
- InternVL2
- Image\ :sup:`E+`
- :code:`OpenGVLab/InternVL2-4B`, :code:`OpenGVLab/InternVL2-8B`, etc.
-
- ✅︎
* - :code:`LlavaForConditionalGeneration`
- LLaVA-1.5
- Image\ :sup:`E+`
- :code:`llava-hf/llava-1.5-7b-hf`, :code:`llava-hf/llava-1.5-13b-hf`, etc.
-
- ✅︎
* - :code:`LlavaNextForConditionalGeneration`
- LLaVA-NeXT
- Image\ :sup:`E+`
- :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
-
- ✅︎
* - :code:`LlavaNextVideoForConditionalGeneration`
- LLaVA-NeXT-Video
- Video
- :code:`llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
-
- ✅︎
* - :code:`LlavaOnevisionForConditionalGeneration`
- LLaVA-Onevision
- Image\ :sup:`+` / Video
- :code:`llava-hf/llava-onevision-qwen2-7b-ov-hf`, :code:`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
-
- ✅︎
* - :code:`MiniCPMV`
- MiniCPM-V
- Image\ :sup:`+`
- :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
-
- ✅︎
- ✅︎
* - :code:`MllamaForConditionalGeneration`
- Llama 3.2
- Image
- :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
-
-
* - :code:`PaliGemmaForConditionalGeneration`
- PaliGemma
- Image\ :sup:`E`
- :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc.
-
- ✅︎
* - :code:`Phi3VForCausalLM`
- Phi-3-Vision, Phi-3.5-Vision
- Image\ :sup:`E+`
- :code:`microsoft/Phi-3-vision-128k-instruct`, :code:`microsoft/Phi-3.5-vision-instruct` etc.
-
- ✅︎
* - :code:`PixtralForConditionalGeneration`
- Pixtral
- Image\ :sup:`+`
- :code:`mistralai/Pixtral-12B-2409`
-
- ✅︎
* - :code:`QWenLMHeadModel`
- Qwen-VL
- Image\ :sup:`E+`
- :code:`Qwen/Qwen-VL`, :code:`Qwen/Qwen-VL-Chat`, etc.
-
- ✅︎
* - :code:`Qwen2VLForConditionalGeneration`
- Qwen2-VL
- Image\ :sup:`E+` / Video\ :sup:`+`
- :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
-
- ✅︎
* - :code:`UltravoxModel`
- Ultravox
- Audio\ :sup:`E+`
- :code:`fixie-ai/ultravox-v0_3`
-
- ✅︎

| :sup:`E` Pre-computed embeddings can be inputted for this modality.
| :sup:`+` Multiple items can be inputted per text prompt for this modality.
Expand Down
Loading