Commit f97e0eb

andoorve and DarkLight1337 authored and committed
[Models] Add remaining model PP support (vllm-project#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Amit Garg <mitgarg17495@gmail.com>
1 parent ff70294 commit f97e0eb

69 files changed, +2575 -1336 lines changed

.buildkite/test-pipeline.yaml

+3 -1
@@ -146,7 +146,9 @@ steps:
   source_file_dependencies:
   - vllm/
   - tests/test_regression
-  command: pytest -v -s test_regression.py
+  commands:
+  - pip install modelscope
+  - pytest -v -s test_regression.py
   working_dir: "/vllm-workspace/tests" # optional
 
 - label: Engine Test # 10min

docs/source/models/supported_models.rst

+81 -16
@@ -12,201 +12,249 @@ Alongside each architecture, we include some popular models that use it.
 Decoder-only Language Models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. list-table::
-  :widths: 25 25 50 5
+  :widths: 25 25 50 5 5
   :header-rows: 1
 
   * - Architecture
     - Models
     - Example HuggingFace Models
     - :ref:`LoRA <lora>`
+    - :ref:`PP <distributed_serving>`
   * - :code:`AquilaForCausalLM`
     - Aquila, Aquila2
     - :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`ArcticForCausalLM`
     - Arctic
     - :code:`Snowflake/snowflake-arctic-base`, :code:`Snowflake/snowflake-arctic-instruct`, etc.
     -
+    - ✅︎
   * - :code:`BaiChuanForCausalLM`
     - Baichuan2, Baichuan
     - :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`BloomForCausalLM`
     - BLOOM, BLOOMZ, BLOOMChat
     - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
     -
+    - ✅︎
   * - :code:`ChatGLMModel`
     - ChatGLM
     - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`CohereForCausalLM`
     - Command-R
     - :code:`CohereForAI/c4ai-command-r-v01`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`DbrxForCausalLM`
     - DBRX
     - :code:`databricks/dbrx-base`, :code:`databricks/dbrx-instruct`, etc.
     -
+    - ✅︎
   * - :code:`DeciLMForCausalLM`
     - DeciLM
     - :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
     -
+    - ✅︎
   * - :code:`DeepseekForCausalLM`
     - DeepSeek
     - :code:`deepseek-ai/deepseek-llm-67b-base`, :code:`deepseek-ai/deepseek-llm-7b-chat` etc.
     -
+    - ✅︎
   * - :code:`DeepseekV2ForCausalLM`
     - DeepSeek-V2
     - :code:`deepseek-ai/DeepSeek-V2`, :code:`deepseek-ai/DeepSeek-V2-Chat` etc.
     -
+    - ✅︎
   * - :code:`ExaoneForCausalLM`
     - EXAONE-3
     - :code:`LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`FalconForCausalLM`
     - Falcon
     - :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
     -
+    - ✅︎
   * - :code:`GemmaForCausalLM`
     - Gemma
     - :code:`google/gemma-2b`, :code:`google/gemma-7b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Gemma2ForCausalLM`
     - Gemma2
     - :code:`google/gemma-2-9b`, :code:`google/gemma-2-27b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`GPT2LMHeadModel`
     - GPT-2
     - :code:`gpt2`, :code:`gpt2-xl`, etc.
     -
+    - ✅︎
   * - :code:`GPTBigCodeForCausalLM`
     - StarCoder, SantaCoder, WizardCoder
     - :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`GPTJForCausalLM`
     - GPT-J
     - :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
     -
+    - ✅︎
   * - :code:`GPTNeoXForCausalLM`
     - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
     - :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
     -
+    - ✅︎
   * - :code:`GraniteForCausalLM`
     - PowerLM
     - :code:`ibm/PowerLM-3b` etc.
     - ✅︎
+    - ✅︎
   * - :code:`GraniteMoeForCausalLM`
     - PowerMoE
     - :code:`ibm/PowerMoE-3b` etc.
     - ✅︎
+    - ✅︎
   * - :code:`InternLMForCausalLM`
     - InternLM
     - :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`InternLM2ForCausalLM`
     - InternLM2
     - :code:`internlm/internlm2-7b`, :code:`internlm/internlm2-chat-7b`, etc.
     -
+    - ✅︎
   * - :code:`JAISLMHeadModel`
     - Jais
     - :code:`core42/jais-13b`, :code:`core42/jais-13b-chat`, :code:`core42/jais-30b-v3`, :code:`core42/jais-30b-chat-v3`, etc.
     -
+    - ✅︎
   * - :code:`JambaForCausalLM`
     - Jamba
     - :code:`ai21labs/AI21-Jamba-1.5-Large`, :code:`ai21labs/AI21-Jamba-1.5-Mini`, :code:`ai21labs/Jamba-v0.1`, etc.
     - ✅︎
+    -
   * - :code:`LlamaForCausalLM`
     - Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
     - :code:`meta-llama/Meta-Llama-3.1-405B-Instruct`, :code:`meta-llama/Meta-Llama-3.1-70B`, :code:`meta-llama/Meta-Llama-3-70B-Instruct`, :code:`meta-llama/Llama-2-70b-hf`, :code:`01-ai/Yi-34B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MiniCPMForCausalLM`
     - MiniCPM
     - :code:`openbmb/MiniCPM-2B-sft-bf16`, :code:`openbmb/MiniCPM-2B-dpo-bf16`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MiniCPM3ForCausalLM`
     - MiniCPM3
     - :code:`openbmb/MiniCPM3-4B`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MistralForCausalLM`
     - Mistral, Mistral-Instruct
     - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MixtralForCausalLM`
     - Mixtral-8x7B, Mixtral-8x7B-Instruct
     - :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, :code:`mistral-community/Mixtral-8x22B-v0.1`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MPTForCausalLM`
     - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
     - :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
     -
+    - ✅︎
   * - :code:`NemotronForCausalLM`
     - Nemotron-3, Nemotron-4, Minitron
     - :code:`nvidia/Minitron-8B-Base`, :code:`mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
     - ✅︎
-  * - :code:`OLMoEForCausalLM`
-    - OLMoE
-    - :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
-    -
+    - ✅︎
   * - :code:`OLMoForCausalLM`
     - OLMo
     - :code:`allenai/OLMo-1B-hf`, :code:`allenai/OLMo-7B-hf`, etc.
     -
+    - ✅︎
+  * - :code:`OLMoEForCausalLM`
+    - OLMoE
+    - :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
+    - ✅︎
+    - ✅︎
   * - :code:`OPTForCausalLM`
     - OPT, OPT-IML
     - :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
     -
+    - ✅︎
   * - :code:`OrionForCausalLM`
     - Orion
     - :code:`OrionStarAI/Orion-14B-Base`, :code:`OrionStarAI/Orion-14B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`PhiForCausalLM`
     - Phi
     - :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Phi3ForCausalLM`
     - Phi-3
     - :code:`microsoft/Phi-3-mini-4k-instruct`, :code:`microsoft/Phi-3-mini-128k-instruct`, :code:`microsoft/Phi-3-medium-128k-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`Phi3SmallForCausalLM`
     - Phi-3-Small
     - :code:`microsoft/Phi-3-small-8k-instruct`, :code:`microsoft/Phi-3-small-128k-instruct`, etc.
     -
+    - ✅︎
   * - :code:`PhiMoEForCausalLM`
     - Phi-3.5-MoE
     - :code:`microsoft/Phi-3.5-MoE-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`PersimmonForCausalLM`
     - Persimmon
     - :code:`adept/persimmon-8b-base`, :code:`adept/persimmon-8b-chat`, etc.
     -
+    - ✅︎
   * - :code:`QWenLMHeadModel`
     - Qwen
     - :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`Qwen2ForCausalLM`
     - Qwen2
     - :code:`Qwen/Qwen2-beta-7B`, :code:`Qwen/Qwen2-beta-7B-Chat`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Qwen2MoeForCausalLM`
     - Qwen2MoE
     - :code:`Qwen/Qwen1.5-MoE-A2.7B`, :code:`Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`StableLmForCausalLM`
     - StableLM
     - :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
     -
+    - ✅︎
   * - :code:`Starcoder2ForCausalLM`
     - Starcoder2
     - :code:`bigcode/starcoder2-3b`, :code:`bigcode/starcoder2-7b`, :code:`bigcode/starcoder2-15b`, etc.
     -
+    - ✅︎
   * - :code:`SolarForCausalLM`
-    - EXAONE-3
+    - Solar Pro
     - :code:`upstage/solar-pro-preview-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`XverseForCausalLM`
-    - Xverse
+    - XVERSE
     - :code:`xverse/XVERSE-7B-Chat`, :code:`xverse/XVERSE-13B-Chat`, :code:`xverse/XVERSE-65B-Chat`, etc.
-    -
+    - ✅︎
+    - ✅︎
 
 .. note::
   Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
@@ -217,94 +265,111 @@ Multimodal Language Models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. list-table::
-  :widths: 25 25 25 25 5
+  :widths: 25 25 25 25 5 5
   :header-rows: 1
 
   * - Architecture
     - Models
     - Modalities
     - Example HuggingFace Models
     - :ref:`LoRA <lora>`
+    - :ref:`PP <distributed_serving>`
   * - :code:`Blip2ForConditionalGeneration`
     - BLIP-2
     - Image\ :sup:`E`
     - :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
     -
+    - ✅︎
   * - :code:`ChameleonForConditionalGeneration`
     - Chameleon
     - Image
     - :code:`facebook/chameleon-7b` etc.
     -
+    - ✅︎
   * - :code:`FuyuForCausalLM`
     - Fuyu
     - Image
     - :code:`adept/fuyu-8b` etc.
     -
+    - ✅︎
   * - :code:`InternVLChatModel`
     - InternVL2
     - Image\ :sup:`E+`
     - :code:`OpenGVLab/InternVL2-4B`, :code:`OpenGVLab/InternVL2-8B`, etc.
     -
+    - ✅︎
   * - :code:`LlavaForConditionalGeneration`
     - LLaVA-1.5
     - Image\ :sup:`E+`
     - :code:`llava-hf/llava-1.5-7b-hf`, :code:`llava-hf/llava-1.5-13b-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaNextForConditionalGeneration`
     - LLaVA-NeXT
     - Image\ :sup:`E+`
     - :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaNextVideoForConditionalGeneration`
     - LLaVA-NeXT-Video
     - Video
     - :code:`llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaOnevisionForConditionalGeneration`
     - LLaVA-Onevision
     - Image\ :sup:`+` / Video
     - :code:`llava-hf/llava-onevision-qwen2-7b-ov-hf`, :code:`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
     -
+    - ✅︎
   * - :code:`MiniCPMV`
     - MiniCPM-V
     - Image\ :sup:`+`
     - :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MllamaForConditionalGeneration`
     - Llama 3.2
     - Image
     - :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
     -
+    -
   * - :code:`PaliGemmaForConditionalGeneration`
     - PaliGemma
     - Image\ :sup:`E`
     - :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc.
     -
+    - ✅︎
   * - :code:`Phi3VForCausalLM`
     - Phi-3-Vision, Phi-3.5-Vision
     - Image\ :sup:`E+`
     - :code:`microsoft/Phi-3-vision-128k-instruct`, :code:`microsoft/Phi-3.5-vision-instruct` etc.
     -
+    - ✅︎
   * - :code:`PixtralForConditionalGeneration`
     - Pixtral
     - Image\ :sup:`+`
     - :code:`mistralai/Pixtral-12B-2409`
     -
+    - ✅︎
   * - :code:`QWenLMHeadModel`
     - Qwen-VL
     - Image\ :sup:`E+`
     - :code:`Qwen/Qwen-VL`, :code:`Qwen/Qwen-VL-Chat`, etc.
     -
+    - ✅︎
   * - :code:`Qwen2VLForConditionalGeneration`
     - Qwen2-VL
     - Image\ :sup:`E+` / Video\ :sup:`+`
     - :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
     -
+    - ✅︎
   * - :code:`UltravoxModel`
     - Ultravox
     - Audio\ :sup:`E+`
     - :code:`fixie-ai/ultravox-v0_3`
     -
+    - ✅︎
 
 | :sup:`E` Pre-computed embeddings can be inputted for this modality.
 | :sup:`+` Multiple items can be inputted per text prompt for this modality.
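
Reviewer sketch (not part of the commit): the change above adds a PP column to both tables and extends ``:widths:`` by one entry to match. The Python below is a minimal, hypothetical checker for that layout. It assumes each row is a ``* -`` bullet followed by ``-`` cells, as in this file, and that a non-empty PP cell means the architecture is marked as supporting pipeline parallelism. ``SAMPLE``, ``parse_list_table`` and ``pp_supported`` are illustrative names, not vLLM or Sphinx APIs, and the sample rows are shortened copies of rows from the diff.

```python
import re

# Hypothetical excerpt in the same layout as the decoder-only table above.
SAMPLE = """\
.. list-table::
  :widths: 25 25 50 5 5
  :header-rows: 1

  * - Architecture
    - Models
    - Example HuggingFace Models
    - :ref:`LoRA <lora>`
    - :ref:`PP <distributed_serving>`
  * - :code:`ArcticForCausalLM`
    - Arctic
    - :code:`Snowflake/snowflake-arctic-base`, etc.
    -
    - ✅︎
  * - :code:`JambaForCausalLM`
    - Jamba
    - :code:`ai21labs/Jamba-v0.1`, etc.
    - ✅︎
    -
"""


def parse_list_table(text):
    """Return (widths, rows) for a single list-table block."""
    widths = []
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(":widths:"):
            widths = [int(w) for w in line.split(":widths:", 1)[1].split()]
        elif line.startswith("* -"):
            # A new row starts; its first cell follows the "* -" marker.
            rows.append([line[3:].strip()])
        elif line.startswith("-") and rows:
            # A further cell of the current row; a bare "-" is an empty cell.
            rows[-1].append(line[1:].strip())
    return widths, rows


def pp_supported(text):
    """List architectures whose PP cell is non-empty."""
    widths, rows = parse_list_table(text)
    # Assumption: one cell per declared width in every row.
    for row in rows:
        if len(row) != len(widths):
            raise ValueError(
                f"expected {len(widths)} cells, got {len(row)}: {row[0]}"
            )
    header, body = rows[0], rows[1:]
    pp_col = next(i for i, cell in enumerate(header) if cell.startswith(":ref:`PP"))
    names = []
    for row in body:
        if row[pp_col]:
            match = re.search(r":code:`([^`]+)`", row[0])
            names.append(match.group(1) if match else row[0])
    return names


if __name__ == "__main__":
    print(pp_supported(SAMPLE))
```

Run against the full file, a check like this would flag any row that was not given a cell for the new column, which is the kind of slip a change touching this many rows can introduce.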

requirements-test.txt

+2 -2
@@ -10,8 +10,8 @@ pytest-shard
 awscli
 einops # required for MPT, qwen-vl and Mamba
 httpx
-librosa # required for audio test
-opencv-python # required for video test
+librosa # required for audio tests
+opencv-python # required for video tests
 peft
 requests
 ray[adag]==2.35
