@@ -12,201 +12,249 @@ Alongside each architecture, we include some popular models that use it.
 Decoder-only Language Models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. list-table::
-  :widths: 25 25 50 5
+  :widths: 25 25 50 5 5
   :header-rows: 1

   * - Architecture
     - Models
     - Example HuggingFace Models
     - :ref:`LoRA <lora>`
+    - :ref:`PP <distributed_serving>`
   * - :code:`AquilaForCausalLM`
     - Aquila, Aquila2
     - :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`ArcticForCausalLM`
     - Arctic
     - :code:`Snowflake/snowflake-arctic-base`, :code:`Snowflake/snowflake-arctic-instruct`, etc.
     -
+    - ✅︎
   * - :code:`BaiChuanForCausalLM`
     - Baichuan2, Baichuan
     - :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`BloomForCausalLM`
     - BLOOM, BLOOMZ, BLOOMChat
     - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
     -
+    - ✅︎
   * - :code:`ChatGLMModel`
     - ChatGLM
     - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`CohereForCausalLM`
     - Command-R
     - :code:`CohereForAI/c4ai-command-r-v01`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`DbrxForCausalLM`
     - DBRX
     - :code:`databricks/dbrx-base`, :code:`databricks/dbrx-instruct`, etc.
     -
+    - ✅︎
   * - :code:`DeciLMForCausalLM`
     - DeciLM
     - :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
     -
+    - ✅︎
   * - :code:`DeepseekForCausalLM`
     - DeepSeek
     - :code:`deepseek-ai/deepseek-llm-67b-base`, :code:`deepseek-ai/deepseek-llm-7b-chat` etc.
     -
+    - ✅︎
   * - :code:`DeepseekV2ForCausalLM`
     - DeepSeek-V2
     - :code:`deepseek-ai/DeepSeek-V2`, :code:`deepseek-ai/DeepSeek-V2-Chat` etc.
     -
+    - ✅︎
   * - :code:`ExaoneForCausalLM`
     - EXAONE-3
     - :code:`LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`FalconForCausalLM`
     - Falcon
     - :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
     -
+    - ✅︎
   * - :code:`GemmaForCausalLM`
     - Gemma
     - :code:`google/gemma-2b`, :code:`google/gemma-7b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Gemma2ForCausalLM`
     - Gemma2
     - :code:`google/gemma-2-9b`, :code:`google/gemma-2-27b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`GPT2LMHeadModel`
     - GPT-2
     - :code:`gpt2`, :code:`gpt2-xl`, etc.
     -
+    - ✅︎
   * - :code:`GPTBigCodeForCausalLM`
     - StarCoder, SantaCoder, WizardCoder
     - :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`GPTJForCausalLM`
     - GPT-J
     - :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
     -
+    - ✅︎
   * - :code:`GPTNeoXForCausalLM`
     - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
     - :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
     -
+    - ✅︎
   * - :code:`GraniteForCausalLM`
     - PowerLM
     - :code:`ibm/PowerLM-3b` etc.
     - ✅︎
+    - ✅︎
   * - :code:`GraniteMoeForCausalLM`
     - PowerMoE
     - :code:`ibm/PowerMoE-3b` etc.
     - ✅︎
+    - ✅︎
   * - :code:`InternLMForCausalLM`
     - InternLM
     - :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`InternLM2ForCausalLM`
     - InternLM2
     - :code:`internlm/internlm2-7b`, :code:`internlm/internlm2-chat-7b`, etc.
     -
+    - ✅︎
   * - :code:`JAISLMHeadModel`
     - Jais
     - :code:`core42/jais-13b`, :code:`core42/jais-13b-chat`, :code:`core42/jais-30b-v3`, :code:`core42/jais-30b-chat-v3`, etc.
     -
+    - ✅︎
   * - :code:`JambaForCausalLM`
     - Jamba
     - :code:`ai21labs/AI21-Jamba-1.5-Large`, :code:`ai21labs/AI21-Jamba-1.5-Mini`, :code:`ai21labs/Jamba-v0.1`, etc.
     - ✅︎
+    -
   * - :code:`LlamaForCausalLM`
     - Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
     - :code:`meta-llama/Meta-Llama-3.1-405B-Instruct`, :code:`meta-llama/Meta-Llama-3.1-70B`, :code:`meta-llama/Meta-Llama-3-70B-Instruct`, :code:`meta-llama/Llama-2-70b-hf`, :code:`01-ai/Yi-34B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MiniCPMForCausalLM`
     - MiniCPM
     - :code:`openbmb/MiniCPM-2B-sft-bf16`, :code:`openbmb/MiniCPM-2B-dpo-bf16`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MiniCPM3ForCausalLM`
     - MiniCPM3
     - :code:`openbmb/MiniCPM3-4B`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MistralForCausalLM`
     - Mistral, Mistral-Instruct
     - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MixtralForCausalLM`
     - Mixtral-8x7B, Mixtral-8x7B-Instruct
     - :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, :code:`mistral-community/Mixtral-8x22B-v0.1`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MPTForCausalLM`
     - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
     - :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
     -
+    - ✅︎
   * - :code:`NemotronForCausalLM`
     - Nemotron-3, Nemotron-4, Minitron
     - :code:`nvidia/Minitron-8B-Base`, :code:`mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
     - ✅︎
-  * - :code:`OLMoEForCausalLM`
-    - OLMoE
-    - :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
-    -
+    - ✅︎
   * - :code:`OLMoForCausalLM`
     - OLMo
     - :code:`allenai/OLMo-1B-hf`, :code:`allenai/OLMo-7B-hf`, etc.
     -
+    - ✅︎
+  * - :code:`OLMoEForCausalLM`
+    - OLMoE
+    - :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
+    - ✅︎
+    - ✅︎
   * - :code:`OPTForCausalLM`
     - OPT, OPT-IML
     - :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
     -
+    - ✅︎
   * - :code:`OrionForCausalLM`
     - Orion
     - :code:`OrionStarAI/Orion-14B-Base`, :code:`OrionStarAI/Orion-14B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`PhiForCausalLM`
     - Phi
     - :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Phi3ForCausalLM`
     - Phi-3
     - :code:`microsoft/Phi-3-mini-4k-instruct`, :code:`microsoft/Phi-3-mini-128k-instruct`, :code:`microsoft/Phi-3-medium-128k-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`Phi3SmallForCausalLM`
     - Phi-3-Small
     - :code:`microsoft/Phi-3-small-8k-instruct`, :code:`microsoft/Phi-3-small-128k-instruct`, etc.
     -
+    - ✅︎
   * - :code:`PhiMoEForCausalLM`
     - Phi-3.5-MoE
     - :code:`microsoft/Phi-3.5-MoE-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`PersimmonForCausalLM`
     - Persimmon
     - :code:`adept/persimmon-8b-base`, :code:`adept/persimmon-8b-chat`, etc.
     -
+    - ✅︎
   * - :code:`QWenLMHeadModel`
     - Qwen
     - :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`Qwen2ForCausalLM`
     - Qwen2
     - :code:`Qwen/Qwen2-beta-7B`, :code:`Qwen/Qwen2-beta-7B-Chat`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Qwen2MoeForCausalLM`
     - Qwen2MoE
     - :code:`Qwen/Qwen1.5-MoE-A2.7B`, :code:`Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`StableLmForCausalLM`
     - StableLM
     - :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
     -
+    - ✅︎
   * - :code:`Starcoder2ForCausalLM`
     - Starcoder2
     - :code:`bigcode/starcoder2-3b`, :code:`bigcode/starcoder2-7b`, :code:`bigcode/starcoder2-15b`, etc.
     -
+    - ✅︎
   * - :code:`SolarForCausalLM`
-    - EXAONE-3
+    - Solar Pro
    - :code:`upstage/solar-pro-preview-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`XverseForCausalLM`
-    - Xverse
+    - XVERSE
     - :code:`xverse/XVERSE-7B-Chat`, :code:`xverse/XVERSE-13B-Chat`, :code:`xverse/XVERSE-65B-Chat`, etc.
-    -
+    - ✅︎
+    - ✅︎

 .. note::
   Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
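For reference, a minimal sketch of offline inference with one of the decoder-only models listed above and a LoRA adapter attached. The adapter name and path are placeholders, and it assumes a vLLM build with LoRA support for the chosen model; the PP column corresponds to pipeline-parallel deployment (e.g. the :code:`--pipeline-parallel-size` option of :code:`vllm serve`).

.. code-block:: python

  # Minimal sketch (placeholder LoRA adapter): offline inference with LoRA enabled.
  from vllm import LLM, SamplingParams
  from vllm.lora.request import LoRARequest

  llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

  outputs = llm.generate(
      ["Give me a short introduction to large language models."],
      SamplingParams(temperature=0.0, max_tokens=64),
      # Hypothetical adapter: (name, integer id, local path to the LoRA weights).
      lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
  )
  print(outputs[0].outputs[0].text)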
@@ -217,94 +265,111 @@ Multimodal Language Models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 .. list-table::
-  :widths: 25 25 25 25 5
+  :widths: 25 25 25 25 5 5
   :header-rows: 1

   * - Architecture
     - Models
     - Modalities
     - Example HuggingFace Models
     - :ref:`LoRA <lora>`
+    - :ref:`PP <distributed_serving>`
   * - :code:`Blip2ForConditionalGeneration`
     - BLIP-2
     - Image\ :sup:`E`
     - :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
     -
+    - ✅︎
   * - :code:`ChameleonForConditionalGeneration`
     - Chameleon
     - Image
     - :code:`facebook/chameleon-7b` etc.
     -
+    - ✅︎
   * - :code:`FuyuForCausalLM`
     - Fuyu
     - Image
     - :code:`adept/fuyu-8b` etc.
     -
+    - ✅︎
   * - :code:`InternVLChatModel`
     - InternVL2
     - Image\ :sup:`E+`
     - :code:`OpenGVLab/InternVL2-4B`, :code:`OpenGVLab/InternVL2-8B`, etc.
     -
+    - ✅︎
   * - :code:`LlavaForConditionalGeneration`
     - LLaVA-1.5
     - Image\ :sup:`E+`
     - :code:`llava-hf/llava-1.5-7b-hf`, :code:`llava-hf/llava-1.5-13b-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaNextForConditionalGeneration`
     - LLaVA-NeXT
     - Image\ :sup:`E+`
     - :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaNextVideoForConditionalGeneration`
     - LLaVA-NeXT-Video
     - Video
     - :code:`llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaOnevisionForConditionalGeneration`
     - LLaVA-Onevision
     - Image\ :sup:`+` / Video
     - :code:`llava-hf/llava-onevision-qwen2-7b-ov-hf`, :code:`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
     -
+    - ✅︎
   * - :code:`MiniCPMV`
     - MiniCPM-V
     - Image\ :sup:`+`
     - :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MllamaForConditionalGeneration`
     - Llama 3.2
     - Image
     - :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
     -
+    -
   * - :code:`PaliGemmaForConditionalGeneration`
     - PaliGemma
     - Image\ :sup:`E`
     - :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc.
     -
+    - ✅︎
   * - :code:`Phi3VForCausalLM`
     - Phi-3-Vision, Phi-3.5-Vision
     - Image\ :sup:`E+`
     - :code:`microsoft/Phi-3-vision-128k-instruct`, :code:`microsoft/Phi-3.5-vision-instruct` etc.
     -
+    - ✅︎
   * - :code:`PixtralForConditionalGeneration`
     - Pixtral
     - Image\ :sup:`+`
     - :code:`mistralai/Pixtral-12B-2409`
     -
+    - ✅︎
   * - :code:`QWenLMHeadModel`
     - Qwen-VL
     - Image\ :sup:`E+`
     - :code:`Qwen/Qwen-VL`, :code:`Qwen/Qwen-VL-Chat`, etc.
     -
+    - ✅︎
   * - :code:`Qwen2VLForConditionalGeneration`
     - Qwen2-VL
     - Image\ :sup:`E+` / Video\ :sup:`+`
     - :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
     -
+    - ✅︎
   * - :code:`UltravoxModel`
     - Ultravox
     - Audio\ :sup:`E+`
     - :code:`fixie-ai/ultravox-v0_3`
     -
+    - ✅︎

 | :sup:`E` Pre-computed embeddings can be inputted for this modality.
 | :sup:`+` Multiple items can be inputted per text prompt for this modality.
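Similarly, a minimal sketch of single-image inference with one of the multimodal models listed above, using vLLM's :code:`multi_modal_data` input format. The image path is a placeholder, and the prompt template is model-specific (the one below follows LLaVA-1.5).

.. code-block:: python

  # Minimal sketch: single-image inference (placeholder image path).
  from PIL import Image
  from vllm import LLM, SamplingParams

  llm = LLM(model="llava-hf/llava-1.5-7b-hf")
  image = Image.open("example.jpg")

  outputs = llm.generate(
      {
          "prompt": "USER: <image>\nWhat is shown in this image?\nASSISTANT:",
          "multi_modal_data": {"image": image},
      },
      SamplingParams(max_tokens=64),
  )
  print(outputs[0].outputs[0].text)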