[Models] Add remaining model PP support #7168

Merged: 62 commits merged on Oct 4, 2024
Commits
4d358bb Model PP support (andoorve, Aug 5, 2024)
6685431 Format (andoorve, Aug 5, 2024)
44f3537 Format (andoorve, Aug 5, 2024)
a4aeba1 Merge branch 'main' of github.com:andoorve/vllm into qwen-pp (andoorve, Sep 3, 2024)
aff35f9 Format (andoorve, Sep 3, 2024)
3632dc6 Merge branch 'main' of github.com:andoorve/vllm into qwen-pp (andoorve, Sep 24, 2024)
6c16347 Merge branch 'main' of github.com:andoorve/vllm into qwen-pp (andoorve, Sep 24, 2024)
cfcba73 Format (andoorve, Sep 24, 2024)
674e28f Merge branch 'main' of github.com:andoorve/vllm into qwen-pp (andoorve, Sep 27, 2024)
bc7385e Merge branch 'main' into qwen-pp (DarkLight1337, Sep 30, 2024)
332d66c fix wrong type (DarkLight1337, Sep 30, 2024)
18841f3 fix typo (DarkLight1337, Sep 30, 2024)
313d04c Merge branch 'main' into qwen-pp (DarkLight1337, Oct 1, 2024)
eea3fc5 Add SupportsPP interface and stateless protocol check (DarkLight1337, Oct 1, 2024)
b4ce5f7 Subclass SupportsPP in relevant models (DarkLight1337, Oct 1, 2024)
30e454a Remove hardcoded list (DarkLight1337, Oct 1, 2024)
e9ea5b7 Remove unused import (DarkLight1337, Oct 1, 2024)
8b40176 Check using function (DarkLight1337, Oct 1, 2024)
ec4c6b3 Update docstring (DarkLight1337, Oct 1, 2024)
cdc4dbe Simplify (DarkLight1337, Oct 1, 2024)
dcc2a49 Add tests (DarkLight1337, Oct 1, 2024)
7280766 Test CUDA initialization (DarkLight1337, Oct 1, 2024)
37cc51b Add platform guard (DarkLight1337, Oct 1, 2024)
3814246 Trigger CI (DarkLight1337, Oct 1, 2024)
cf91f7b Fix OOT registration (DarkLight1337, Oct 1, 2024)
38b090a Update docstring (DarkLight1337, Oct 1, 2024)
d394985 Remove unnecessary global (DarkLight1337, Oct 1, 2024)
1404e92 Merge branch 'main' into qwen-pp (DarkLight1337, Oct 2, 2024)
6a4287a Update interfaces (DarkLight1337, Oct 3, 2024)
1e010c7 format (DarkLight1337, Oct 3, 2024)
a6b99c3 Merge branch 'supports-pp' into qwen-pp (DarkLight1337, Oct 3, 2024)
1e0baba Fix error check (DarkLight1337, Oct 3, 2024)
9ef69de Make `prefix` required (DarkLight1337, Oct 3, 2024)
76355d9 Inherit from `SupportsPP` (DarkLight1337, Oct 3, 2024)
7be7ac2 Merge branch 'main' into qwen-pp (DarkLight1337, Oct 3, 2024)
c3f3d4a Inherit from `SupportsPP` (DarkLight1337, Oct 3, 2024)
9cc78ae Merge branch 'supports-pp' into qwen-pp (DarkLight1337, Oct 3, 2024)
7c2a922 Fix PP for language models (DarkLight1337, Oct 3, 2024)
a36f7ed Fix environment variables not being copied over (DarkLight1337, Oct 3, 2024)
5b960bc Merge branch 'main' into supports-pp (DarkLight1337, Oct 3, 2024)
66a634e Merge branch 'supports-pp' into qwen-pp (DarkLight1337, Oct 3, 2024)
f9cae12 Use inferred type (DarkLight1337, Oct 3, 2024)
591bf85 Add missing type annotations (DarkLight1337, Oct 3, 2024)
addc8cd Add PP support for more multimodal models (DarkLight1337, Oct 3, 2024)
ed669a5 Fix the real problem, which is that modelscope is not installed (DarkLight1337, Oct 3, 2024)
9ac8a99 Merge branch 'supports-pp' into qwen-pp (DarkLight1337, Oct 3, 2024)
d211003 Update tests (DarkLight1337, Oct 3, 2024)
beb609c Update docs (DarkLight1337, Oct 3, 2024)
4cb66d5 Fix missing `SupportsPP`; support PP for olmoe (DarkLight1337, Oct 3, 2024)
e01d59f Fix type annotations (DarkLight1337, Oct 3, 2024)
b8958a9 Move modelscope installation into regression test (DarkLight1337, Oct 3, 2024)
ec0f4e0 Merge branch 'supports-pp' into qwen-pp (DarkLight1337, Oct 3, 2024)
8fdcaa0 format (DarkLight1337, Oct 3, 2024)
3ed8a8b Update LoRA support in docs (DarkLight1337, Oct 3, 2024)
ba174b6 PP support for phimoe (DarkLight1337, Oct 3, 2024)
5da7ff1 Fix capitalization (DarkLight1337, Oct 3, 2024)
e9f0601 Fix `LLMWrapper` (DarkLight1337, Oct 3, 2024)
fda3b66 Merge branch 'supports-pp' into qwen-pp (DarkLight1337, Oct 3, 2024)
b65813c Update test configs (DarkLight1337, Oct 3, 2024)
99e653e Add more tp to pixtral (andoorve, Oct 3, 2024)
62f1980 Update test_pipeline_parallel.py (andoorve, Oct 3, 2024)
7c7251e Fix gpt_j.py (andoorve, Oct 4, 2024)
5 changes: 3 additions & 2 deletions requirements-test.txt
@@ -10,8 +10,9 @@ pytest-shard
 awscli
 einops # required for MPT, qwen-vl and Mamba
 httpx
-librosa # required for audio test
-opencv-python # required for video test
+modelscope # required for modelscope tests
+librosa # required for audio tests
+opencv-python # required for video tests
 peft
 requests
 ray[adag]==2.35
4 changes: 1 addition & 3 deletions vllm/model_executor/models/__init__.py
@@ -1,5 +1,4 @@
 import importlib
 import os
-import string
 import subprocess
 import sys
@@ -273,8 +272,7 @@ def _check_stateless(
 ])
 
 result = subprocess.run([sys.executable, "-c", stmts],
Review comment (Member): How many times will this run when we serve a model? Creating a subprocess can be expensive.

Reply (Member): Since the result is cached, it runs at most 2 * len(architectures) times per model (if both PP and multimodal support are checked).

Reply (@DarkLight1337, Oct 3, 2024): The startup time of these processes should be short compared to the time it takes to load the model weights.

-capture_output=True,
-env=os.environ.copy())
+capture_output=True)
 
 if result.returncode != 0:
 err_lines = [line.decode() for line in result.stderr.splitlines()]
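The last hunk drops `env=os.environ.copy()` from the `subprocess.run` call. A minimal sketch of why that is behavior-preserving for variables set through `os.environ`: when `env` is omitted (i.e. `None`), the standard library starts the child with the parent's current environment, including values assigned to `os.environ` at runtime. This illustrates a general `subprocess` property, not the author's stated reason for the change; the variable name `DEMO_PP_FLAG` is made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical variable, set only in the parent's os.environ at runtime.
os.environ["DEMO_PP_FLAG"] = "1"

# Child program that reports what it sees for that variable.
child = [sys.executable, "-c", "import os; print(os.environ.get('DEMO_PP_FLAG'))"]

# env omitted (None): the child inherits the parent's current environment.
inherited = subprocess.run(child, capture_output=True, text=True)

# Passing an explicit copy produces the same environment, so it is redundant here.
copied = subprocess.run(child, capture_output=True, text=True, env=os.environ.copy())

print(inherited.stdout.strip(), copied.stdout.strip())  # 1 1
```

An explicit `env=` mapping only matters when the child should see a different environment from the parent, for example with variables added or removed just for that call.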
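The review thread on `subprocess.run` depends on the result of the subprocess check being cached. The sketch below assumes memoization with `functools.lru_cache` keyed on (architecture, check). The function `check_in_subprocess`, its placeholder child program, and the `launches` counter are hypothetical and do not reproduce vLLM's `_check_stateless`. The sketch shows that repeated queries start one interpreter per distinct key, so checking both PP and multimodal support costs at most 2 * len(architectures) launches.

```python
import functools
import subprocess
import sys

# Hypothetical counter: how many child interpreters have been started.
launches = 0


@functools.lru_cache(maxsize=None)
def check_in_subprocess(arch: str, check: str) -> bool:
    """Run a hypothetical capability check for `arch` in a fresh interpreter.

    The result is memoized, so each (arch, check) pair spawns at most one
    subprocess for the lifetime of the parent process.
    """
    global launches
    launches += 1
    # Placeholder child program; a real check would presumably run
    # model-specific statements here instead of printing a message.
    msg = f"checked {arch} for {check}"
    stmts = f"print({msg!r})"
    result = subprocess.run([sys.executable, "-c", stmts], capture_output=True)
    return result.returncode == 0


architectures = ["LlamaForCausalLM", "Qwen2VLForConditionalGeneration"]
for _ in range(3):  # repeated queries are served from the cache
    for arch in architectures:
        check_in_subprocess(arch, "pp")
        check_in_subprocess(arch, "multimodal")

print(launches)  # 4, i.e. 2 * len(architectures)
```

Caching trades a one-time startup cost per key for isolation: whatever the check does happens in the child, not in the serving process.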