[Models] Add remaining model PP support #7168
Merged
Commits
62 commits
4d358bb Model PP support (andoorve)
6685431 Format (andoorve)
44f3537 Format (andoorve)
a4aeba1 Merge branch 'main' of github.com:andoorve/vllm into qwen-pp (andoorve)
aff35f9 Format (andoorve)
3632dc6 Merge branch 'main' of github.com:andoorve/vllm into qwen-pp (andoorve)
6c16347 Merge branch 'main' of github.com:andoorve/vllm into qwen-pp (andoorve)
cfcba73 Format (andoorve)
674e28f Merge branch 'main' of github.com:andoorve/vllm into qwen-pp (andoorve)
bc7385e Merge branch 'main' into qwen-pp (DarkLight1337)
332d66c fix wrong type (DarkLight1337)
18841f3 fix typo (DarkLight1337)
313d04c Merge branch 'main' into qwen-pp (DarkLight1337)
eea3fc5 Add SupportsPP interface and stateless protocol check (DarkLight1337)
b4ce5f7 Subclass SupportsPP in relevant models (DarkLight1337)
30e454a Remove hardcoded list (DarkLight1337)
e9ea5b7 Remove unused import (DarkLight1337)
8b40176 Check using function (DarkLight1337)
ec4c6b3 Update docstring (DarkLight1337)
cdc4dbe Simplify (DarkLight1337)
dcc2a49 Add tests (DarkLight1337)
7280766 Test CUDA initialization (DarkLight1337)
37cc51b Add platform guard (DarkLight1337)
3814246 Trigger CI (DarkLight1337)
cf91f7b Fix OOT registration (DarkLight1337)
38b090a Update docstring (DarkLight1337)
d394985 Remove unnecessary global (DarkLight1337)
1404e92 Merge branch 'main' into qwen-pp (DarkLight1337)
6a4287a Update interfaces (DarkLight1337)
1e010c7 format (DarkLight1337)
a6b99c3 Merge branch 'supports-pp' into qwen-pp (DarkLight1337)
1e0baba Fix error check (DarkLight1337)
9ef69de Make `prefix` required (DarkLight1337)
76355d9 Inherit from `SupportsPP` (DarkLight1337)
7be7ac2 Merge branch 'main' into qwen-pp (DarkLight1337)
c3f3d4a Inherit from `SupportsPP` (DarkLight1337)
9cc78ae Merge branch 'supports-pp' into qwen-pp (DarkLight1337)
7c2a922 Fix PP for language models (DarkLight1337)
a36f7ed Fix environment variables not being copied over (DarkLight1337)
5b960bc Merge branch 'main' into supports-pp (DarkLight1337)
66a634e Merge branch 'supports-pp' into qwen-pp (DarkLight1337)
f9cae12 Use inferred type (DarkLight1337)
591bf85 Add missing type annotations (DarkLight1337)
addc8cd Add PP support for more multimodal models (DarkLight1337)
ed669a5 Fix the real problem, which is that modelscope is not installed (DarkLight1337)
9ac8a99 Merge branch 'supports-pp' into qwen-pp (DarkLight1337)
d211003 Update tests (DarkLight1337)
beb609c Update docs (DarkLight1337)
4cb66d5 Fix missing `SupportsPP`; support PP for olmoe (DarkLight1337)
e01d59f Fix type annotations (DarkLight1337)
b8958a9 Move modelscope installation into regression test (DarkLight1337)
ec0f4e0 Merge branch 'supports-pp' into qwen-pp (DarkLight1337)
8fdcaa0 format (DarkLight1337)
3ed8a8b Update LoRA support in docs (DarkLight1337)
ba174b6 PP support for phimoe (DarkLight1337)
5da7ff1 Fix capitalization (DarkLight1337)
e9f0601 Fix `LLMWrapper` (DarkLight1337)
fda3b66 Merge branch 'supports-pp' into qwen-pp (DarkLight1337)
b65813c Update test configs (DarkLight1337)
99e653e Add more tp to pixtral (andoorve)
62f1980 Update test_pipeline_parallel.py (andoorve)
7c7251e Fix gpt_j.py (andoorve)
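Several commits above describe replacing a hardcoded list of PP-capable models. That list is replaced by a `SupportsPP` interface that models subclass, plus a function-based "stateless protocol check". The sketch below illustrates that pattern with `typing.Protocol`. It is a minimal illustration, not vLLM's actual definition. It rests on two assumptions. First, "stateless" is read as "decided from a class attribute without instantiating the model". Second, the names `supports_pp`, `ToyDecoder` and `ToyEncoder` are hypothetical.

```python
from typing import ClassVar, Literal, Protocol


class SupportsPP(Protocol):
    """Hypothetical marker interface for models that support pipeline parallelism."""

    # Class-level flag inherited by every model that subclasses the interface.
    supports_pp: ClassVar[Literal[True]] = True


def supports_pp(model_cls: type) -> bool:
    # Read only the class attribute: no model is instantiated and no weights are
    # loaded (an assumed reading of "stateless").
    return getattr(model_cls, "supports_pp", False) is True


class ToyDecoder(SupportsPP):
    """Hypothetical model that opts in by subclassing the interface."""


class ToyEncoder:
    """Hypothetical model that does not opt in."""
```

With this design, adding PP support to a model is a one-line change to its class bases. No central list of architecture names needs to be kept in sync.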
Conversations
How many times will this run when we serve a model?
Creating a subprocess can be expensive.
Since the result is cached, it runs at most `2 * len(architectures)` times per model (when both PP and multimodal support are checked).
The startup time of the processes should be short compared to the time it takes to load the model weights.