Support embedding models in V1 #16188
Merged
Commits
98 commits
f36c4f9 Remove guardrails that prevent V1 from trying to run embedding models (maxdebayser)
acf4638 hack v1 flash_attn to support encoder_only (maxdebayser)
b13bbc0 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
8debea0 Revert changes to disable kv caching for encoder-only models (maxdebayser)
8d97b9c Add pooling support in v1 (maxdebayser)
d60b22b First end-to-end working version of Bert embeddings in V1 (maxdebayser)
6bebbb8 Support warmup for pooling models in V1 (maxdebayser)
6dafd71 address review comments (maxdebayser)
e2724a2 address review comments (maxdebayser)
56ff6cd remove debug prints (maxdebayser)
fc57edd address review comments (maxdebayser)
64a0e62 Fix cross encoder models in V1 and enable tests for pooling models (maxdebayser)
4014d41 address review comments (maxdebayser)
87a95a8 Merge branch 'main' into v1_embeddings (maxdebayser)
902c129 address review comments (maxdebayser)
2c68855 re-enable large embedding models (maxdebayser)
8afd8f5 address review comments (maxdebayser)
7762976 Merge branch 'main' into v1_embeddings (maxdebayser)
d7537ae Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
a9e7747 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
17520bd Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
90c611a Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
dec2441 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
a5e83f4 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
187f69b Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
69a0332 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
a9f1721 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
4b066a3 fix merge problems (maxdebayser)
43a26dc Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
ca34513 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
bf3033d Fix missing qwen embedding model param (maxdebayser)
67bf727 Make pooling params reach the pooling in V1 (maxdebayser)
93b6361 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
d916b88 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
bad4211 fix merge problems (maxdebayser)
35d9bd9 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
dcc6100 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
a4f85b5 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
a5f328a Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
7c5be88 fix merge problem (maxdebayser)
29b75c9 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
6aa204c backport changes from the other PR (maxdebayser)
e81470c fix merge errors (maxdebayser)
20e7140 address review comments (maxdebayser)
6bc1e3d address review comments (maxdebayser)
22825bd simplify PR (maxdebayser)
c889b2e fix mistake (maxdebayser)
24462e4 workaround qwen model test issue (maxdebayser)
b5f21f2 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
79d1b95 revert unecessary change (maxdebayser)
b3a0491 remove duplicated code (maxdebayser)
b4ab556 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
1a82e56 remove encoder model support to simplify PR (maxdebayser)
a66801b Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
660dd9c fix several tests (maxdebayser)
808c996 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
cdd70c9 Fix test (maxdebayser)
0832115 disable bert test (maxdebayser)
10bbf74 fix tests (maxdebayser)
ee892aa limit context length to fit test GPU (maxdebayser)
2e12eba limit context length to fit test GPU (maxdebayser)
14fcf24 fix test (maxdebayser)
0624435 fix test (maxdebayser)
706fdb2 Merge branch 'main' into v1_embeddings (22quinn)
051f6d4 Fix _construct_cached_request_state (22quinn)
214cf06 Fix v1 tests (22quinn)
8193bd0 Merge pull request #1 from 22quinn/v1_embeddings (maxdebayser)
65b8377 fix test (maxdebayser)
33d7f74 Merge branch 'v1_embeddings' of github.com:maxdebayser/vllm into v1_e… (maxdebayser)
4ee822a reduce max_model_len to fit in test gpu (maxdebayser)
7242731 fix test (maxdebayser)
a4f460b fix test (maxdebayser)
35ca640 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
17f6177 fix test (maxdebayser)
3f0d42e Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
74d73cc use torch.split (maxdebayser)
e6a66dc enable cuda graphs (maxdebayser)
4cca774 fix unecessary config.py changes (maxdebayser)
8ef1982 fix error message (maxdebayser)
28d00d1 remove unused import (maxdebayser)
e634f60 fix docstring (maxdebayser)
053475c revert unnecessary code changes (maxdebayser)
6228f64 remove debug prints (maxdebayser)
42c802a fix refactoring bug (maxdebayser)
f771a19 fix refactoring bug (maxdebayser)
02c47ad Fix default chunked prefill for pooling models (maxdebayser)
1fd252c Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
c5c0d97 Revert handling of case that can never happen (maxdebayser)
acfc9cc fix small bug (maxdebayser)
225b808 fix small bugs (maxdebayser)
2b86c13 fix silly mistake (maxdebayser)
2983252 reduce memory usage for small ci gpus (maxdebayser)
58c556d Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
878d56a enable chunked prefill by default for models that support it (maxdebayser)
2db273f Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
114af27 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
bc0219d address review comments (maxdebayser)
221f013 Merge branch 'upstream_main' into v1_embeddings (maxdebayser)
@@ -68,6 +68,7 @@ def _run_incremental_decode(tokenizer,
        None,
        params,
        None,
        None,
        0.0,
        None,
        cache_salt=None,
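The hunk above grows a positional call by one line (6 lines before, 7 after), and all the new line can be is another `None`. The page does not show which argument it fills. Given the pooling work in this PR, one plausible reading is a new optional pooling-parameters slot placed next to `params`. A minimal sketch of why a positional call site has to change when a field is inserted mid-signature follows; `SketchRequest` and every field name in it are hypothetical illustrations, not vLLM's actual class or signature.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchRequest:
    """Hypothetical stand-in for a request type; NOT vLLM's real class."""
    request_id: str
    prompt_token_ids: list[int]
    sampling_params: Optional[Any]
    pooling_params: Optional[Any]  # assumed new field, inserted mid-signature
    eos_token_id: Optional[int]
    arrival_time: float
    cache_salt: Optional[str] = None


# A positional call site must pass one extra None for the assumed new slot;
# without it, every later positional argument would shift by one position.
req = SketchRequest(
    "r0",
    [1, 2, 3],
    {"temperature": 0.0},
    None,  # pooling_params (assumed): unused for a generation request
    None,  # eos_token_id
    0.0,   # arrival_time
    cache_salt=None,
)
```

Passing the arguments by keyword instead would have made the call site immune to this kind of signature change, at the cost of a longer call.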