[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH #8157
Merged
robertgshaw2-redhat merged 130 commits into vllm-project:main from neuralmagic:reduce-asyncio-oh-alex on Sep 18, 2024
Commits
a7a6e43 [Benchmark] Add async throughput benchmark (njhill)
ce7d159 wip (njhill)
569cd43 Merge remote-tracking branch 'njhill/async-llm-eng-bench' into reduce… (robertgshaw2-redhat)
d99ce6f stash (robertgshaw2-redhat)
8d6b2e9 remove proxy (robertgshaw2-redhat)
14f3637 stash (robertgshaw2-redhat)
3b8311b added mp_llm_engine (robertgshaw2-redhat)
5e2eb74 fixed (robertgshaw2-redhat)
aa62f2e format (robertgshaw2-redhat)
863081b cleanup (robertgshaw2-redhat)
965b97a revert asyncllmengine (robertgshaw2-redhat)
8fd72f6 fix nit (robertgshaw2-redhat)
ddeb7c6 format (robertgshaw2-redhat)
6539e10 Merge branch 'main' into reduce-asyncio-oh (robertgshaw2-redhat)
4b111e4 clean (robertgshaw2-redhat)
a5ffd2c fix (robertgshaw2-redhat)
1395872 stash (robertgshaw2-redhat)
938cf85 move files (robertgshaw2-redhat)
72d1d42 cleanup code (robertgshaw2-redhat)
fcdcfc9 refactor, cleanup (robertgshaw2-redhat)
659169e updated (robertgshaw2-redhat)
9886f3d make health check work (robertgshaw2-redhat)
5b2f057 format (robertgshaw2-redhat)
ae4564c awk -> ack (robertgshaw2-redhat)
f9ccecc add better shutdown (robertgshaw2-redhat)
89b730b cleanup comment (robertgshaw2-redhat)
f3dc82b more awk --> ack (robertgshaw2-redhat)
ac97a9e use constant (robertgshaw2-redhat)
becd7ab format (robertgshaw2-redhat)
b7f49ed remove set to None (robertgshaw2-redhat)
58ae3b0 Merge remote-tracking branch 'origin/main' into reduce-asyncio-oh (njhill)
d0f9641 Remove redundant pass (njhill)
aa64042 Merge branch 'main' into reduce-asyncio-oh (robertgshaw2-redhat)
5c6e5ef review comments (alexm-redhat)
25174a5 format (alexm-redhat)
db55c1a add async socket reads and socket writes (alexm-redhat)
f97e1f2 Some error handling (njhill)
dd96d3e remove async benchmark (robertgshaw2-redhat)
14d4afe stash (robertgshaw2-redhat)
bc386ea Merge branch 'main' into reduce-asyncio-oh-alex (robertgshaw2-redhat)
c0d0d60 adding error handling (robertgshaw2-redhat)
b7c1fcc error handling (robertgshaw2-redhat)
a661b76 added (robertgshaw2-redhat)
5d00f3a formatting in place (robertgshaw2-redhat)
5598494 added error handling (robertgshaw2-redhat)
98aaa7d change name (robertgshaw2-redhat)
ba5ef38 change name (robertgshaw2-redhat)
18b5a94 added dead_error to asyncengine (robertgshaw2-redhat)
b048961 moved tests under openai (robertgshaw2-redhat)
6b2e18b updated tests (robertgshaw2-redhat)
7a7ff5b revert executor change (robertgshaw2-redhat)
b7e1fe9 revert (robertgshaw2-redhat)
48068d5 executor class (robertgshaw2-redhat)
e3daa28 cleanup format (robertgshaw2-redhat)
7880b75 format (robertgshaw2-redhat)
29fe3c8 shorten (robertgshaw2-redhat)
a720947 Revert change (robertgshaw2-redhat)
5b8cee6 enable shutdown for tp>1 (robertgshaw2-redhat)
97a241d format (robertgshaw2-redhat)
6d0570e added error handling (robertgshaw2-redhat)
eb26791 format (robertgshaw2-redhat)
e256050 try out hwm (robertgshaw2-redhat)
59c5aca Add stop_remote_worker_execution_loop for TP case (njhill)
62f654a Revert unnecessary stop_remote_worker_execution_loop (njhill)
75c6157 fixed magicmock errored (robertgshaw2-redhat)
6f1cced Merge branch 'main' into reduce-asyncio-oh-alex (robertgshaw2-redhat)
370c104 fall back to asyncllmengine if pp (robertgshaw2-redhat)
0cf9551 formatting (robertgshaw2-redhat)
72f72fd stash (robertgshaw2-redhat)
ded4540 Merge branch 'main' into reduce-asyncio-oh-alex (robertgshaw2-redhat)
364ed7f remove DO_LOG_STATS RPC call (robertgshaw2-redhat)
f7fdf69 cleanup health check (robertgshaw2-redhat)
7e61cdb Use pickle for requests too (njhill)
3e84c8c Remove hwm (robertgshaw2-redhat)
2559813 Simplify configs setup (njhill)
d0a0f8b stash (robertgshaw2-redhat)
70e4916 Merge branch 'reduce-asyncio-oh-alex' of https://github.com/neuralmag… (robertgshaw2-redhat)
021fed3 added tests (robertgshaw2-redhat)
fd6ee43 added failed health check (robertgshaw2-redhat)
ccb43a3 rename (robertgshaw2-redhat)
1aa0823 added failed abort test (robertgshaw2-redhat)
fe22fe2 formatting (robertgshaw2-redhat)
3ce8702 Some more startup RPC simplification (njhill)
1f3fc24 fix yapf conflict (njhill)
ead62dd fix entrypoints tests (alexm-redhat)
672fb81 stash (robertgshaw2-redhat)
86312e4 fix Intel/TPU tests (alexm-redhat)
c4f6898 Merge branch 'reduce-asyncio-oh-alex' of https://github.com/neuralmag… (robertgshaw2-redhat)
678e8e5 Merge branch 'reduce-asyncio-oh-alex' of https://github.com/neuralmag… (robertgshaw2-redhat)
78b9e21 fix (robertgshaw2-redhat)
66c6961 formatting (robertgshaw2-redhat)
6e1e2bb cleanup (robertgshaw2-redhat)
610b349 cleanup (robertgshaw2-redhat)
28bb8a4 format (robertgshaw2-redhat)
b266249 fix poller (robertgshaw2-redhat)
f8036a5 add graceful shutdown on abort after client closed (robertgshaw2-redhat)
a649f75 cleanup formatting (robertgshaw2-redhat)
5b3535d added test abort (robertgshaw2-redhat)
7097e05 fix up tests (robertgshaw2-redhat)
ad3d0f8 added abort tests (robertgshaw2-redhat)
6e9c6c9 added another accurayc test (robertgshaw2-redhat)
fb8e2f9 add multistep test for accuracy of mq llm engine (robertgshaw2-redhat)
75523b2 added test genertion (robertgshaw2-redhat)
5546d2e fixed accuracy test launch (robertgshaw2-redhat)
6403f49 added load test (robertgshaw2-redhat)
bc68b51 Merge branch 'main' into reduce-asyncio-oh-alex (robertgshaw2-redhat)
3bb5e52 remove file (robertgshaw2-redhat)
2ac814f format (robertgshaw2-redhat)
179a667 added load test (robertgshaw2-redhat)
97d6c09 format (robertgshaw2-redhat)
78badc1 added load test (robertgshaw2-redhat)
a499733 format (alexm-redhat)
6a5d8d8 stash (robertgshaw2-redhat)
dfab5eb Merge branch 'reduce-asyncio-oh-alex' of https://github.com/neuralmag… (robertgshaw2-redhat)
96f84fe format (robertgshaw2-redhat)
ae14670 Merge branch 'main' into reduce-asyncio-oh-alex (robertgshaw2-redhat)
117c024 format (robertgshaw2-redhat)
c059713 remove debug print (robertgshaw2-redhat)
1af3297 removed stray (robertgshaw2-redhat)
97ae38d updated (robertgshaw2-redhat)
d0fab11 switch model to avoid OOM in TPU test (robertgshaw2-redhat)
bb4d839 Merge remote-tracking branch 'origin/main' into reduce-asyncio-oh-alex (njhill)
1967f6a Adjust timeouts (njhill)
a911323 stahs (robertgshaw2-redhat)
95ff4f3 make timeout 10000 ms (robertgshaw2-redhat)
302868e format (robertgshaw2-redhat)
add68ee Update examples/openai_chat_completion_client.py (robertgshaw2-redhat)
242b952 adjust RPC timeout on TPU (robertgshaw2-redhat)
3dafa26 add longer delay for check ehalth (robertgshaw2-redhat)
836a9d2 Update client.py (robertgshaw2-redhat)
add longer delay for check ehalth
commit 3dafa26d830b1c339f84f93c61a25c0f89f7305a
@@ -365,6 +365,7 @@ async def check_health(self):
         Engine's health every N seconds and sets _errored_with
         if the engine is unhealthy.
         """
+        print(self._errored_with)

         if self._errored_with is not None:
             raise self._errored_with

Review comment on the added print line: Left over
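For readers unfamiliar with the pattern in this hunk, here is a minimal sketch of a health check that re-raises a stored error, written without the debug print the reviewer flagged. `SketchClient` and `EngineDeadError` are hypothetical names invented for illustration; only `check_health` and `_errored_with` appear in the diff, and the real class in this PR may differ in every other respect.

```python
import asyncio
from typing import Optional


class EngineDeadError(RuntimeError):
    """Hypothetical error type, used only in this sketch."""


class SketchClient:
    """Hypothetical stand-in for the client touched by the diff."""

    def __init__(self) -> None:
        # In the real code a background task is said to set this when the
        # engine is found unhealthy; here the demo sets it by hand.
        self._errored_with: Optional[BaseException] = None

    async def check_health(self) -> None:
        # No debug print: just surface the stored error, if any.
        if self._errored_with is not None:
            raise self._errored_with


async def demo() -> str:
    client = SketchClient()
    await client.check_health()  # healthy: returns without raising
    client._errored_with = EngineDeadError("engine died")
    try:
        await client.check_health()
    except EngineDeadError as exc:
        return str(exc)
    return "no error"


result = asyncio.run(demo())
```

If the stored error is worth recording, a `logging` call at debug level would be the usual alternative to a bare `print`.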
Can we avoid the long sleep here by setting the healthcheck interval to be much shorter during these tests?
Or restructure so it passes as soon as the health check fails?
This is better, but I don't want to re-run the whole CI, so you can post a PR as a follow-up if you want.
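The second suggestion in this thread (pass as soon as the health check fails, instead of sleeping for a fixed time) could look roughly like the sketch below. It is not the test from this PR: `wait_for_failure`, `FakeClient`, and the timeout values are assumptions made up for illustration, and a real test would call the actual client's `check_health`.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_for_failure(
    check: Callable[[], Awaitable[None]],
    timeout_s: float = 5.0,
    poll_s: float = 0.01,
) -> BaseException:
    """Poll `check` until it raises and return the exception.

    Raises TimeoutError if `check` keeps succeeding past the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            await check()
        except Exception as exc:
            return exc  # the failure we were waiting for
        await asyncio.sleep(poll_s)
    raise TimeoutError("health check never failed")


class FakeClient:
    """Hypothetical client whose health check fails once an error is set."""

    def __init__(self) -> None:
        self.errored_with = None

    async def check_health(self) -> None:
        if self.errored_with is not None:
            raise self.errored_with


async def demo():
    client = FakeClient()

    async def kill_later() -> None:
        # Simulate the engine dying shortly after the test starts.
        await asyncio.sleep(0.05)
        client.errored_with = RuntimeError("engine dead")

    killer = asyncio.create_task(kill_later())
    start = time.monotonic()
    exc = await wait_for_failure(client.check_health)
    await killer
    return exc, time.monotonic() - start


exc, elapsed = asyncio.run(demo())
```

The test then takes only as long as the failure takes to surface, while the timeout still bounds the worst case.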