P/D Stable Branch #85
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
This was there originally but inadvertently dropped
Signed-off-by: Nick Hill <nhill@redhat.com>
…#79)
* Benchmark one concurrent req
* Updates
* restore
* Improve random requests, switch up initial test
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Requires a version of LMCache with the corresponding changes.
Signed-off-by: Nick Hill <nhill@redhat.com>
[BugFix] Fix ordering of KVConnector finished send/rcv sets
The MultiKVConnector impl keeps track of cases where multiple connectors are async saving the same request, but this state needs to be shared from the scheduler side to the worker side.
Signed-off-by: Nick Hill <nhill@redhat.com>
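A minimal sketch of the bookkeeping described above, for illustration only. `AsyncSaveTracker`, `register`, and `report` are hypothetical names, not part of vLLM. The sketch assumes the scheduler side knows how many connectors are saving a request asynchronously and that this count is made available to the worker side, which then reports the request as finished only after every connector has finished.

```python
class AsyncSaveTracker:
    """Hypothetical helper: counts outstanding async saves per request."""

    def __init__(self):
        # req_id -> number of connectors still saving this request
        self._pending = {}

    def register(self, req_id, num_async_savers):
        # Assumed to be called with a count decided on the scheduler side.
        if num_async_savers > 0:
            self._pending[req_id] = num_async_savers

    def report(self, per_connector_finished):
        """Take one set of finished request ids per connector and return
        the ids whose saves have now completed on all connectors."""
        done = set()
        for finished in per_connector_finished:
            for req_id in finished:
                # Unregistered requests are treated as having one saver.
                remaining = self._pending.get(req_id, 1) - 1
                if remaining <= 0:
                    self._pending.pop(req_id, None)
                    done.add(req_id)
                else:
                    self._pending[req_id] = remaining
        return done
```

With two connectors saving request `r1`, the first `report` call that mentions `r1` returns an empty set, and the request is released only when the second connector reports it.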
* [BugFix] Fix handling of num_computed_tokens with connector
  vllm-project#18001 changed the behaviour subtly and broke some multi-connector cases. This change ensures we don't call the connector get_num_new_matched_tokens method a second time for a given request after an async load has completed.
* fix linting
* handle full cache hit on P/D decode worker case
* fix comment wording
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
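The guard described in the first item can be sketched roughly as follows. This is not the scheduler code from the PR: `CountingConnector`, `resolve_num_computed_tokens`, and the `loaded_async` dict are hypothetical, and the signature and integer return value given to `get_num_new_matched_tokens` here are simplifying assumptions. Only the method name comes from the commit message.

```python
class CountingConnector:
    """Hypothetical stand-in for a KV connector; records how often it is queried."""

    def __init__(self, external_tokens):
        self.external_tokens = external_tokens
        self.calls = 0

    def get_num_new_matched_tokens(self, req_id, num_computed_tokens):
        # Assumed simplified signature: returns the number of extra tokens
        # the connector can supply beyond the locally computed ones.
        self.calls += 1
        return self.external_tokens


def resolve_num_computed_tokens(connector, req_id, local_computed, loaded_async):
    """Return the computed-token count for a request.

    loaded_async maps req_id -> token count recorded when an async load
    was started. If the request is present, its load has completed and the
    recorded count is reused, so the connector is not queried again.
    """
    if req_id in loaded_async:
        return loaded_async.pop(req_id)
    external = connector.get_num_new_matched_tokens(req_id, local_computed)
    return local_computed + external
```

On the first pass the connector is queried once and the result is stored in `loaded_async`; when the request comes back after the async load, the stored value is returned and `connector.calls` stays at 1.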
* updated
* cleanup issue
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
SUMMARY: