[vLLM] reduce speculative decoding gpu util to leave room for draft model #1628

Merged

lanking520 merged 1 commit into deepjavalibrary:master from lanking520:awq

Mar 14, 2024

Contributor

lanking520 commented Mar 14, 2024

Description

Reduce the GPU memory utilization used for speculative decoding, so that the draft model has room to load alongside the target model.
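For readers without the diff open, here is a minimal sketch of the idea, not the code from this PR. It assumes the engine is configured through a fraction like vLLM's `gpu_memory_utilization` engine argument. The function name, the constant names, and the `0.8` value are hypothetical placeholders; `0.9` is vLLM's documented default for that argument.

```python
from typing import Optional

# vLLM's documented default for gpu_memory_utilization.
DEFAULT_GPU_MEMORY_UTILIZATION = 0.9
# Hypothetical lower value for illustration only; the number used in this PR
# is not stated in the description.
SPECULATIVE_GPU_MEMORY_UTILIZATION = 0.8


def pick_gpu_memory_utilization(draft_model: Optional[str]) -> float:
    """Return the fraction of GPU memory the target model's engine may claim.

    When a draft model is configured for speculative decoding, claim less
    memory so the draft model's weights and cache can fit on the same GPU.
    """
    if draft_model:
        return SPECULATIVE_GPU_MEMORY_UTILIZATION
    return DEFAULT_GPU_MEMORY_UTILIZATION
```

The returned value would then be passed as `gpu_memory_utilization` when the engine is created. Whether an explicit user-provided setting should override this choice is a design decision the sketch does not cover.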


reduce speculative decoding gpu util to leave room for draft model

9473e1a

lanking520 requested review from zachgk, frankfliu and a team as code owners

March 14, 2024 05:30

rohithkrn approved these changes

lanking520 merged commit bcbe587 into deepjavalibrary:master

8 checks passed
