
[Roadmap] vLLM Roadmap Q1 2024 #2681

Closed
5 of 30 tasks
zhuohan123 opened this issue Jan 31, 2024 · 15 comments


@zhuohan123
Member

zhuohan123 commented Jan 31, 2024

This document lists the features on vLLM's roadmap for Q1 2024. Please feel free to discuss and contribute to specific features in the related RFCs/issues/PRs, and add anything else you'd like to discuss in this issue.

In the future, we will publish our roadmap quarterly and deprecate our old roadmap (#244).

@sandangel

Is it possible to support MLX for running inference on Mac devices? That would simplify local development as well as running in the cloud.

@AguirreNicolas
Contributor

As mentioned in #2643, it would be awesome to have both the vLLM /completions and /chat/completions endpoints support logprobs, so that lm-eval-harness can be run against them.
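For context, a minimal sketch of the kind of request lm-eval-harness needs, assuming a local vLLM OpenAI-compatible server (the URL, port, and model name below are placeholders):

```python
# Sketch of a /v1/completions request with logprobs, as lm-eval-harness would issue.
# Assumptions: vLLM's OpenAI-compatible server is running locally; "my-model" is a placeholder.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "my-model",
        "prompt": "The capital of France is",
        "max_tokens": 1,
        "logprobs": 5,   # top-5 log-probabilities per generated token
        "echo": True,    # also return logprobs for the prompt tokens (used for loglikelihood scoring)
    },
)
print(resp.json()["choices"][0]["logprobs"])
```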

@PeterXiaTian

Please pay attention to "evaluation of accelerated and non-accelerated large model output"; it is very important to make sure the two always match.

@jrruethe

jrruethe commented Feb 5, 2024

> As mentioned in #2643, it would be awesome to have vLLM /completions & /chat/completions endpoints both supporting logprobs to run lm-eval-harness.

Agree 100%, the ability to use lm-eval-harness is very much needed

@casper-hansen
Contributor

#2767 I suggest adding this to the roadmap, as it's one of the more straightforward optimizations (someone has already done the optimization work).

@jalotra

jalotra commented Feb 8, 2024

#2573
This talks about optimizing the API server ("Optimize the performance of the API server").

@cyc00518

Please support the ARM aarch64 architecture.

@Tint0ri

Tint0ri commented Feb 28, 2024

#1253

Please consider supporting StreamingLLM.

@kanseaveg

Any update on PEFT?

Please consider supporting Hugging Face PEFT, thank you. #1129

@ekazakos

ekazakos commented Mar 4, 2024

Would you consider adding support for earlier ROCm versions, e.g. 5.6.1? Thank you!

@pabl-o-ce

If possible, EXL2 support please, thank you <3

@hmellor
Collaborator

hmellor commented Mar 8, 2024

#97 should be added to the "automating the release process" section.

@jrruethe

jrruethe commented Mar 8, 2024

Also, the ability to use Guidance/Outlines via logit_bias!
And +1 to EXL2 support
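
As a rough illustration (not vLLM's actual Guidance/Outlines integration; the model name and token IDs below are hypothetical and depend on the tokenizer), logit_bias-based constrained decoding against an OpenAI-compatible endpoint looks roughly like this:

```python
# Rough sketch: biasing specific token IDs via logit_bias on /v1/chat/completions.
# Assumptions: local vLLM OpenAI-compatible server; "my-model" and the token IDs
# (meant to stand for "yes"/"no") are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "Answer yes or no: is 7 prime?"}],
        "max_tokens": 1,
        # Bias values range from -100 to 100; a +100 bias strongly favors the listed
        # tokens, which is the mechanism Guidance/Outlines-style constrained decoding
        # builds on.
        "logit_bias": {"9891": 100, "2201": 100},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```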

@busishengui

Please support W8A8 quantization.

@simon-mo
Collaborator

simon-mo commented Apr 4, 2024

Let's migrate our discussion to #3861.

@simon-mo simon-mo closed this as completed Apr 4, 2024
@simon-mo simon-mo unpinned this issue Apr 4, 2024