[Roadmap] vLLM Roadmap Q1 2024 #2681
Comments
Is it possible to support mlx for running inference on Mac devices? That would simplify local development and running in the cloud.
As mentioned in #2643, it would be awesome to have vLLM
Please pay attention to "Evaluation of Accelerated and Non-Accelerated Large Model Output"; it is very important to make sure their outputs are always the same.
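A minimal sketch of the kind of parity check this comment asks for, comparing greedy output from vLLM against plain Hugging Face transformers. The model name, prompt, and token budget are placeholders, and even with greedy decoding small numerical differences between the two stacks can cause divergence, so treat this as illustrative rather than a strict test:

```python
# Parity check sketch: vLLM vs. plain transformers, greedy decoding.
# Model name, prompt, and max token count are placeholders.
from vllm import LLM, SamplingParams
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "facebook/opt-125m"
PROMPT = "The capital of France is"

# Accelerated path (vLLM), temperature=0 for deterministic greedy decoding.
llm = LLM(model=MODEL)
vllm_out = llm.generate([PROMPT], SamplingParams(temperature=0.0, max_tokens=32))
vllm_text = vllm_out[0].outputs[0].text

# Non-accelerated path (transformers), do_sample=False for greedy decoding.
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
ids = tok(PROMPT, return_tensors="pt").input_ids
gen = model.generate(ids, do_sample=False, max_new_tokens=32)
hf_text = tok.decode(gen[0][ids.shape[1]:], skip_special_tokens=True)

print("vLLM:        ", vllm_text)
print("transformers:", hf_text)
print("match:", vllm_text.strip() == hf_text.strip())
```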
Agree 100%, the ability to use
#2767 I suggest adding this to the roadmap as it's one of the more straightforward optimizations (someone already did the optimization work).
#2573
Please support the ARM aarch64 architecture.
Please consider supporting StreamingLLM.
Any update on PEFT? Please consider supporting Hugging Face PEFT, thank you. #1129
Would you consider adding support for earlier ROCm versions, e.g. 5.6.1? Thank you!
If possible, please add EXL2 support, thank you <3
#97 should be added to the "automating the release process" section.
Also, the ability to use Guidance/Outlines via logit_bias!
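For context, a hedged sketch of what token-level biasing looks like against an OpenAI-compatible endpoint, which is one way Guidance/Outlines-style constraints can be approximated. It assumes the server honors the standard `logit_bias` field of the completions API; the base URL, model name, and token IDs below are placeholders:

```python
# Sketch: biasing specific token IDs through an OpenAI-compatible completions call.
# Assumes the server (e.g. an OpenAI-compatible vLLM frontend) honors `logit_bias`;
# base_url, model name, and token IDs are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="my-model",
    prompt="Answer yes or no:",
    max_tokens=1,
    # Map of token ID (as a string) -> bias in [-100, 100];
    # +100 effectively forces a token, -100 effectively bans it.
    logit_bias={"9891": 100, "2201": -100},  # placeholder token IDs
)
print(resp.choices[0].text)
```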
Support W8A8 quantization.
Let's migrate our discussion to #3861.
This document includes the features in vLLM's roadmap for Q1 2024. Please feel free to discuss and contribute to the specific features in the related RFCs/issues/PRs, and add anything else you'd like to talk about in this issue.
In the future, we will publish our roadmap quarterly and deprecate our old roadmap (#244).
The vLLM team is working with the following hardware vendors:
torch.compile support
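As background on the torch.compile item, a minimal sketch of what compiling a model's forward pass looks like in plain PyTorch 2.x; this is illustrative only and says nothing about how vLLM would actually integrate it:

```python
# Sketch: torch.compile on a small module; illustrative, not vLLM's integration.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64)).eval()

# torch.compile captures the forward pass and lowers it to optimized kernels;
# the first call triggers compilation, later calls reuse the compiled graph.
compiled = torch.compile(model)

x = torch.randn(8, 64)
with torch.no_grad():
    out = compiled(x)
print(out.shape)  # torch.Size([8, 64])
```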