[Roadmap] vLLM Development Roadmap: H2 2023 #244

Closed · 47 of 76 tasks
zhuohan123 opened this issue Jun 25, 2023 · 16 comments

@zhuohan123 (Member) commented Jun 25, 2023

We summarize the issues we have received and our planned features here. This issue will be kept up to date.

Latest issue tracked: #677

- Software Quality
- Installation
- Documentation
- New Models
  - Decoder-only models
  - Encoder-decoder models
  - Other techniques
- Frontend Features
  - vLLM demo frontends
  - Integration with other frontends
- Engine Optimization and New Features
- Kernels
- Bugs

@zjc17 commented Jul 18, 2023

Is support for quantized models under development?

@WaterKnight1998

> Is support for quantized models under development?

This would be very helpful, @zhuohan123. Thank you very much for the state-of-the-art inference performance!
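For context, here is a minimal sketch of how an AWQ-quantized checkpoint can be served with vLLM's offline `LLM` API, assuming a vLLM release that includes AWQ support (which landed after this question was asked); the checkpoint name is purely illustrative.

```python
# Minimal sketch: loading an AWQ-quantized checkpoint with vLLM's offline API.
# Assumes a vLLM release with AWQ support; the model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-AWQ", quantization="awq")
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["What does PagedAttention do?"], params)
print(outputs[0].outputs[0].text)
```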

@Jwdev-wr

Can we get function calling, matching the OpenAI API, added to the roadmap? I'm not entirely sure what the implementation would look like, but it's a very useful feature.

@mondaychen

I have a prototype implementation of OpenAI-like function calling. It works well on capable models (like Llama 2). Please let me know if this is something the team would consider taking into vLLM.
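To make the idea concrete, here is a minimal, hypothetical sketch of such a layer (not the prototype above, and none of these helper names are part of vLLM): the function schemas are injected into the prompt, and the raw completion text is parsed back into an OpenAI-style `function_call` object.

```python
import json
import re
from typing import Optional


def build_prompt(user_message: str, functions: list) -> str:
    """Inject the available function schemas into the prompt (hypothetical format)."""
    schema_block = json.dumps(functions, indent=2)
    return (
        "You may call one of the functions below by replying with JSON of the form "
        '{"name": ..., "arguments": {...}}.\n'
        f"Available functions:\n{schema_block}\n\n"
        f"User: {user_message}\nAssistant:"
    )


def parse_function_call(completion_text: str) -> Optional[dict]:
    """Map raw model output to an OpenAI-style function_call dict, or None for plain text."""
    match = re.search(r"\{.*\}", completion_text, re.DOTALL)
    if match is None:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "name" not in call:
        return None
    # The OpenAI chat format returns "arguments" as a JSON-encoded string.
    return {"name": call["name"], "arguments": json.dumps(call.get("arguments", {}))}
```

A full integration would also validate the arguments against the declared JSON schema and fall back to a normal chat reply when no call is detected.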

@zhisbug (Collaborator) commented Aug 23, 2023

@mondaychen Yes, how about you submit a PR?

@mondaychen

@zhisbug OK! I'll polish my prototype and submit a PR

@boxter007

Need to support Baichuan2

@yeahjack commented Sep 8, 2023

Here is an implementation of function calling with Hugging Face models that could be helpful: https://local-llm-function-calling.readthedocs.io/en/latest/quickstart.html

@Xu-Chen commented Sep 26, 2023

Need to support Qwen-14b

@SinclairCoder

Need to support Phi-1 and Phi-1.5

@xiaotiancd

Would it be possible to support CPU inference too?

@zhouyuan (Contributor) commented Nov 2, 2023

> Would it be possible to support CPU inference too?

Hi @xiaotiancd, here is a draft patch to support CPU-based inference, in case you are interested: #1028

-yuan

@usaxena-asapp

Hey @zhouyuan @WoosukKwon, I'd like to get this new variant of concurrent LoRA serving added to the roadmap:

concurrent LoRA serving:

@jens-create

Are there any plans to support function calling like OpenAI's? I know this task is complex, since parsing the LLM output will be custom for each fine-tuned model, depending on its training data. However, perhaps it would be possible to add a module/function that you can inject into api_server.py and that maps the output of the LLM (output.text) to a ChatMessage (a rough sketch of this idea follows after this comment).

For example, functionary has copied parts of vLLM and extended/customised it to support functions.

In the future, when hopefully more open-source models with function-calling capabilities are released, it would be great if one did not have to clone a repository for each model, but instead the model-specific parsing were supported by vLLM directly.

What are the thoughts on this? I wouldn't mind contributing to such a feature...
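To sketch what such an injectable hook could look like (purely hypothetical: `ChatMessage`, the registry, and the model name below are made up and not part of vLLM's API), the OpenAI-compatible server would look up a parser registered for the served model and use it to turn `output.text` into a structured chat message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ChatMessage:
    """Simplified stand-in for an OpenAI-style chat message."""
    role: str = "assistant"
    content: Optional[str] = None
    function_call: Optional[dict] = None


# Registry of model-specific output parsers (hypothetical, not a vLLM API).
OUTPUT_PARSERS: Dict[str, Callable[[str], ChatMessage]] = {}


def register_output_parser(model_name: str):
    """Decorator that registers a parser for a given fine-tuned model."""
    def decorator(fn: Callable[[str], ChatMessage]):
        OUTPUT_PARSERS[model_name] = fn
        return fn
    return decorator


@register_output_parser("my-org/functionary-style-model")
def parse_functionary_style(output_text: str) -> ChatMessage:
    """Example parser for a model trained to emit 'CALL <name> <json-args>' lines."""
    if output_text.startswith("CALL "):
        parts = output_text.split(" ", 2)
        if len(parts) == 3:
            _, name, args = parts
            return ChatMessage(function_call={"name": name, "arguments": args})
    return ChatMessage(content=output_text)


def to_chat_message(model_name: str, output_text: str) -> ChatMessage:
    """What the API server would call: pick the model's parser, fall back to plain text."""
    parser = OUTPUT_PARSERS.get(model_name, lambda text: ChatMessage(content=text))
    return parser(output_text)
```

A real version would also have to handle streaming and validate arguments against the declared function schema, but the shape of the hook would stay the same.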

@OleksandrKorovii

> Are there any plans to support function calling like OpenAI's? [...]

Also interested in this question.

@simon-mo unpinned this issue on Jan 26, 2024
@zhuohan123 changed the title from "vLLM Development Roadmap" to "[Deprecated] vLLM Development Roadmap" on Jan 31, 2024
@zhuohan123 (Member, Author)

We have deprecated this roadmap. Please find our latest roadmap in #2681

yukavio pushed a commit to yukavio/vllm that referenced this issue on Jul 3, 2024
@simon-mo changed the title from "[Deprecated] vLLM Development Roadmap" to "[Roadmap] vLLM Development Roadmap: H2 2023" on Oct 1, 2024
mht-sharma pushed a commit to mht-sharma/vllm that referenced this issue on Oct 30, 2024: "Removed duplicated lines and suppressed the most noisy warning that hides the actual important ones"