This document tracks issues and feature requests that we have collected from OSS and other channels. We'll update this list with relevant info from issues, etc. as we go. If a feature you need is not prioritized here, please feel free to open an RFC or feature request, or post in the Slack community channel. Use this form to join Slack: https://www.ray.io/join-slack
Core features
Serve
- [P0] Prefix aware router @jujipotle -- [serve.llm] Prefix-aware scheduler [2/N] Configure PrefixAwareReplicaScheduler as default scheduler in LLMServer #52725
- [P0] In-place update for deployments when adding new models, without having to re-deploy the cluster
- [P0] Display vLLM-emitted metrics: https://github.com/ray-project/ray/pull/51156/files @eicherseiji
- [P1] TPU support
- [P1] Open router protocol api for devs / researchers [RFC] [Serve] Custom Request Router #53016
- [P1] Prefill disaggregation (PxDy pattern). There are many open questions around this architecture, so it would be interesting to see under what conditions it beats simple chunked-prefill-enabled replicas at the same resource count (see RFC [RFC][llm] Prefill/Decode disaggregation with Ray Serve #53257)
- [P1] Distributed KV cache, using the kv-connector in vLLM -- Need to demonstrate this next quarter
- [P1] Embedding models endpoints: [llm] Embedding api #52229 @janimo
- [P1] Heterogeneous accelerator_type (a single deployment that can be scheduled with different engine settings on different accelerator types and shapes, with different priorities)
- [P2] More backends other than vLLM (e.g. sglang)
- [P2] Fractional GPU support
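To make the prefix-aware router item above concrete: the idea is to send a request to the replica whose KV cache most likely already holds a prefix of the prompt, so the engine can reuse cached attention state. Below is a toy, self-contained sketch of that routing policy; it is not the Ray `PrefixAwareReplicaScheduler` implementation, and the class, method names, and the 16-character prefix key are illustrative assumptions.

```python
class ToyPrefixAwareRouter:
    """Toy sketch: route to the replica that has likely seen the longest
    prefix of this prompt; fall back to least-loaded when nothing matches."""

    def __init__(self, replica_ids):
        # Prefixes of prompts previously routed to each replica.
        self.prefixes = {r: set() for r in replica_ids}
        # Number of requests routed to each replica so far.
        self.load = {r: 0 for r in replica_ids}

    def _best_match(self, prompt, seen):
        """Length of the longest recorded prefix that matches `prompt`."""
        return max((len(p) for p in seen if prompt.startswith(p)), default=0)

    def route(self, prompt, prefix_len=16):
        scores = {r: self._best_match(prompt, seen)
                  for r, seen in self.prefixes.items()}
        if max(scores.values()) > 0:
            # Some replica has a warm KV cache for this prefix: reuse it.
            replica = max(scores, key=scores.get)
        else:
            # No prefix hit: balance load instead.
            replica = min(self.load, key=self.load.get)
        self.prefixes[replica].add(prompt[:prefix_len])
        self.load[replica] += 1
        return replica
```

A real scheduler must also cap how much prefix state it tracks and account for cache eviction inside the engine; this sketch only illustrates the routing trade-off (prefix affinity vs. load balance).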
Data
- [P0] Multi-node TP+PP for large DeepSeek models @lk-chen
- [P1] More vision-language models
- [P1] TPU support
- [P2] More backends other than vLLM (e.g. sglang) [data.llm] Add support for sglang for processor and stage. #51409 cc @Qiaolin-Yu
- [P2] Heterogeneous accelerator_type in the same pipeline
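The heterogeneous accelerator_type items above amount to a priority-ordered placement decision: try the preferred accelerator shapes first and take the first one the model fits on. A minimal sketch of that selection logic, where the accelerator names, memory sizes, and headroom factor are illustrative assumptions rather than values from Ray or vLLM:

```python
# (accelerator_type, per-GPU memory in GB, engine settings), in priority order.
# All values here are illustrative assumptions.
ACCELERATOR_OPTIONS = [
    ("H100", 80, {"tensor_parallel_size": 1}),
    ("A100-40G", 40, {"tensor_parallel_size": 2}),
    ("L4", 24, {"tensor_parallel_size": 4}),
]

def pick_accelerator(model_mem_gb):
    """Return the first option whose aggregate memory across
    tensor-parallel shards fits the model, else None."""
    for accelerator, mem_gb, engine_kwargs in ACCELERATOR_OPTIONS:
        total = mem_gb * engine_kwargs["tensor_parallel_size"]
        # Keep ~10% headroom for KV cache and activations.
        if model_mem_gb <= total * 0.9:
            return accelerator, engine_kwargs
    return None
```

A production scheduler would fold in availability and cost, not just fit, but the priority-ordered fallback shape is the core of the feature request.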
CI/CD and release pipeline
- [P0] Release tests for structured output (llm.data) @lk-chen
- [P0] For Serve release tests, use gen-config on the critical path
Docs and community support
- [P0] Cover gen-config in serve docs
- [P0] Run doc-test on examples @lk-chen
- [P0] Update vLLM docs with a Ray cluster setup guide and Serve and Data code examples
- [P1] Example of running DeepSeek R1 (a huge model, with Ray Serve multi-node)