This document tracks issues and feature requests that we have collected from OSS and other channels. We'll update this list with relevant info from issues, etc. as we go. If a feature you need is not prioritized here, please feel free to open an RFC or feature request, or post in the Slack community channel. Use this form to join Slack: https://www.ray.io/join-slack
Core features
Serve
- [P0] Prefix aware router @jujipotle -- [serve.llm] Prefix-aware scheduler [2/N] Configure PrefixAwareReplicaScheduler as default scheduler in LLMServer #52725
- [P0] In-place update for deployments when adding new models, without having to re-deploy the cluster
- [P0] Display vLLM-emitted metrics: https://github.com/ray-project/ray/pull/51156/files @eicherseiji
- [P1] TPU support
- [P1] Open router protocol api for devs / researchers [RFC] [Serve] Custom Request Router #53016
- [P1] Prefill disaggregation (PxDy pattern). There are many open questions around this architecture, so it would be interesting to see under what conditions it beats simple chunked-prefill-enabled replicas at the same resource count (see RFC [RFC][llm] Prefill/Decode disaggregation with Ray Serve #53257)
- [P1] Distributed KV cache, using the kv-connector in vLLM -- Need to demonstrate this next quarter
- [P1] Embedding models endpoints: [llm] Embedding api #52229 @janimo
- [P1] Heterogeneous accelerator_type (a single deployment that can be scheduled with different engine settings on different accelerator types and shapes, with different priorities)
- [P2] More backends other than vLLM (e.g. sglang)
- [P2] Fractional GPU support
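To make the prefix-aware router item above concrete: the idea is to send a request to the replica whose KV cache most likely already holds a prefix of the prompt, so the engine can reuse cached attention state. Below is a toy, self-contained sketch of that routing policy; it is not the Ray `PrefixAwareReplicaScheduler` implementation, and the class, method names, and the 16-character prefix key are illustrative assumptions.

```python
class ToyPrefixAwareRouter:
    """Toy sketch: route to the replica that has likely seen the longest
    prefix of this prompt; fall back to least-loaded when nothing matches."""

    def __init__(self, replica_ids):
        # Prefixes of prompts previously routed to each replica.
        self.prefixes = {r: set() for r in replica_ids}
        # Number of requests routed to each replica so far.
        self.load = {r: 0 for r in replica_ids}

    def _best_match(self, prompt, seen):
        """Length of the longest recorded prefix that matches `prompt`."""
        return max((len(p) for p in seen if prompt.startswith(p)), default=0)

    def route(self, prompt, prefix_len=16):
        scores = {r: self._best_match(prompt, seen)
                  for r, seen in self.prefixes.items()}
        if max(scores.values()) > 0:
            # Some replica has a warm KV cache for this prefix: reuse it.
            replica = max(scores, key=scores.get)
        else:
            # No prefix hit: balance load instead.
            replica = min(self.load, key=self.load.get)
        self.prefixes[replica].add(prompt[:prefix_len])
        self.load[replica] += 1
        return replica
```

A real scheduler must also cap how much prefix state it tracks and account for cache eviction inside the engine; this sketch only illustrates the routing trade-off (prefix affinity vs. load balance).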
Data
- [P0] Multi-node TP+PP for large DeepSeek models @lk-chen
- [P1] More vision-language models
- [P1] TPU support
- [P2] More backends other than vLLM (e.g. sglang) [data.llm] Add support for sglang for processor and stage. #51409 cc @Qiaolin-Yu
- [P2] Heterogeneous accelerator_type in the same pipeline
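The heterogeneous accelerator_type items above amount to a priority-ordered placement decision: try the preferred accelerator shapes first and take the first one the model fits on. A minimal sketch of that selection logic, where the accelerator names, memory sizes, and headroom factor are illustrative assumptions rather than values from Ray or vLLM:

```python
# (accelerator_type, per-GPU memory in GB, engine settings), in priority order.
# All values here are illustrative assumptions.
ACCELERATOR_OPTIONS = [
    ("H100", 80, {"tensor_parallel_size": 1}),
    ("A100-40G", 40, {"tensor_parallel_size": 2}),
    ("L4", 24, {"tensor_parallel_size": 4}),
]

def pick_accelerator(model_mem_gb):
    """Return the first option whose aggregate memory across
    tensor-parallel shards fits the model, else None."""
    for accelerator, mem_gb, engine_kwargs in ACCELERATOR_OPTIONS:
        total = mem_gb * engine_kwargs["tensor_parallel_size"]
        # Keep ~10% headroom for KV cache and activations.
        if model_mem_gb <= total * 0.9:
            return accelerator, engine_kwargs
    return None
```

A production scheduler would fold in availability and cost, not just fit, but the priority-ordered fallback shape is the core of the feature request.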
CI/CD and release pipeline
- [P0] Release tests for structured output (llm.data) @lk-chen
- [P0] For Serve release tests, use gen-config on the critical path
Docs and community support
- [P0] Cover gen-config in serve docs
- [P0] Run doc-test on examples @lk-chen
- [P0] Update vLLM docs with a Ray cluster setup guide and Serve and Data code examples
- [P1] Example of running DeepSeek R1 (a huge model, with Ray Serve multi-node)