This repository was archived by the owner on Jul 24, 2025. It is now read-only.
Is your feature request related to a problem? Please describe.
ModelService currently supports disaggregated prefill and decode deployments, including tensor-parallel setups across multiple GPUs on a single node. To enable multi-node vLLM inference, however, ModelService needs to integrate with the LeaderWorkerSet controller.
For example, consider running vLLM with tensor_parallel_size=8, distributed across 2 nodes with 8 GPUs each. For this use-case, ModelService should be able to orchestrate a multi-pod workload under a LeaderWorkerSet, where each pod (worker) forms part of a single logical model instance.
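From the user's side, this could look like a hypothetical ModelService spec along these lines (the API group and all field names here are illustrative assumptions, not an existing schema; `useLeaderWorkerSet` is the field proposed in this issue):

```yaml
# Hypothetical sketch: apiVersion and the parallelism stanza are assumed
# field names, not an existing ModelService schema.
apiVersion: example.com/v1alpha1   # placeholder group/version
kind: ModelService
metadata:
  name: llama-70b
spec:
  useLeaderWorkerSet: true   # proposed field from this issue
  parallelism:
    tensor: 8                # would map to vLLM --tensor-parallel-size
    nodes: 2                 # pods per LeaderWorkerSet group
```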
Describe the solution approach you'd like
We need native support in the ModelService controller for:

- Creating and managing LeaderWorkerSet CRs.
- Propagating the relevant vLLM CLI/env args to each pod based on its role (leader vs. worker).
- Handling lifecycle and readiness conditions for a distributed setup.
- Reusing existing prefill/decode fields as much as possible, perhaps adding a new field like `useLeaderWorkerSet: true`.
- Completing the implementation of the parallelism stanza in ModelService (including node parallelism).
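Concretely, for the two-node use-case above, the controller could generate a LeaderWorkerSet along the lines of the following sketch. The LeaderWorkerSet API itself (`leaderWorkerTemplate`, `size`, `restartPolicy`, the injected `LWS_LEADER_ADDRESS` env var) is real; the image, model name, entrypoints, and parallelism values are illustrative assumptions, not a verified deployment:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llama-70b
spec:
  replicas: 1                        # one logical model instance
  leaderWorkerTemplate:
    size: 2                          # leader + 1 worker = 2 nodes
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      spec:
        containers:
        - name: vllm-leader
          image: vllm/vllm-openai:latest   # illustrative image
          # Leader starts the Ray head and serves the model; TP=8 within
          # each node, PP=2 across the two nodes (16 GPUs total).
          command: ["sh", "-c"]
          args:
          - ray start --head --port=6379 &&
            vllm serve meta-llama/Llama-3.1-70B-Instruct
            --tensor-parallel-size 8 --pipeline-parallel-size 2
          resources:
            limits:
              nvidia.com/gpu: "8"
    workerTemplate:
      spec:
        containers:
        - name: vllm-worker
          image: vllm/vllm-openai:latest
          # Workers join the Ray cluster via the leader address that the
          # LeaderWorkerSet controller injects as LWS_LEADER_ADDRESS.
          command: ["sh", "-c"]
          args:
          - ray start --block --address=$(LWS_LEADER_ADDRESS):6379
          resources:
            limits:
              nvidia.com/gpu: "8"
```

This is essentially the role-based arg propagation described above: the controller would render different entrypoints and env wiring into the leader and worker templates from a single ModelService spec.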
Describe alternatives you've considered
Manually creating LeaderWorkerSet CRs and bypassing ModelService entirely, which breaks the declarative, unified model-serving approach.