This repository was archived by the owner on Jul 24, 2025. It is now read-only.

Support multi-node inferencing using LWS #208

@sriumcp

Description

Is your feature request related to a problem? Please describe.

ModelService currently supports disaggregated prefill and decode deployments, including tensor-parallel setups across multiple GPUs on a single node. However, enabling multi-node vLLM inference requires integration with the LeaderWorkerSet (LWS) controller.

For example, consider the following use case: I want to run vLLM with tensor_parallel_size=8, distributed across 2 nodes with 8 GPUs each. For this use case, I would like ModelService to orchestrate a multi-pod workload under a LeaderWorkerSet, where each pod (leader or worker) forms part of a single logical model instance.
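For reference, the vLLM LWS guide (linked under "Additional context" below) wires this up roughly as in the following minimal sketch: one LeaderWorkerSet group per model instance, where the leader starts a Ray head and launches the server, and each worker joins the Ray cluster. The image, model id, port, and parallelism flags here are placeholders, and a production setup should wait for all Ray workers to join before serving, as the guide's helper script does.

```yaml
# Minimal LeaderWorkerSet sketch for the use case above (illustrative
# values throughout): 1 replica of a 2-pod group, 8 GPUs per pod.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm-multinode            # illustrative name
spec:
  replicas: 1                     # number of model instances (pod groups)
  leaderWorkerTemplate:
    size: 2                       # pods per group: 1 leader + 1 worker
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      spec:
        containers:
          - name: vllm-leader
            image: vllm/vllm-openai:latest   # placeholder image
            env:
              - name: MODEL_NAME
                value: "<model-id>"          # placeholder model
            command:
              - sh
              - -c
              # Start the Ray head, then serve across both nodes.
              # Parallelism flags are illustrative; a real setup should
              # first wait for workers to join the Ray cluster.
              - |
                ray start --head --port=6379 &&
                vllm serve "$MODEL_NAME" \
                  --tensor-parallel-size 8 \
                  --pipeline-parallel-size 2
            ports:
              - containerPort: 8000
            readinessProbe:       # vLLM exposes /health on the API port
              httpGet:
                path: /health
                port: 8000
            resources:
              limits:
                nvidia.com/gpu: "8"
    workerTemplate:
      spec:
        containers:
          - name: vllm-worker
            image: vllm/vllm-openai:latest
            command:
              - sh
              - -c
              # Join the Ray cluster; LWS injects LWS_LEADER_ADDRESS.
              - ray start --address=$(LWS_LEADER_ADDRESS):6379 --block
            resources:
              limits:
                nvidia.com/gpu: "8"
```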

Describe the solution approach you'd like

We need native support in the ModelService controller for:

Creating and managing LeaderWorkerSet CRs.

Propagating relevant vLLM CLI/env args to each pod based on its role (leader vs worker).

Handling lifecycle and readiness conditions for a distributed setup.

Reusing existing prefill/decode fields as much as possible, perhaps adding a new field like useLeaderWorkerSet: true.

Completing the implementation of the parallelism stanza in ModelService, including node parallelism (see the hypothetical spec sketch after this list).
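To make the requested API shape concrete, here is a purely hypothetical spec fragment. None of these field names exist in ModelService today except where this issue itself proposes them; they are assumptions for illustration only.

```yaml
# Hypothetical ModelService spec fragment (field names are assumptions):
spec:
  decode:
    useLeaderWorkerSet: true   # proposed opt-in to an LWS-backed deployment
    parallelism:               # the stanza referenced above
      tensor: 8                # maps to vLLM --tensor-parallel-size
      node: 2                  # pods per LeaderWorkerSet group
    replicas: 1                # independent model instances
```

With a stanza like this, the controller could derive the LeaderWorkerSet group size from the node field and render an object like the earlier sketch.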

Describe alternatives you've considered

Manually creating LeaderWorkerSet CRs and bypassing ModelService, which breaks the declarative, unified model serving approach.

Additional context
See:

https://github.com/kubernetes-sigs/lws

https://docs.vllm.ai/en/latest/deployment/frameworks/lws.html

The goal is to make ModelService a first-class abstraction for both single-node and multi-node vLLM deployments.

📌 Please read the linked references carefully before implementing this feature.
