MVP support for serving workloads running as LeaderWorkerSet #3232

mimowo · 2024-10-15T06:50:04Z

What would you like to be added:

MVP support for LeaderWorkerSet in Kueue. It does not need to be ideal, but we want some level of support to unblock users and collect their feedback.

The idea is to base the support on StatefulSets, so the integration would also use Pod Groups, similar to regular StatefulSets. Each LeaderWorkerSet group creates a new Pod Group. In a single pod group we will have:

Leader pod, controlled by the leader StatefulSet (STS)
Worker pods, controlled by a unique, dedicated STS

The size of the group will be taken from LeaderWorkerSet.Spec.LeaderWorkerTemplate.Size and increased by 1 (to include the leader).
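To make the grouping above concrete, here is a minimal Go sketch that buckets pods into one pod group per leader-worker group, with the leader (from the leader STS) and the workers (from that group's dedicated STS) kept together. The `Pod` and `PodGroup` types, their fields, the `<lws>-<groupIndex>` group key, and the pod names are hypothetical simplifications for illustration, not Kueue's or LeaderWorkerSet's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, simplified view of a pod created by a LeaderWorkerSet.
type Pod struct {
	Name       string
	LWSName    string // owning LeaderWorkerSet (hypothetical field)
	GroupIndex int    // index of the leader-worker group (hypothetical field)
	IsLeader   bool   // true if the pod is owned by the leader StatefulSet
}

// PodGroup collects the leader and the workers of one leader-worker group.
type PodGroup struct {
	Key     string
	Leader  string
	Workers []string
}

// Size returns the number of pods currently assigned to the group,
// counting the leader (if present) plus all workers.
func (g *PodGroup) Size() int {
	n := len(g.Workers)
	if g.Leader != "" {
		n++
	}
	return n
}

// groupPods assigns each pod to a pod group keyed by "<lws>-<groupIndex>",
// so a leader and its workers always land in the same group.
func groupPods(pods []Pod) map[string]*PodGroup {
	groups := make(map[string]*PodGroup)
	for _, p := range pods {
		key := fmt.Sprintf("%s-%d", p.LWSName, p.GroupIndex)
		g, ok := groups[key]
		if !ok {
			g = &PodGroup{Key: key}
			groups[key] = g
		}
		if p.IsLeader {
			g.Leader = p.Name
		} else {
			g.Workers = append(g.Workers, p.Name)
		}
	}
	// Sort worker names so the output is deterministic.
	for _, g := range groups {
		sort.Strings(g.Workers)
	}
	return groups
}

func main() {
	pods := []Pod{
		{Name: "lws-0", LWSName: "lws", GroupIndex: 0, IsLeader: true},
		{Name: "lws-0-1", LWSName: "lws", GroupIndex: 0},
		{Name: "lws-0-2", LWSName: "lws", GroupIndex: 0},
		{Name: "lws-1", LWSName: "lws", GroupIndex: 1, IsLeader: true},
		{Name: "lws-1-1", LWSName: "lws", GroupIndex: 1},
	}
	groups := groupPods(pods)

	// Print groups in a stable order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		g := groups[k]
		fmt.Printf("%s leader=%s workers=%v size=%d\n", g.Key, g.Leader, g.Workers, g.Size())
	}
}
```

Running it prints one line per group, for example `lws-0 leader=lws-0 workers=[lws-0-1 lws-0-2] size=3`. Keying on the LeaderWorkerSet name plus group index is just one simple way to keep a leader and its workers in the same group in this sketch.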

This is a follow-up to #2717.

Why is this needed:

We want to support serving primitives in Kueue because there is increasing demand from users who run clusters mixing AI training and inference and want to manage their expensive GPU resources.

LeaderWorkerSet is a new serving API that is gaining popularity as a primitive for hosting AI/ML inference.

mimowo · 2024-10-15T06:50:13Z

/assign @vladikkuzn

mimowo · 2024-10-15T06:50:22Z

/cc @mwielgus @tenzen-y

mimowo added the kind/feature label (categorizes issue or PR as related to a new feature) on Oct 15, 2024

k8s-ci-robot assigned vladikkuzn Oct 15, 2024

This was referenced on Oct 15, 2024:

MVP support / extension of support for serving workloads #2717 (Open)
☂️ Release v0.9.0 requirements #3192 (Open)

