MultiKueue: Support sequential attempts to try worker clusters #3757

mimowo · 2024-12-06T15:28:34Z

What would you like to be added:

We would like to try the worker clusters sequentially rather than all at the same time. The attempts could be time-based.

This will require, at a minimum, an API for controlling the time between attempts. There is also an open question: should the timeout be global, per manager, or per worker? This needs to be designed.
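
To make the request concrete, here is a minimal sketch of the intended behavior. It is not Kueue code and does not propose an API: `admit_sequentially`, `try_admit`, and `attempt_timeout_seconds` are hypothetical names, and a single timeout shared by all attempts is only one of the options listed above.

```python
from typing import Callable, Optional, Sequence


def admit_sequentially(
    clusters: Sequence[str],
    try_admit: Callable[[str, float], bool],
    attempt_timeout_seconds: float,
) -> Optional[str]:
    """Try worker clusters one at a time, in the given order.

    Returns the name of the first cluster that admits the workload,
    or None if every attempt fails or times out.
    """
    for cluster in clusters:
        # Only one cluster is asked at a time, so at most one cluster
        # can preempt or autoscale on behalf of this workload.
        if try_admit(cluster, attempt_timeout_seconds):
            return cluster
    return None


# Hypothetical stand-in for "create the workload on the worker and wait
# up to `timeout` seconds for admission". Here only worker-b has capacity.
attempted = []


def fake_try_admit(cluster: str, timeout: float) -> bool:
    attempted.append(cluster)
    return cluster == "worker-b"


chosen = admit_sequentially(["worker-a", "worker-b"], fake_try_admit, 300.0)
```

In this sketch the order of the list is the preference order, so `worker-b` is only tried after the attempt on `worker-a` has failed or timed out. A per-worker timeout would replace the single float with a value looked up per cluster.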

Why is this needed:

To avoid the risk of admitting the same workload on two clusters at the same time, and thus possibly triggering preemptions on both clusters.
To prioritize the use of some clusters over others. For example, a user may have one cluster with reservations and one that is autoscaled. The user prefers to try the reservation cluster first and to fall back to the autoscaled cluster only if that attempt fails.
To avoid autoscaling on multiple worker clusters at the same time.

Completion requirements:

This enhancement requires the following artifacts:

Design doc
API change
Docs update

The artifacts should be linked in subsequent comments.

mimowo · 2024-12-06T15:28:57Z

cc @mwielgus @mwysokin @tenzen-y

mimowo added the kind/feature label Dec 6, 2024
