Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preemption admission check controller. KEP update. #1178

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 36 additions & 1 deletion keps/993-two-phase-admission/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Preemption Admission Check Controller](#preemption-admission-check-controller)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit Tests](#unit-tests)
Expand Down Expand Up @@ -206,7 +207,7 @@ pass AND there are some AdmissionChecks configured, AND the
workload is not in on-hold retry state from some check, it will:

1. Fill the Admission field in workload, with the desired flavor assignment.
2. Not do any preemptions yet (unless BookCapacity is set to true).
2. Not do any preemptions yet (unless the check uses `Anytime` preemption policy).
3. Set "QuotaReserved" to true.

Kueue will only pass as many pods into "QuotaReserved" as there would
Expand Down Expand Up @@ -253,6 +254,40 @@ The controller implementing a particular check should:
* After approving the workload, keep an eye on the check if it starts failing,
fail the check and cause the workload to move back to the suspended state.

### Preemption Admission Check Controller
alculquicondor marked this conversation as resolved.
Show resolved Hide resolved

In this proposal, the time to evict the preemption candidates varies based on the preemptor state
trasc marked this conversation as resolved.
Show resolved Hide resolved
the scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check
trasc marked this conversation as resolved.
Show resolved Hide resolved
that is manged by a new built-in admission check controller.

The **Preemption Admission Check Controller** will:

- Watch for a change in state of the workloads pending preemption.
- Watch for workloads that are finishing execution and therefore releasing quota.
- Watch for AdmissionCheck changes, since the preemption policy can change.

The preemption controller uses the kueue cache, since it needs to check the state of workloads admitted to the ClusterQueues.

At every run the controller will get the list of workloads pending preemption.

The workloads pending preemption are divided into:
- `preemtingLetaer` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`.
trasc marked this conversation as resolved.
Show resolved Hide resolved
- `preemtingNow` - Workloads that expect to be able to issue evictions or potentially change their preemption state in the current cycle.

Then:
1. Remove all workloads from the snapshot.
2. For every workload in `preemtingNow` , in the order of their priority:
- If it can fit without the need of evicting other workloads
- Set its admission check to `Ready`
- Add it to the snapshot
- If it cannot fit
- Get an updated list of eviction candidates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify which ones are candidates? Is it only workloads that are fully Admitted? I don't think you want to consider ones that only have ResourceQuota.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The candidates are picked from the list of workloads that hold a quota reservation, just because maybe a admission check of a workload is transitioning slowly to ready dose not mean that it should be able to hold its quota against a higher priority workload.

- If the updated list is not empty.
- Issue the eviction to the candidates.
- Add it to the snapshot
alculquicondor marked this conversation as resolved.
Show resolved Hide resolved
- If the updated list is empty, meaning the preemption cannot be done.
- Set its admission check to `Ready`
trasc marked this conversation as resolved.
Show resolved Hide resolved

### Test Plan

[ x ] I/we understand the owners of the involved components may require updates to
Expand Down