Blog about pod scheduling readiness #37436

Merged
1.26 blog: pod scheduling readiness to alpha
Huang-Wei committed Dec 1, 2022
commit cc7a262894d60d37a059e93f19170183c0f24840
---
layout: blog
title: "Kubernetes 1.26: Pod Scheduling Readiness to Alpha"
date: 2022-11-29
slug: pod-scheduling-readiness-alpha-1-26
---

**Author:** Wei Huang (Apple), Abdullah Gharaibeh (Google)

Kubernetes 1.26 introduced a new Pod feature named SchedulingGates. SchedulingGates are keys that
tell the scheduler when a Pod is ready to be considered for scheduling.

## What problem does it solve?

When a Pod is created, the scheduler will continuously attempt to find a node that fits it. This
infinite loop continues until the scheduler either finds a node for the Pod, or the Pod gets deleted.

Pods that remain unschedulable for long periods of time (e.g., Pods blocked on some external event)
waste scheduling cycles. A scheduling cycle may take ~20ms or more depending on the complexity of
the Pod's scheduling constraints. At scale, those wasted cycles significantly degrade the scheduler's
performance. The red arrows in the diagram below show this cycle.

![](./scheduler-workflow-1.png)
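
To put the ~20ms figure in perspective, assume a hypothetical cluster holding 1,000 Pods that are all blocked on some external event (the Pod count is an illustrative assumption, not a measurement). A single pass in which the scheduler tries each of them once costs roughly:

$$
1000\ \text{Pods} \times 20\ \text{ms per attempt} = 20{,}000\ \text{ms} = 20\ \text{s}
$$

of scheduler time that places nothing. That cost is paid again on every retry pass.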

SchedulingGates helps address this problem. It allows declaring that newly created Pods are not
ready for scheduling. When SchedulingGates are present on a Pod, the scheduler ignores the Pod
and therefore saves unnecessary scheduling attempts. Clearing the gates is the responsibility of
external controllers with knowledge of when the Pod should be considered for scheduling (e.g.,
a resource provisioner).

![](scheduler-workflow-2.png)

## How does it work?

SchedulingGates in general works very similarly to Finalizers. Pods with a non-empty
`spec.schedulingGates` field are shown with the status `SchedulingGated` and are blocked from
scheduling. Note that more than one gate can be added, but they must all be added when the Pod is
created (e.g., as part of the Pod spec or via a mutating webhook). After creation, gates can only
be removed, not added.

```
NAME       READY   STATUS            RESTARTS   AGE
test-pod   0/1     SchedulingGated   0          10s
```
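
As an illustration, the sketch below models a gated Pod manifest as a Python dictionary. In the Pod API, each entry in `spec.schedulingGates` is an object with a `name` field. The gate names `example.com/foo` and `example.com/bar`, the container image, and the helper `is_scheduling_gated` are hypothetical and were chosen only to show when a Pod would be reported as `SchedulingGated`.

```python
# A minimal sketch that models a Pod manifest as a plain Python dict.
# The gate names and the container image are hypothetical examples.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "test-pod"},
    "spec": {
        # Each gate is an object with a "name" field; all gates are set at creation.
        "schedulingGates": [
            {"name": "example.com/foo"},
            {"name": "example.com/bar"},
        ],
        "containers": [
            {"name": "pause", "image": "registry.k8s.io/pause:3.6"},
        ],
    },
}


def is_scheduling_gated(pod: dict) -> bool:
    """Return True while the Pod still carries at least one scheduling gate."""
    return bool(pod.get("spec", {}).get("schedulingGates"))


print(is_scheduling_gated(pod))  # True
```

Applying the equivalent YAML manifest to a cluster where the `PodSchedulingReadiness` feature gate is enabled should produce the `SchedulingGated` status shown above.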

To clear the gates, you update the Pod by removing entries from its `spec.schedulingGates` list
(each entry is an object with a `name` field). The gates do not need to be removed all at once, but
the scheduler starts to consider the Pod for scheduling only after all the gates have been removed.
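
One way for a controller to remove a single gate is a JSON Patch (RFC 6902). The sketch below uses hypothetical helpers that are not part of any Kubernetes client library. `remove_gate_patch` builds the patch for one gate name. `remove_gate` and `ready_for_scheduling` mimic the result locally to show that the Pod becomes eligible only after the last gate is gone.

```python
import json

# Hypothetical Pod model: two gates, each owned by a different controller.
pod = {"spec": {"schedulingGates": [{"name": "example.com/foo"}, {"name": "example.com/bar"}]}}


def remove_gate_patch(pod: dict, gate_name: str) -> list:
    """Build a JSON Patch (RFC 6902) that removes one named scheduling gate."""
    for index, gate in enumerate(pod["spec"].get("schedulingGates", [])):
        if gate["name"] == gate_name:
            path = f"/spec/schedulingGates/{index}"
            return [
                # "test" makes the patch fail if the list shifted under us,
                # e.g. because another controller removed an earlier gate.
                {"op": "test", "path": path, "value": gate},
                {"op": "remove", "path": path},
            ]
    return []  # The gate is already gone: nothing to patch.


def remove_gate(pod: dict, gate_name: str) -> None:
    """Locally mimic the effect of applying the patch above."""
    gates = pod["spec"].get("schedulingGates", [])
    pod["spec"]["schedulingGates"] = [g for g in gates if g["name"] != gate_name]


def ready_for_scheduling(pod: dict) -> bool:
    """The scheduler considers the Pod only once no gates remain."""
    return not pod["spec"].get("schedulingGates")


print(json.dumps(remove_gate_patch(pod, "example.com/bar")))
remove_gate(pod, "example.com/foo")
print(ready_for_scheduling(pod))  # False: example.com/bar is still present
remove_gate(pod, "example.com/bar")
print(ready_for_scheduling(pod))  # True: every gate has been removed
```

From the command line, a patch like the printed one could be sent with `kubectl patch pod test-pod --type=json -p '<patch>'`. A controller would typically send the same patch through its client library instead.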

Under the hood, SchedulingGates is implemented as a PreEnqueue scheduler plugin, a new scheduler
framework extension point that is invoked before a Pod is added to the scheduler's active queue,
so a gated Pod never enters a scheduling cycle.
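
kube-scheduler implements this as a Go plugin. The Python sketch below is only a toy model of the idea, and its names (`Pod`, `scheduling_gates_pre_enqueue`, `SchedulingQueue`) are hypothetical. It shows why gated Pods cost no scheduling cycles: the check runs before a Pod joins the active queue, so a Pod that fails it is parked. In this model, a parked Pod is re-checked only when an update for it arrives.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    scheduling_gates: list = field(default_factory=list)


def scheduling_gates_pre_enqueue(pod: Pod) -> bool:
    """Toy PreEnqueue check: admit the Pod only if it has no gates left."""
    return not pod.scheduling_gates


class SchedulingQueue:
    """Toy queue: only Pods that pass every PreEnqueue check become active."""

    def __init__(self, pre_enqueue_checks):
        self.pre_enqueue_checks = pre_enqueue_checks
        self.active = deque()  # Pods that will get a scheduling cycle
        self.gated = {}        # Pods parked by a PreEnqueue check, keyed by name

    def add(self, pod: Pod) -> None:
        if all(check(pod) for check in self.pre_enqueue_checks):
            self.active.append(pod)
        else:
            self.gated[pod.name] = pod

    def update(self, pod: Pod) -> None:
        """Re-run the checks for a parked Pod, e.g. after a gate is removed."""
        if self.gated.pop(pod.name, None) is not None:
            self.add(pod)

    def pop(self):
        """Hand the next active Pod to a scheduling cycle, or None if idle."""
        return self.active.popleft() if self.active else None


queue = SchedulingQueue([scheduling_gates_pre_enqueue])
queue.add(Pod("gated-pod", ["example.com/foo"]))
queue.add(Pod("ready-pod"))
print([p.name for p in queue.active])  # ['ready-pod']
queue.update(Pod("gated-pod"))  # the controller removed the last gate
print([p.name for p in queue.active])  # ['ready-pod', 'gated-pod']
```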

## Use Cases

An important use case this feature enables is dynamic quota management. Kubernetes supports
ResourceQuota; however, the API server enforces quotas at Pod creation time. For example, if a
new Pod exceeds the CPU quota, it gets rejected. The API server doesn't queue the Pod; therefore,
whoever created the Pod needs to keep retrying the creation until it succeeds.

SchedulingGates allows an external quota manager to address the above limitation of ResourceQuota.
Specifically, the manager could add a `quotaCheck` scheduling gate to all Pods created in the
cluster (using a mutating webhook). The manager would then remove the gate once there is enough
quota to start the Pod.
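
Here is a minimal sketch of that flow. The gate name `quotaCheck` comes from the text above. Everything else is hypothetical: `mutate_on_create`, `QuotaManager`, the single CPU budget in millicores, and the first-in, first-out release policy. A real manager would watch Pods through the API server and remove its gate with an update or patch, and the webhook would be an admission server. The point is the contrast with ResourceQuota: a Pod that does not fit yet is not rejected. It keeps its gate and waits.

```python
from dataclasses import dataclass, field

QUOTA_GATE = "quotaCheck"  # gate name taken from the use case above


@dataclass
class Pod:
    name: str
    cpu_millicores: int
    scheduling_gates: list = field(default_factory=list)


def mutate_on_create(pod: Pod) -> Pod:
    """Stand-in for a mutating webhook: add the quota gate at creation time."""
    if QUOTA_GATE not in pod.scheduling_gates:
        pod.scheduling_gates.append(QUOTA_GATE)
    return pod


class QuotaManager:
    """Hypothetical external quota manager that releases Pods in FIFO order."""

    def __init__(self, cpu_limit_millicores: int):
        self.cpu_limit = cpu_limit_millicores
        self.cpu_used = 0

    def reconcile(self, pending: list) -> list:
        """Remove the gate from each waiting Pod that fits; return released Pods."""
        released = []
        for pod in pending:
            if QUOTA_GATE not in pod.scheduling_gates:
                continue  # Already released earlier.
            if self.cpu_used + pod.cpu_millicores > self.cpu_limit:
                break  # Keep FIFO order: later Pods wait behind this one.
            pod.scheduling_gates.remove(QUOTA_GATE)
            self.cpu_used += pod.cpu_millicores
            released.append(pod)
        return released

    def release(self, pod: Pod) -> None:
        """Return a finished Pod's CPU to the budget."""
        self.cpu_used -= pod.cpu_millicores


manager = QuotaManager(cpu_limit_millicores=1000)
pods = [mutate_on_create(Pod("a", 600)), mutate_on_create(Pod("b", 600))]
print([p.name for p in manager.reconcile(pods)])  # ['a']
manager.release(pods[0])  # Pod "a" finished
print([p.name for p in manager.reconcile(pods)])  # ['b']
```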

## What's next?

To use this feature, the `PodSchedulingReadiness` feature gate must be enabled in both the API server
and the scheduler. You're more than welcome to test it out and tell us (SIG Scheduling) what you think!

## Additional resources

- [Official PodSchedulingReadiness docs](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)
- [Official Kubernetes Enhancement Proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/3521-pod-scheduling-readiness/README.md)