-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pod webhook + reconciler for new "leader pod" exclusive job placement strategy #309
Add pod webhook + reconciler for new "leader pod" exclusive job placement strategy #309
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danielvegamyhre The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
4b21dc0
to
4151402
Compare
af40366
to
64e4c75
Compare
329f1b7
to
0150620
Compare
a76cbd1
to
db2c6ec
Compare
/retest |
// if Suspend is set, then we assume all jobs will be suspended also. | ||
jobsetSuspended := js.Spec.Suspend != nil && *js.Spec.Suspend | ||
job.Spec.Suspend = ptr.To(jobsetSuspended) | ||
|
||
return job, nil | ||
} | ||
|
||
// Appends pod affinity/anti-affinity terms to the job pod template spec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember a discussion with @vsoch around these annotations. Should we implement the new way of placements in case of users using these annotations rather than deleting this entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the context of this PR, but I think a feature of jobset definitely should be to make it easy to ask for common configurations for affinity (or other similar needs for placement). In my case, I did try using one of the defaults here, but it was slightly off for what I needed (so I couldn't use this strategy) and had to implement an entire function akin to the below, with tweaks. I would say it would make sense to:
- Define a list of common use cases for affinity / anti-affinity
- Associate them with names the user can ask for in a section of the spec
- Implement so it's ridiculously easy to just request a named set!
And in this case, it would make sense to only deprecate the existing when an alternative is available. It might be unlikely that someone is using them, in which case maybe OK to delete first, but perhaps adding the new sets first makes sense so if anyone is using them they aren't really angry :)
76860eb
to
278d01a
Compare
You can use |
feed6dd
to
1317dbd
Compare
8860b76
to
3c08157
Compare
/retest |
3c08157
to
f09783e
Compare
7d61dba
to
2f474b0
Compare
return fmt.Sprintf("%s-%s-%d", jsName, rjobName, jobIndex) | ||
} | ||
|
||
func IsLeaderPod(pod *corev1.Pod) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
util/shared sounds repetitive, shared is just another way of saying util.
Do we really need this function here? We probably only need it in the webhook, so lets just define it there and keep this pkg named "names"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up needing to use this function in the pod reconciler, to check if a follower pod's nodeSelector topology value does not match the leader pod's (and check if the leader pod even exists / was killed), so that's why I placed it in this shared package. This logic was removed from this PR to simplify reviewing, but it will be in a subsequent PR.
Yeah naming this package is hard, but I do think we do need a package of shared functions used by both webhooks and controllers, to serve as a single source of truth and avoid code duplication of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
util by definition is a pkg of "shared" functions, may be rename the pkg to placement
to indicate that those functions are for exclusive placement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do that in the followup PR
2fe62ec
to
7ef0e1d
Compare
/lgtm |
Fixes #310
This PR implements the 2nd option described in the linked issue. Specifically, it adds a pod mutating and admission webhook which does the following:
The pod reconciler is minimal in this PR, and only has the bare minimum logic needed to setup the indexes used by the webhook to look up pods by name and look up all pods for a job.
Once this PR is submitted, I'll create a follow-up PR with the pod reconciler changes necessary to handle race conditions that can arise in this webhook based exclusive placement strategy.