Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4443: Add more granular Job failure reason for PodFailurePolicy #4479

Merged
merged 21 commits into from
Jun 11, 2024

Conversation

danielvegamyhre
Copy link
Member

  • One-line PR description: Configurable Job failure reason for PodFailurePolicy
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 5, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 5, 2024
@danielvegamyhre
Copy link
Member Author

@soltysh for sig-apps lead review
tagging @ahg-g @alculquicondor @kannon92 for review as well

@danielvegamyhre
Copy link
Member Author

@deads2k would you be able to do an API review for this? Here we are proposing adding an optional Reason field to the PodFailurePolicyRule for the Job API, similar to the Reason field I proposed here in the Job success policy KEP.

Copy link
Contributor

@kannon92 kannon92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a prod-readiness file for alpha I believe.
keps/sig-apps/prod-readiness/4443.yaml

@danielvegamyhre danielvegamyhre force-pushed the kep-4443 branch 3 times, most recently from de03300 to ac89cc7 Compare February 6, 2024 18:39
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 6, 2024
@danielvegamyhre
Copy link
Member Author

@wojtek-t would you mind doing a PRR review for this KEP?

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 4, 2024
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some missing bits that needs to be filled in before we can accept this KEP.

keps/prod-readiness/sig-apps/4443.yaml Outdated Show resolved Hide resolved
## Summary

This KEP proposes to extend the Job API by adding an optional `Name` field to `PodFailurePolicyRule`. If unset, it would default to the index of
the rule in the `podFailurePolicy.rules` slice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PodFailurePolicy reason was rejected quite some time ago, as potentially overbloating condition's reason field. Index or the short name, which would be validated to fit in the condition's reason was preferred.

CRI or CNI may require updating that component before the kubelet.
-->

N/A. This feature doesn't require coordination between control plane components,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still holds.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarifying the default is the last bit to get this approved.

## Summary

This KEP proposes to extend the Job API by adding an optional `Name` field to `PodFailurePolicyRule`. If unset, it would default to the index of
the rule in the `podFailurePolicy.rules` slice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above although readable to human is hard to parse. The idea is to be able to use the reason being in your example PodFailurePolicy_0 and parse it in higher level controller.

## Design Details

#### Defaulting
If unset, the `Name` field will default to the index of the pod failure policy rule in the `Rules` slice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of the opinion we should not default, leave it empty, which is acceptable in which case the resulting Reason will contain the index. Iow. no defaulting happening, and that should be explicitly mentioned here.

@danielvegamyhre
Copy link
Member Author

Clarifying the default is the last bit to get this approved.

Updated the defaulting section to specify there will be no defaulting, and if Name is unset the Job controller will use the rule index in the condition reason suffix instead of the name.

@wojtek-t wojtek-t removed their assignment Jun 10, 2024
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One point from Filip, but that's not blocking as long as it gets implemented.

/lgtm
/approve
from sig-apps pov

/hold
for prr review

/assign @jpbetz

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 10, 2024
@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 10, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2024
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2024
Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
For PRR

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielvegamyhre, jpbetz, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ahg-g
Copy link
Member

ahg-g commented Jun 11, 2024

/hold cancel

Thanks a lot all for the great feedback, I think we got to a very good conclusion here! kudos to the community!

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2024
@k8s-ci-robot k8s-ci-robot merged commit 2366420 into kubernetes:master Jun 11, 2024
2 of 4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. wg/batch Categorizes an issue or PR as relevant to WG Batch.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.