-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial commit of termination-reason proposal #541
Conversation
@kubernetes/sig-apps-feature-requests |
ReasonEviction TerminationReason = "Eviction" | ||
// ReasonIntolerableTaint is the default reason used to communicate that a Pod | ||
// has been terminated due to a taint for which it has no declared toleration. | ||
ReasonIntolerableTaint = "IntolerableTaint" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this list is mixing two subtly different pieces of information, like I think this is basically the same as an eviction? Also, it will be difficult to extend this without breaking clients.
What about switching to two (or more) lists.
TerminationIntention: {RestartInPlace, RestartElsewhere, NoRestart}
TerminationReason: {TaintViolation, BinPacking, Update, ...}
The intention list should basically never need to change. I am not sure it is a good idea to add the reason list at all. Could also consider making it a user-selectable string (i.e., not an enum).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TerminationIntention
enum seems to be promising since it eliminates the needs for applications to check various reasons and then decide whether/where the pods would be restarted; otherwise, applications will probably need to be updated to handle each newly added reasons in the future.
This value will not be in DeleteOptions since it's only specific to pod deletion and kubectl
could take this value as an enum, instead of a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TerminationIntention
seems to be all we need at this point. +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things, perhaps the terminology is a bit overloaded.
- By eviction, I mean eviction as it is performed in the context of a drain. As in, we have created an eviction resource and are attempting to delete a Pod with respect to any application declared disruption budgets.
- Intolerable taints and pressure threshold violations (afiak) do not even attempt to respect eviction resources, but this is the mechanism that will be used to remove a StatefulSet's Pods from a node when local storage is implemented and the storage media fails.
The reason that I prefer one field, that is generic, and passes an arbitrary string value is that we will not foresee or enumerate all possible termination signals that a user might want to send to deal with different categories of distributed systems using (eventually) both local and remote storage. A generic string, imo, gives users the functionality they need without being over specific or attempting to constrain the signals to the ones that we can foresee right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One advantage of TerminationIntention
is that users do not need to understand the consequence of each termination reason -- it clearly indicates what the pod's behavior would be, which is what the applications really need, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I don't disagree with an additional general reason string field, which could carry arbitrary human-readable text, at least for logging purpose.
} | ||
``` | ||
|
||
The ObjectMeta struct is modified to carry the termination reason via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this belong in metadata? Why not status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in ObjectMeta because its only used in conjunction with DeletionTimestamp during graceful deletion. However, there is no reason it could not be part of PodStatus if this makes the API cleaner or if this placement is more appropriate. If there is a strong feeling that PodStatus is a better location than we should move it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current design, the reason is passed via DeleteOptions
, which is not specific to pod, so it seems reasonable not to put it in PodStatus
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current design, the reason is passed via DeleteOptions, which is not specific to pod, so it seems reasonable not to put it in PodStatus.
But the TerminationReason is specific to Pods, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deletion timestamp and grace period seconds are in metadata because they're logically applicable to all (or at least multiple) objects, even if only implemented for pods. I am claiming that either it needs to be clear that TerminationReason is likewise logically applicable to more than just pods, or it does not belong in metadata. Metadata is shared by all objects, it should not contain any fields that are only meaningful for a single object type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as @liggitt suggests below we could image a world were a signal is sent to controllers and then propagated to Pods' handlers when the controller performs a deletion. If we use PodStatus, we are always constrained to Podsl. If we put it in ObjectMeta we have flexibility going forward. I'm more in favor of taking @thockin's suggestion and generalizing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ObjectMeta is forever. I'd prefer not to rush into a pod specific field on ObjectMeta without a lot of certainty. I am extremely hesitant to say this should be anything but an annotation until it clears a pretty high bar - like has been proven to solve the pod problem and we've been able to generalize it.
I'm -1 to a field on ObjectMeta until then.
type DeleteOptions struct { | ||
// Other fields omitted for brevity. | ||
|
||
// Reason indicates the reason for the Pod's termination. This field may |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only valid for pods? (DeleteOptions is for everyone)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would only ever use it for Pods, but it needs to be communicated to the API Server so that the reason can be set along with the DeletionTimestamp during graceful deletion, and it is an optional parameter to the delete operation. Is there is a more idiomatic, or otherwise superior, method of communicating the information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a pod-specific deletion options at the moment, so I can't tell you to put it somewhere else :)
However I can ask that it be clearly documented in the comment (and the field name, pending the answer to the other comment) that it is specifically for pods.
|
||
```golang | ||
// TerminationReasonDelivery is used to configure the delivery method for | ||
// termination reasons. It is a union type, and exactly one of the fields may be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a little weird to have parallel structs for users to fill out. Perhaps copy the Handler types and just add EnvName / HeaderName to the appropriate one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean by extending HttpAction and CommandAction to contain the appropriate fields? If so, I think we'd have to consider that these fields are only valid in the context of a pre-stop hook. We could modify validation to ensure that they are only used in this context and not in Probe's Handlers, but I'm not convinced that is cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant by copying HttpAction & CommandAction. (TerminationReasonHttpAction etc)
It probably feels a bit weird to copy code like that, but it allows you to write separate comments for each, which will generate better documentation for users.
The alternative is to somehow add a general parameter mechanism to the existing handlers and just leave it undefined for the other hooks at the moment. The generality is appealing to me as someone who likes abstraction, but I think it'll be better for our users if we split the types.
We can always make a backwards-compatible general mechanism in the future if another hook turns out to need a parameter.
valid environment variable. | ||
|
||
### Pod Deletion | ||
When the API Server performs its graceful delete processing, in addition to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a dup of the API server section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we OK formalizing "reason" for all deletions of all resources? That's what this does.
Why not formalize a reason for all HTTP and exec handlers?
string as the termination reason as shown below. | ||
|
||
```shell | ||
> kubectl delete po my-pod --reason="resolve issue 354961" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intent a small set of enumerated reasons to be read programmatically and acted on by the container, or a human readable "log entry" type of message? It doesn't seem like the same field and mechanism should be used for both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I delete a replicaset and give a reason, would you expect that to propagate to the deletion of the spawned pods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...if that reason propagation behavior is desired, I think it should come in a phase two, because it's a lot of work.
|
||
1. If the `PreStop` handler indicates a command action, Kubelet will supply the | ||
termination reason to the container based on the following criteria. | ||
1. If the `ReasonDelivery` is nil Kubelet will set the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to put the logic for handling default behaviors from Kubelet to the API (default.go) to keep the Kubelet a little bit simpler?
Something not mentioned in this is what happens if a user deletes the controller object (daemonset, deployment, RS, etc)--this will be way more common than users deleting individual pods. The intention clearly ought to end up as "NoRestart". However, the garbage collector is the component actually doing the deletions, and it doesn't (shouldn't) handle pods specially. So, uh, that's a puzzle for you to figure out. The easiest solution is probably to just make NoRestart the default setting unless something else is requested. |
@lavalamp So I made clear that the current implementation is applicable only to Pods, but the DeleteOptions.Reason and ObjectMeta.DeleteReason are left as general fields that can be extended to other use cases (e.g. Controller level reasons). I took your suggestion with respect to the object hierarchy and proposed DeleteHTTPGetAction and DeleteExecAction which are unioned by PreStopHandler. |
I wasn't really proposing termination reason propagation, I was more pointing it out as a pitfall (and as something that will confuse people no matter what you do with it... some people will absolutely expect it to propagate, some will expect it to not, and some will expect it to propagate to some extent, but not necessarily all the way to the leaves) |
// +optional | ||
HTTPHeaders []HTTPHeader | ||
// ReasonHeader is the header that will be set to the reason for a | ||
// deletion. This header defaults to "KUBE-DELETE-REASON" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kube-Delete-Reason
to match standard header normalization
I still think it's confusing to mix human-specified and platform/controller-specified reasons in a single field, or not to define a small enumerated set of reasons that should be used (especially since deletion reasons need to be somewhat standardized if we expect a pod composed of several sidecar containers to be able to respond coherently to a single deletion reason) |
@Leggit My thought is that if we implement a general mechanism that does not enforce a specific enumeration for the first phase, and we find value, through use, in implementing an additional signal that is a typed enumeration, we can add it. However if we create the enum without first developing the general mechanism, and collecting data and feedback, from actual use, before implementing the enumeration, we have to worry about compatibility with what we released if user feedback compels us to modify it. |
Has this discussion moved along at all? Would really like to be able to clean up when someone decides to remove our product. |
|
||
1. Kubelet will always invoke the pre-stop handler prior to sending a `TERM` | ||
signal to the entry point of a container. | ||
1. pre-stop handlers will not contain complicated or long running business logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the case, even today. I've seen insanely complex / business logic pre-stop handlers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kow3ns why is that an assumption ? Would the termination reason not work if this assumption is not met ?
```golang | ||
// PreStopHandler invokes either a DeleteExecAction or a DeleteHTTPGetAction | ||
// prior to the graceful termination of a Pod. | ||
type PreStopHandler struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused why this isn't PreDeleteHandler, or StopExecAction. Shouldn't be both. Use the same verb consistently. Since we call it delete everywhere else, why change to stop for just this one field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left this as PreStopHandler and changed the variables internally yo DeleteExecAction, DeleteHTTPGetAction
// DeletionReason indicates the reason for the deletion of an API Object. | ||
// Its purpose is to provide an extra generic signal to watchers of API | ||
// Objects during the graceful termination process. | ||
DeletionReason string `json:"reason,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd omit this for the alpha version of this feature and use an annotation. ObjectMeta is forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here's an extra question.
If I need to use this (abuse this) for additional metadata... why wouldn't I just put that in an annotation? If I'm going to need a channel to carry it, annotation is where it's at. We'd obviously need to gate the annotation name / data, but this looks like a loose coupling, not a strong coupling, field. If I'm going to have additional annotation data, then why not make this reason and the additional data things that can be carried through to graceful deletion annotations to start with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is currently no guarantee watchers of the resource will observe its final state. I don't think this is currently a good delivery mechanism, even if it were an annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only guarantee of deliver is to the stop action hook. I don’t think we need a new mechanism to observe deletion - we already have a five minute window before deleted are cleaned up. I don’t think we need a stronger constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d still prefer an annotation - we already have a mechanism for exposing annotations into containers, and I don’t see the additional value of a concrete field. We’d need to validate the deletion behavior is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think deletion reason is a good candidate for general object meta. The five minute window is an upper bound, no? and observers would need a lower bound. And the comment strongly implies this works for more than pods--most objects are probably deleted instantly.
i had voluntered to help on this earlier but didnt get time soon enough. I will send out an updated proposal by next weekend |
any DELETEd ObjectMeta. Proposes DeleteExecAction and DeleteHTTPGetAction as the extension point for user supplied configuration of reason delivery. Clarifies limitations
i added some minor edits, i am still thinking and mulling over the proposal, i will update this more, as i gather my thoughts |
// ReasonEnv is the environment variable that wil be populated with the | ||
// reason, if provided, for the Pod's termination. This variable defaults | ||
// to "KUBE_DELETE_REASON" | ||
ReasonEnv string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There doesn't seem to be any way to support this. We are not able to set any environment variables once the container has started, and invoking the pre-stop handler as env ReasonEnv=TerminationReason Command
will not work in containers that do not have env
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, a running application cannot read environment variables afresh after start. I've implemented something close to this proposal using a CRD. In my case I signal the reason for termination by writing it into a file. Maybe this is also applicable to this proposal...
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Co-authored-by: zc <ce.zheng@daocloud.io>
This proposal adds a signal to allow for applications to determine reason for a pod termination.