Initial commit of termination-reason proposal #541

kow3ns · 2017-04-18T00:20:26Z

This proposal adds a signal that allows applications to determine the reason for a pod's termination.

kow3ns · 2017-04-18T00:22:41Z

@kubernetes/sig-apps-feature-requests
@kubernetes/sig-api-machinery-api-reviews
@kubernetes/sig-node-api-reviews
@bgrant0607
@thockin
@erictune
@smarterclayton
@dchen1107
@yguo0905
@Kargakis
@janetkuo
@foxish
@enisoc

kow3ns · 2017-04-18T00:28:18Z

ref kubernetes/kubernetes#1462

lavalamp · 2017-04-18T00:38:52Z

contributors/design-proposals/termination-reason.md

+	ReasonEviction TerminationReason = "Eviction"
+	// ReasonIntolerableTaint is the default reason used to communicate that a Pod 
+	// has been terminated due to a taint for which it has no declared toleration.
+	ReasonIntolerableTaint = "IntolerableTaint"


I think this list mixes two subtly different pieces of information; for example, isn't this basically the same as an eviction? Also, it will be difficult to extend this without breaking clients.

What about switching to two (or more) lists?

TerminationIntention: {RestartInPlace, RestartElsewhere, NoRestart}
TerminationReason: {TaintViolation, BinPacking, Update, ...}

The intention list should basically never need to change. I am not sure it is a good idea to add the reason list at all. Could also consider making it a user-selectable string (i.e., not an enum).

The TerminationIntention enum seems promising, since it eliminates the need for applications to check various reasons and then decide whether and where the pods would be restarted; otherwise, applications will probably need to be updated to handle each newly added reason in the future.

This value would not go in DeleteOptions, since it is specific to pod deletion, and kubectl could take it as an enum instead of a string.

TerminationIntention seems to be all we need at this point. +1
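For illustration, here is a minimal Go sketch of the intention-only approach, assuming the three values lavalamp listed. `TerminationIntention`, its constants, and `cleanupAction` are hypothetical names, not Kubernetes API, and the per-intention policy is only an example of what an application might do with the signal.

```go
package main

import "fmt"

// TerminationIntention is a hypothetical enum modelled on the values
// suggested in this thread; it is not part of the Kubernetes API.
type TerminationIntention string

const (
	RestartInPlace   TerminationIntention = "RestartInPlace"
	RestartElsewhere TerminationIntention = "RestartElsewhere"
	NoRestart        TerminationIntention = "NoRestart"
)

// cleanupAction shows the kind of decision an application could make from
// the intention alone, without knowing the underlying reason.
// The policy itself is illustrative only.
func cleanupAction(i TerminationIntention) string {
	switch i {
	case RestartInPlace:
		return "keep local state"
	case RestartElsewhere:
		return "hand off state to a peer"
	case NoRestart:
		return "tear down and deregister"
	default:
		// Unknown values are treated conservatively: never destroy data.
		return "keep local state"
	}
}

func main() {
	for _, i := range []TerminationIntention{RestartInPlace, RestartElsewhere, NoRestart} {
		fmt.Printf("%s: %s\n", i, cleanupAction(i))
	}
}
```

Because the set of intentions is closed, an application written against it does not need to change when new underlying reasons are introduced.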

A few things; perhaps the terminology is a bit overloaded.

By eviction, I mean eviction as it is performed in the context of a drain. As in, we have created an eviction resource and are attempting to delete a Pod with respect to any application declared disruption budgets.

Intolerable taints and pressure threshold violations (afaik) do not even attempt to respect eviction resources, but this is the mechanism that will be used to remove a StatefulSet's Pods from a node when local storage is implemented and the storage media fails.

I prefer a single, generic field that carries an arbitrary string value because we will not foresee or enumerate all the termination signals a user might want to send to deal with different categories of distributed systems using (eventually) both local and remote storage. A generic string, imo, gives users the functionality they need without being overly specific or constraining the signals to the ones we can foresee right now.

One advantage of TerminationIntention is that users do not need to understand the consequence of each termination reason -- it clearly indicates what the pod's behavior would be, which is what the applications really need, I think.

And I don't object to an additional general reason string field, which could carry arbitrary human-readable text, at least for logging purposes.

lavalamp · 2017-04-18T00:40:00Z

contributors/design-proposals/termination-reason.md

+}
+```
+
+The ObjectMeta struct is modified to carry the termination reason via 


Does this belong in metadata? Why not status?

It's in ObjectMeta because it's only used in conjunction with DeletionTimestamp during graceful deletion. However, there is no reason it could not be part of PodStatus if that makes the API cleaner or is a more appropriate placement. If there is a strong feeling that PodStatus is a better location, then we should move it.

In the current design, the reason is passed via DeleteOptions, which is not specific to pod, so it seems reasonable not to put it in PodStatus.

But the TerminationReason is specific to Pods, right?

deletion timestamp and grace period seconds are in metadata because they're logically applicable to all (or at least multiple) objects, even if only implemented for pods. I am claiming that either it needs to be clear that TerminationReason is likewise logically applicable to more than just pods, or it does not belong in metadata. Metadata is shared by all objects, it should not contain any fields that are only meaningful for a single object type.

So, as @liggitt suggests below, we could imagine a world where a signal is sent to controllers and then propagated to the Pods' handlers when the controller performs a deletion. If we use PodStatus, we are always constrained to Pods. If we put it in ObjectMeta, we have flexibility going forward. I'm more in favor of taking @thockin's suggestion and generalizing.

ObjectMeta is forever. I'd prefer not to rush into a pod-specific field on ObjectMeta without a lot of certainty. I am extremely hesitant to say this should be anything but an annotation until it clears a pretty high bar, like having been proven to solve the pod problem and our having been able to generalize it.

I'm -1 to a field on ObjectMeta until then.

lavalamp · 2017-04-18T00:40:25Z

contributors/design-proposals/termination-reason.md

+type DeleteOptions struct {
+	// Other fields omitted for brevity.
+
+	// Reason indicates the reason for the Pod's termination. This field may 


Is this only valid for pods? (DeleteOptions is for everyone)

You would only ever use it for Pods, but it needs to be communicated to the API Server so that the reason can be set along with the DeletionTimestamp during graceful deletion, and it is an optional parameter to the delete operation. Is there a more idiomatic, or otherwise superior, method of communicating this information?

We don't have a pod-specific deletion options at the moment, so I can't tell you to put it somewhere else :)

However, I can ask that it be clearly documented, in the comment (and in the field name, pending the answer to the other comment), that it is specifically for pods.

lavalamp · 2017-04-18T00:43:11Z

contributors/design-proposals/termination-reason.md

+
+```golang
+// TerminationReasonDelivery is used to configure the delivery method for
+// termination reasons. It is a union type, and exactly one of the fields may be


It is a little weird to have parallel structs for users to fill out. Perhaps copy the Handler types and just add EnvName / HeaderName to the appropriate one?

Do you mean by extending HttpAction and CommandAction to contain the appropriate fields? If so, I think we'd have to consider that these fields are only valid in the context of a pre-stop hook. We could modify validation to ensure that they are only used in this context and not in Probe's Handlers, but I'm not convinced that is cleaner.

I meant copying HttpAction and CommandAction (e.g. TerminationReasonHttpAction).

It probably feels a bit weird to copy code like that, but it allows you to write separate comments for each, which will generate better documentation for users.

The alternative is to somehow add a general parameter mechanism to the existing handlers and just leave it undefined for the other hooks at the moment. The generality is appealing to me as someone who likes abstraction, but I think it'll be better for our users if we split the types.

We can always make a backwards-compatible general mechanism in the future if another hook turns out to need a parameter.
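A Go sketch of the split-types approach, assuming field names taken from the proposal excerpts (`ReasonEnv`, `ReasonHeader`) and the type names kow3ns later adopted (`DeleteExecAction`, `DeleteHTTPGetAction`, `PreStopHandler`). `validatePreStopHandler` is a hypothetical helper enforcing the union's "exactly one member" rule; it is not the real Kubernetes validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// DeleteExecAction mirrors an exec action but carries its own reason field,
// so it can be documented independently of probe handlers.
type DeleteExecAction struct {
	Command   []string
	ReasonEnv string // env var that would carry the reason (per the proposal)
}

// DeleteHTTPGetAction mirrors an HTTP GET action plus the reason header.
type DeleteHTTPGetAction struct {
	Path         string
	Port         int
	ReasonHeader string
}

// PreStopHandler is a union type: exactly one member must be set.
type PreStopHandler struct {
	Exec    *DeleteExecAction
	HTTPGet *DeleteHTTPGetAction
}

// validatePreStopHandler is a hypothetical check for the union constraint.
func validatePreStopHandler(h PreStopHandler) error {
	set := 0
	if h.Exec != nil {
		set++
	}
	if h.HTTPGet != nil {
		set++
	}
	if set != 1 {
		return fmt.Errorf("exactly one action must be specified, got %d", set)
	}
	if h.Exec != nil && len(h.Exec.Command) == 0 {
		return errors.New("exec action requires a command")
	}
	return nil
}

func main() {
	httpOnly := PreStopHandler{HTTPGet: &DeleteHTTPGetAction{Path: "/shutdown", Port: 8080}}
	fmt.Println(validatePreStopHandler(httpOnly))

	both := PreStopHandler{
		Exec:    &DeleteExecAction{Command: []string{"/bin/cleanup"}},
		HTTPGet: httpOnly.HTTPGet,
	}
	fmt.Println(validatePreStopHandler(both))
}
```

Since probes would keep using the original action types, the reason-related fields never appear where they would have no meaning.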

lavalamp · 2017-04-18T00:45:06Z

contributors/design-proposals/termination-reason.md

+valid environment variable.
+
+### Pod Deletion
+When the API Server performs its graceful delete processing, in addition to 


This duplicates the API server section.

thockin

Are we OK formalizing "reason" for all deletions of all resources? That's what this does.

Why not formalize a reason for all HTTP and exec handlers?

liggitt · 2017-04-18T04:26:32Z

contributors/design-proposals/termination-reason.md

+string as the termination reason as shown below.
+
+```shell
+ > kubectl delete po my-pod --reason="resolve issue 354961"


Is the intent a small set of enumerated reasons to be read programmatically and acted on by the container, or a human-readable "log entry" type of message? It doesn't seem like the same field and mechanism should be used for both.

If I delete a replicaset and give a reason, would you expect that to propagate to the deletion of the spawned pods?

...if that reason propagation behavior is desired, I think it should come in a phase two, because it's a lot of work.

yguo0905 · 2017-04-18T07:18:41Z

contributors/design-proposals/termination-reason.md

+
+1. If the `PreStop` handler indicates a command action, Kubelet will supply the 
+termination reason to the container based on the following criteria.
+   1. If the `ReasonDelivery` is nil Kubelet will set the 


Does it make sense to move the logic for handling default behaviors out of the Kubelet and into API defaulting (default.go), to keep the Kubelet a little simpler?
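A sketch of what moving the defaults into API defaulting could look like, assuming the default names quoted in the proposal excerpts (`KUBE_DELETE_REASON` for the environment variable, `KUBE-DELETE-REASON` for the header). The `SetDefaults_*` naming mirrors the convention used by Kubernetes defaulting functions, but these particular functions and types are hypothetical.

```go
package main

import "fmt"

// Hypothetical, trimmed-down action types from the proposal.
type DeleteExecAction struct {
	Command   []string
	ReasonEnv string
}

type DeleteHTTPGetAction struct {
	Path         string
	ReasonHeader string
}

// Default values as quoted in the proposal excerpts in this thread.
const (
	defaultReasonEnv    = "KUBE_DELETE_REASON"
	defaultReasonHeader = "KUBE-DELETE-REASON"
)

// SetDefaults_DeleteExecAction fills in the env var name when unset.
func SetDefaults_DeleteExecAction(a *DeleteExecAction) {
	if a.ReasonEnv == "" {
		a.ReasonEnv = defaultReasonEnv
	}
}

// SetDefaults_DeleteHTTPGetAction fills in the header name when unset.
func SetDefaults_DeleteHTTPGetAction(a *DeleteHTTPGetAction) {
	if a.ReasonHeader == "" {
		a.ReasonHeader = defaultReasonHeader
	}
}

func main() {
	exec := &DeleteExecAction{Command: []string{"/bin/cleanup"}}
	SetDefaults_DeleteExecAction(exec)

	get := &DeleteHTTPGetAction{Path: "/shutdown", ReasonHeader: "X-Reason"}
	SetDefaults_DeleteHTTPGetAction(get) // user-supplied value is preserved

	fmt.Println(exec.ReasonEnv, get.ReasonHeader)
}
```

With defaults applied when the object is stored, the Kubelet would always see a populated field and would not need its own fallback branch.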

lavalamp · 2017-04-18T17:18:02Z

Something not mentioned in this is what happens if a user deletes the controller object (daemonset, deployment, RS, etc)--this will be way more common than users deleting individual pods.

The intention clearly ought to end up as "NoRestart". However, the garbage collector is the component actually doing the deletions, and it doesn't (shouldn't) handle pods specially. So, uh, that's a puzzle for you to figure out.

The easiest solution is probably to just make NoRestart the default setting unless something else is requested.
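A minimal Go sketch of "NoRestart unless something else is requested", assuming a hypothetical `TerminationIntention` type whose empty value means "unspecified". Under this rule the garbage collector needs no pod-specific logic: it simply never sets an intention, and the pod observes NoRestart.

```go
package main

import "fmt"

// TerminationIntention is hypothetical; the empty string means "unspecified".
type TerminationIntention string

const (
	Unspecified      TerminationIntention = ""
	RestartInPlace   TerminationIntention = "RestartInPlace"
	RestartElsewhere TerminationIntention = "RestartElsewhere"
	NoRestart        TerminationIntention = "NoRestart"
)

// effectiveIntention resolves the intention a pod would observe. Deletions
// that carry no explicit intention, such as those issued by the garbage
// collector after the owning controller is deleted, fall back to NoRestart.
func effectiveIntention(requested TerminationIntention) TerminationIntention {
	if requested == Unspecified {
		return NoRestart
	}
	return requested
}

func main() {
	fmt.Println(effectiveIntention(Unspecified))      // e.g. GC-driven deletion
	fmt.Println(effectiveIntention(RestartElsewhere)) // an explicit request wins
}
```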

kow3ns · 2017-04-20T22:22:55Z

@lavalamp I made it clear that the current implementation applies only to Pods, but DeleteOptions.Reason and ObjectMeta.DeleteReason are left as general fields that can be extended to other use cases (e.g. controller-level reasons). I took your suggestion with respect to the type hierarchy and proposed DeleteHTTPGetAction and DeleteExecAction, which are unioned by PreStopHandler.
@liggitt I think the intention should now be clearer, and in particular the use case you mentioned is called out as future work.
@thockin The proposal is now more general on the API side, with specifics on how to handle containers in Pods. It should be general enough for future extensions. I don't think we need to specialize the Handlers in the general case, because Probes don't have a reason and PostStart hooks are called asynchronously with respect to the container entry point. Whatever we achieve with a PreStop hook at termination, we would need to do with an init container during initial launch.

liggitt · 2017-04-20T22:26:38Z

> @liggitt I think the intention should be more clear and in specific the use case you mentioned is called out as future work.

I wasn't really proposing termination reason propagation, I was more pointing it out as a pitfall (and as something that will confuse people no matter what you do with it... some people will absolutely expect it to propagate, some will expect it to not, and some will expect it to propagate to some extent, but not necessarily all the way to the leaves)

liggitt · 2017-04-20T22:33:57Z

contributors/design-proposals/termination-reason.md

+	// +optional
+	HTTPHeaders []HTTPHeader
+	// ReasonHeader is the header that will be set to the reason for a 
+	// deletion. This header defaults to "KUBE-DELETE-REASON"


Use Kube-Delete-Reason to match standard header normalization.

liggitt · 2017-04-20T22:38:23Z

I still think it's confusing to mix human-specified and platform/controller-specified reasons in a single field, or not to define a small enumerated set of reasons that should be used (especially since deletion reasons need to be somewhat standardized if we expect a pod composed of several sidecar containers to be able to respond coherently to a single deletion reason)

kow3ns · 2017-04-20T23:48:09Z

@liggitt My thought is this: if we implement a general mechanism that does not enforce a specific enumeration in the first phase, and we find value, through use, in an additional signal that is a typed enumeration, we can add it. However, if we create the enum without first developing the general mechanism and collecting data and feedback from actual use, we have to worry about compatibility with what we released if user feedback compels us to modify it.

bboreham · 2017-07-20T23:29:29Z

Has this discussion moved along at all? Would really like to be able to clean up when someone decides to remove our product.

smarterclayton · 2017-08-01T03:12:08Z

contributors/design-proposals/termination-reason.md

+
+1. Kubelet will always invoke the pre-stop handler prior to sending a `TERM` 
+signal to the entry point of a container.
+1. pre-stop handlers will not contain complicated or long running business logic. 


I don't think this is the case, even today. I've seen insanely complex / business logic pre-stop handlers.

@kow3ns Why is that an assumption? Would the termination reason not work if this assumption is not met?

smarterclayton · 2017-08-01T03:15:21Z

contributors/design-proposals/termination-reason.md

+```golang
+// PreStopHandler invokes either a DeleteExecAction or a DeleteHTTPGetAction 
+// prior to the graceful termination of a Pod.
+type PreStopHandler struct {


I'm confused why this isn't PreDeleteHandler, or StopExecAction. It shouldn't be both. Use the same verb consistently. Since we call it delete everywhere else, why change to stop for just this one field?

Left this as PreStopHandler and changed the variables internally to DeleteExecAction and DeleteHTTPGetAction.

smarterclayton · 2017-08-01T03:17:19Z

contributors/design-proposals/termination-reason.md

+    // DeletionReason indicates the reason for the deletion of an API Object. 
+    // Its purpose is to provide an extra generic signal to watchers of API 
+    // Objects during the graceful termination process. 
+	DeletionReason string `json:"reason,omitempty"`


I'd omit this for the alpha version of this feature and use an annotation. ObjectMeta is forever.

So here's an extra question.

If I need to use this (abuse this) for additional metadata... why wouldn't I just put that in an annotation? If I'm going to need a channel to carry it, annotation is where it's at. We'd obviously need to gate the annotation name / data, but this looks like a loose coupling, not a strong coupling, field. If I'm going to have additional annotation data, then why not make this reason and the additional data things that can be carried through to graceful deletion annotations to start with?

There is currently no guarantee watchers of the resource will observe its final state. I don't think this is currently a good delivery mechanism, even if it were an annotation.

The only guarantee of delivery is to the stop action hook. I don’t think we need a new mechanism to observe deletion; we already have a five-minute window before deleted objects are cleaned up. I don’t think we need a stronger constraint.

I’d still prefer an annotation - we already have a mechanism for exposing annotations into containers, and I don’t see the additional value of a concrete field. We’d need to validate the deletion behavior is correct.
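As a sketch of the annotation route, assume the reason were written to a hypothetical `example.com/delete-reason` annotation and exposed through a downwardAPI volume. That file has one `key="value"` line per annotation with the value in quoted form, so it can be parsed with `strconv.Unquote`. The annotation key and any mount path are assumptions, not part of the proposal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// deleteReasonKey is a hypothetical annotation key used for illustration.
const deleteReasonKey = "example.com/delete-reason"

// parseAnnotations parses the downwardAPI annotations file format,
// in which each line is key="value" with a quoted value.
func parseAnnotations(data string) (map[string]string, error) {
	out := map[string]string{}
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Annotation keys cannot contain '=', so split on the first one.
		k, v, ok := strings.Cut(line, "=")
		if !ok {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		val, err := strconv.Unquote(v)
		if err != nil {
			return nil, fmt.Errorf("annotation %s: %w", k, err)
		}
		out[k] = val
	}
	return out, nil
}

func main() {
	// In a pod this content would come from the mounted annotations file.
	sample := "example.com/delete-reason=\"node drain\"\nteam=\"storage\"\n"
	ann, err := parseAnnotations(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(ann[deleteReasonKey])
}
```

Note that the kubelet refreshes downwardAPI volume files on its own sync schedule, so this alone does not guarantee the container sees the annotation before it is stopped.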

I still don't think deletion reason is a good candidate for general object meta. The five-minute window is an upper bound, no? And observers would need a lower bound. And the comment strongly implies this works for more than pods; most objects are probably deleted instantly.

krmayankk · 2017-10-29T06:35:49Z

I had volunteered to help on this earlier but didn't get time soon enough. I will send out an updated proposal by next weekend.

krmayankk · 2017-11-08T07:17:06Z

I added some minor edits. I am still thinking over the proposal and will update it further as I gather my thoughts.

yguo0905 · 2017-12-09T04:33:56Z

contributors/design-proposals/termination-reason.md

+	// ReasonEnv is the environment variable that wil be populated with the 
+	// reason, if provided, for the Pod's termination. This variable defaults
+	// to "KUBE_DELETE_REASON"
+	ReasonEnv string


There doesn't seem to be any way to support this. We are not able to set any environment variables once the container has started, and invoking the pre-stop handler as `env ReasonEnv=TerminationReason Command` will not work in containers that do not have the `env` binary.

Right, a running application cannot read environment variables afresh after start. I've implemented something close to this proposal using a CRD. In my case I signal the reason for termination by writing it into a file. Maybe this is also applicable to this proposal...

fejta-bot · 2018-05-14T13:38:51Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2018-06-13T14:25:25Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot · 2018-07-13T15:11:10Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 18, 2017

lavalamp reviewed Apr 18, 2017

thockin reviewed Apr 18, 2017

liggitt reviewed Apr 18, 2017

yguo0905 reviewed Apr 18, 2017

liggitt reviewed Apr 20, 2017

dhilipkumars mentioned this pull request May 1, 2017

Pod or Job lifecycle lacks of a mechanism to define cleanup actions once you delete a pod kubernetes/kubernetes#35183 (Closed)

yguo0905 mentioned this pull request May 1, 2017

Client Initiated Pod Termination Reason kubernetes/enhancements#289 (Closed)

yguo0905 mentioned this pull request May 8, 2017

API changes for client-initiated termination reason kubernetes/kubernetes#45504 (Closed)

This was referenced May 25, 2017

(Weave) Kubeadm reset on node not restoring initial state. kubernetes/kubeadm#255 (Closed)

Weave network not deleted by 'kubeadm reset' weaveworks/weave#2911 (Open)

smarterclayton reviewed Aug 1, 2017

k8s-github-robot assigned idvoretskyi and grodrigues3 Aug 15, 2017

k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 15, 2017

grodrigues3 unassigned idvoretskyi Aug 16, 2017

grodrigues3 removed their assignment Aug 16, 2017

k8s-github-robot assigned grodrigues3 and sarahnovotny Aug 16, 2017

grodrigues3 assigned lavalamp and unassigned sarahnovotny and grodrigues3 Aug 17, 2017

erictune mentioned this pull request Sep 1, 2017

[WIP] Implement deferContainers - Pod Termination Semantics kubernetes/kubernetes#47422 (Closed)

Kenneth Owens and others added 3 commits November 7, 2017 23:14

Initial commit of termination-reason proposal (46d68ae)

Implements Reason as a general string applicable to DeleteOptions and any DELETEd ObjectMeta. Proposes DeleteExecAction and DeleteHTTPGetAction as the extension point for user supplied configuration of reason delivery. Clarifies limitations (3ba0bff)

minor review comments (885bb7d)

yguo0905 reviewed Dec 9, 2017

k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Feb 6, 2018

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2018

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 13, 2018

k8s-ci-robot closed this Jul 13, 2018

danehans pushed a commit to danehans/community that referenced this pull request Jul 18, 2023

Add myself to members list (kubernetes#541) (7794bc1)

Co-authored-by: zc <ce.zheng@daocloud.io>
