
Terminate pod on container completion #3582

Open
SergeyKanzhelev opened this issue Oct 5, 2022 · 21 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@SergeyKanzhelev
Member

Enhancement Description

  • One-line enhancement description (can be used as a release note):

Allow configuring a container so that the pod terminates on that container's completion when no restart should be performed per restartPolicy (a hypothetical API sketch follows the checklist below).

/sig node

  • Enhancement target (which target equals to which milestone):
    • Alpha release target (x.y): 1.26
    • Beta release target (x.y): 1.27
    • Stable release target (x.y): 1.29
  • Alpha
    • KEP (k/enhancements) update PR(s):
    • Code (k/k) update PR(s):
    • Docs (k/website) update PR(s):
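As a rough illustration only (no API shape has been settled for this KEP), the per-container marking could look something like the Go sketch below; the container-level restartPolicy field and the TerminatePod value are hypothetical, not part of the Kubernetes API.

```go
// Hypothetical API sketch only; none of these fields or values exist in the
// Kubernetes API today.
package main

import "fmt"

// RestartPolicy mirrors the pod-level values, plus a hypothetical TerminatePod
// value that would tell the kubelet to terminate the whole pod when this
// container exits instead of restarting it.
type RestartPolicy string

const (
	RestartAlways    RestartPolicy = "Always"
	RestartOnFailure RestartPolicy = "OnFailure"
	RestartNever     RestartPolicy = "Never"
	TerminatePod     RestartPolicy = "TerminatePod" // hypothetical new value
)

// Container shows only the fields relevant to this sketch.
type Container struct {
	Name          string         `json:"name"`
	Image         string         `json:"image"`
	RestartPolicy *RestartPolicy `json:"restartPolicy,omitempty"` // hypothetical per-container override
}

func main() {
	p := TerminatePod
	c := Container{Name: "watchdog", Image: "example.com/watchdog:latest", RestartPolicy: &p}
	fmt.Printf("%s terminates the pod on exit: %v\n", c.Name, *c.RestartPolicy == TerminatePod)
}
```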
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 5, 2022
@sftim
Contributor

sftim commented Dec 22, 2022

The discussion uses the title Per Container Restart Policy override, which I prefer because you might want to terminate an entire Pod when a particular container fails.

@thockin
Member

thockin commented Jan 28, 2023

Since we have converged on #3759, I will close this.

@thockin thockin closed this as not planned Jan 28, 2023
@fpoirotte fpoirotte mentioned this issue Feb 1, 2023
@SergeyKanzhelev
Member Author

From @davidhadas in https://kubernetes.slack.com/archives/C0BP8PW9G/p1676531990732709

There are security related use cases in which a sidecar that is used to protect a pod, detects that the pod is compromised. In such cases it is necessary for the sidecar to initiate pod deletion.

Providing the sidecar with permission to delete pods is not in line with the principle of least privilege (as the sidecar should only have permission to delete its own pod and not all pods in the namespace).

Ideally, it would be helpful if the container could simply exit and the kubelet would decide to delete the pod instead of restarting the sidecar. This may require that the container is marked in the podSpec as one whose untimely demise should result in the Pod's untimely demise. (Naturally we rely here on existing mechanisms to bring up a new pod in place of our compromised pod once it gets deleted.)

We discussed this in sig-security (during our latest sig-security-tooling meeting) and wanted to bring this up with sig-node (edited)

Re-opening to collect more use cases to understand whether this feature is worth pursuing. Originally the thought was that the scenarios beyond sidecars are limited.
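For context on the least-privilege concern quoted above: the workaround available today is for the sidecar itself to call the API server and delete its own pod, which requires granting it the delete verb on pods in the namespace. A minimal client-go sketch of that workaround, assuming POD_NAME and POD_NAMESPACE are injected via the downward API:

```go
// Today's workaround: the sidecar deletes its own pod through the API server.
// This is exactly the extra privilege the quoted use case wants to avoid,
// since RBAC cannot easily scope "delete" to only the sidecar's own pod.
package main

import (
	"context"
	"log"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("in-cluster config: %v", err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatalf("building clientset: %v", err)
	}
	// Pod name and namespace are assumed to be exposed via the downward API.
	ns, name := os.Getenv("POD_NAMESPACE"), os.Getenv("POD_NAME")
	if err := client.CoreV1().Pods(ns).Delete(context.Background(), name, metav1.DeleteOptions{}); err != nil {
		log.Fatalf("deleting pod %s/%s: %v", ns, name, err)
	}
}
```

With the feature discussed in this issue, the sidecar would simply exit and need no API access at all; the kubelet would terminate the pod, and the existing controller (e.g. a ReplicaSet) would bring up a replacement.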

@SergeyKanzhelev
Member Author

Scenario clarification from @davidhadas: the Pod that needs to be terminated has restartPolicy=Always, and it is one of many Pods controlled by a Deployment.

@SergeyKanzhelev
Member Author

Clarifications - copied from slack for history here:

I don't see the security implication regarding where it gets restarted. It is fine if the decision is to start it exactly where it was running (or somewhere else).

All that is needed is for the container to signal somehow to the kubelet to delete the pod. Apparently the simplest way is to exit from the container (maybe with a specific exit code, or with an error, or maybe any exit will do). Once it has exited, the kubelet needs to decide whether to restart the container (as done today) or delete the pod (the requested feature). To decide between the two, we probably need some indication in the podSpec that this container is marked to never be restarted and that the pod should instead be deleted when the container exits.

I can repeat this in GitHub, but there are so many GitHub issues and PRs on this and related topics that I am not really sure where it is most relevant.
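Purely as an illustration of the decision point described in the quoted message above (the marking name, exit-code handling, and behavior here are placeholders, not kubelet code):

```go
// Sketch of the restart-vs-terminate decision discussed above.
// "markedTerminatePod" stands in for whatever per-container indication the
// podSpec would carry; none of this exists in the kubelet.
package main

import "fmt"

type containerExitAction string

const (
	restartContainer containerExitAction = "restart the container"
	leaveStopped     containerExitAction = "leave the container stopped"
	terminatePod     containerExitAction = "terminate the pod"
)

func onContainerExit(markedTerminatePod bool, podRestartPolicy string, exitCode int) containerExitAction {
	if markedTerminatePod {
		// Any exit of a marked container (or, in a stricter variant, only a
		// specific exit code) terminates the whole pod instead of restarting
		// the container.
		return terminatePod
	}
	// Otherwise, fall back to today's behavior driven by the pod-level
	// restartPolicy: Always restarts, OnFailure restarts on non-zero exit.
	if podRestartPolicy == "Always" || (podRestartPolicy == "OnFailure" && exitCode != 0) {
		return restartContainer
	}
	return leaveStopped
}

func main() {
	fmt.Println(onContainerExit(true, "Always", 0)) // terminate the pod
}
```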

@davidhadas

davidhadas commented Feb 16, 2023

To be precise, the need identified (as quoted above) will be met if we support deletion of the pod (which may be followed by, for example, a ReplicaSet later starting a new pod wherever) when a specific container exits - i.e. it calls for some marking on the container indicating that if this container exits, the pod should be deleted (for example by supporting restartPolicy=Never on the container, but other markings could also be used).

Ideally, if we implement this feature, the decision of whether to restart on container exit should be made per container, so that users can decide which container's exit should result in pod termination and which should result in container restart.

@thockin
Member

thockin commented Feb 16, 2023

Is this the same as #3676, aka "keystone" containers?

@SergeyKanzhelev
Member Author

Is this the same as #3676, aka "keystone" containers?

I think the semantics of re-running init containers and of killing the entire Pod are different. Both have scenarios associated with them. But Terminate Pod is a more "destructive" kind of "keystone" than what is asked for in #3676; #3676 is more about dependencies between containers, the way I read it. It can be implemented via Terminate Pod, though.

@thockin
Member

thockin commented Feb 17, 2023 via email

@ceastman-r7

Use case: Kubernetes CronJobs/Jobs where a service mesh is involved. The service mesh sidecar container is designed to run forever, but in the case of a Kubernetes CronJob/Job the service mesh sidecar should terminate when the main container completes its job. istio/istio#11659

@thockin
Member

thockin commented Mar 20, 2023

@ceastman-r7 this is somewhat different - sidecars (which are alpha in 1.27) cover what you need, I think.

@ceastman-r7

@thockin Do you have a link to the sidecars documentation?

@SergeyKanzhelev
Member Author

@thockin Do you have a link to the sidecars documentation?

#753

@smarterclayton
Contributor

Now that graceful shutdown guarantees pods report their terminal status (via changes to how we report pod phase), I think we’ve established more context around what the node is allowed to do to a pod (indicate via shutdown that the pod is terminal regardless of restart policy). While I would be hesitant to allow pods to indicate they can be deleted, a workload already has the power to “give up, take its toys, and go home” by crash looping, which is not always the most effective mechanism.

For restart never containers (init or otherwise) giving up makes a lot of sense if the inputs are fixed and there is an expectation of a time window for retry before giving up. For restart always containers, I would suggest exploring questions like:

  1. What kind of container configuration states can be identified as “unsolvable” only by the workload? The format of an env var argument is wrong? The command line argument is impossible? The volume is missing data? A necessary config map directory was not mounted?
  2. Is there any way that workloads failing could interact more effectively with rollout? Ie, allowing workloads to signal to their controllers in a way that improves overall workload health?
  3. Are there any use cases where the workload should be able to decide the current node it is running on is deficient and refuse to run on it? Such as absence of implicit node apis (the ones we don’t namespace, like the type of processor, the shape of memory)? Is there value in workloads taking that into their own hands (perhaps more privileged tools), like eviction?

No matter what, we’d need to improve controller backoff when pods fail - I just reopened and froze two issues that such a feature would make significantly more dangerous (which is why this one caught my eye).

@SergeyKanzhelev
Member Author

All scenarios I have seen thus far were around Jobs (restart policy Never or OnFailure). For restart policy Always, some of the scenarios you listed with infinite restart backoff can be implemented via init containers.

How do you see risk vs. value for the TerminatePod behavior for finite Pods?

@SergeyKanzhelev
Member Author

This KEP has valid scenarios, but nobody to work on it. Likely not for the 1.28 release.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 18, 2024
@thockin
Member

thockin commented Feb 18, 2024

@tallclair

@thockin thockin removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 18, 2024

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 18, 2024
@thockin thockin removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 20, 2024

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 18, 2024
@thockin thockin removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2024
Projects
Status: Not for release
Development

No branches or pull requests

8 participants