From 19fd69a82e76f062514952cbe5029ad77f82b27d Mon Sep 17 00:00:00 2001 From: Lee Verberne Date: Wed, 21 Dec 2016 01:44:59 +0000 Subject: [PATCH 1/4] Propose rollout for Docker shared PID namespace --- .../design-proposals/pod-pid-namespace.md | 62 +++++++++++++++++++ 1 file changed, 62 insertions(+) create mode 100644 contributors/design-proposals/pod-pid-namespace.md diff --git a/contributors/design-proposals/pod-pid-namespace.md b/contributors/design-proposals/pod-pid-namespace.md new file mode 100644 index 00000000000..4c508bdefeb --- /dev/null +++ b/contributors/design-proposals/pod-pid-namespace.md @@ -0,0 +1,62 @@ +# Shared PID Namespace for the Docker Runtime + +Pods share many namespaces, but the ability to share a PID namespace was not +supported by Docker until version 1.12. This document proposes how to roll out +support for sharing the PID namespace in the docker runtime. + +## Motivation + +Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615), +and enables: + + 1. signaling between containers, which is useful for side cars (e.g. for + signaling a daemon process after rotating logs). + 2. easier troubleshooting of pods. + 3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the + infra container. + +## Goals and Non-Goals + +Goals include: + - Change default behavior in the Kubernetes Docker runtime + +Non-goals include: + - Creating an init solution that works for all runtimes + - Supporting isolated PID namespace indefinitely + +## Rollout Plan + +Sharing the PID namespace changes an implicit behavior of the Docker runtime +whereby the command run by the container image is always PID 1. This is a side +effect of isolated namespaces rather than intentional behavior, but users may +have built upon this assumption so we should change the default behavior over +the course of multiple releases. + + 1. 
Release 1.6: Enable the shared PID namespace for pods annotated with + `docker.kubernetes.io/shared-pid: true` (i.e. opt-in) when running with + Docker >= 1.12. Pods with this annotation will fail to start with older + Docker versions rather than failing to meet a user's expectation. + 2. Release 1.7: Enable the shared PID namespace for pods unless annotated + with `docker.kubernetes.io/shared-pid: false` (i.e. opt-out) when running + with Docker >= 1.12. + 3. Release 1.8: Remove the annotation. All pods receive a shared PID + namespace when running with Docker >= 1.12. + +With each step we will add a release note that clearly describes the change. +After each release we will poll kubernetes-users to determine what, if any, +applications were impacted by this change. If we discover a use case which +cannot be accommodated by a shared PID namespace, we will abort step 3 and +instead formalize a shared-pid field into the pod spec. + +## Alternatives Considered + +Changing this behavior over the course of 6 months is a bit conservative. We +could instead change the behavior in 2 releases by omitting the first step, but +the opt-in phase allows users to test the change with fewer surprises. + +[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-pid-namespace.md?pixel)]() + From 59a373ca1ab0b4a02a136ea23089156dd582d020 Mon Sep 17 00:00:00 2001 From: Lee Verberne Date: Wed, 4 Jan 2017 14:49:48 -0800 Subject: [PATCH 2/4] Constrain docker shared pid proposal to rollout Also rename file to be docker specific. 
--- ...-namespace.md => pod-pid-namespace-docker.md} | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) rename contributors/design-proposals/{pod-pid-namespace.md => pod-pid-namespace-docker.md} (77%) diff --git a/contributors/design-proposals/pod-pid-namespace.md b/contributors/design-proposals/pod-pid-namespace-docker.md similarity index 77% rename from contributors/design-proposals/pod-pid-namespace.md rename to contributors/design-proposals/pod-pid-namespace-docker.md index 4c508bdefeb..924b626d607 100644 --- a/contributors/design-proposals/pod-pid-namespace.md +++ b/contributors/design-proposals/pod-pid-namespace-docker.md @@ -1,8 +1,9 @@ # Shared PID Namespace for the Docker Runtime Pods share many namespaces, but the ability to share a PID namespace was not -supported by Docker until version 1.12. This document proposes how to roll out -support for sharing the PID namespace in the docker runtime. +supported by Docker until version 1.12. SIG Node approved a change to the +default behavior contingent on a brief rollout plan, which is this document. +Please refer to [#1615](https://issues.k8s.io/1615) for full technical details. ## Motivation @@ -18,11 +19,16 @@ and enables: ## Goals and Non-Goals Goals include: - - Change default behavior in the Kubernetes Docker runtime + - Changing default behavior in the Kubernetes Docker runtime Non-goals include: - Creating an init solution that works for all runtimes - Supporting isolated PID namespace indefinitely + - Addressing the larger issue of requiring shared namespaces in all runtimes + +Kubernetes does not currently specify how runtimes must support a PID namespace, +but many runtimes (e.g. cri-o & rkt) already support a shared namespace. This +rolls out support for Docker. ## Rollout Plan @@ -30,7 +36,9 @@ Sharing the PID namespace changes an implicit behavior of the Docker runtime whereby the command run by the container image is always PID 1. 
This is a side effect of isolated namespaces rather than intentional behavior, but users may have built upon this assumption so we should change the default behavior over -the course of multiple releases. +the course of multiple releases. (The following release numbers are earliest +possible releases and may change based on implementation and community +feedback.) 1. Release 1.6: Enable the shared PID namespace for pods annotated with `docker.kubernetes.io/shared-pid: true` (i.e. opt-in) when running with From 1a7c723a94366f43b5a94d0a4b8e093317ee4ebc Mon Sep 17 00:00:00 2001 From: Lee Verberne Date: Wed, 18 Jan 2017 17:27:53 -0800 Subject: [PATCH 3/4] Require shared PID namespace in CRI & plan rollout --- .../container-runtime-interface-v1.md | 2 +- .../pod-pid-namespace-docker.md | 70 ----------------- .../design-proposals/pod-pid-namespace.md | 78 +++++++++++++++++++ 3 files changed, 79 insertions(+), 71 deletions(-) delete mode 100644 contributors/design-proposals/pod-pid-namespace-docker.md create mode 100644 contributors/design-proposals/pod-pid-namespace.md diff --git a/contributors/design-proposals/container-runtime-interface-v1.md b/contributors/design-proposals/container-runtime-interface-v1.md index 024b1e101d0..d305aaaa200 100644 --- a/contributors/design-proposals/container-runtime-interface-v1.md +++ b/contributors/design-proposals/container-runtime-interface-v1.md @@ -86,7 +86,7 @@ container setup that are not currently trackable as Pod constraints, e.g., filesystem setup, container image pulling, etc.* A container in a PodSandbox maps to an application in the Pod Spec. For Linux -containers, they are expected to share at least network and IPC namespaces, +containers, they are expected to share at least network, PID and IPC namespaces, with sharing more namespaces discussed in [#1615](https://issues.k8s.io/1615). 
diff --git a/contributors/design-proposals/pod-pid-namespace-docker.md b/contributors/design-proposals/pod-pid-namespace-docker.md deleted file mode 100644 index 924b626d607..00000000000 --- a/contributors/design-proposals/pod-pid-namespace-docker.md +++ /dev/null @@ -1,70 +0,0 @@ -# Shared PID Namespace for the Docker Runtime - -Pods share many namespaces, but the ability to share a PID namespace was not -supported by Docker until version 1.12. SIG Node approved a change to the -default behavior contingent on a brief rollout plan, which is this document. -Please refer to [#1615](https://issues.k8s.io/1615) for full technical details. - -## Motivation - -Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615), -and enables: - - 1. signaling between containers, which is useful for side cars (e.g. for - signaling a daemon process after rotating logs). - 2. easier troubleshooting of pods. - 3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the - infra container. - -## Goals and Non-Goals - -Goals include: - - Changing default behavior in the Kubernetes Docker runtime - -Non-goals include: - - Creating an init solution that works for all runtimes - - Supporting isolated PID namespace indefinitely - - Addressing the larger issue of requiring shared namespaces in all runtimes - -Kubernetes does not currently specify how runtimes must support a PID namespace, -but many runtimes (e.g. cri-o & rkt) already support a shared namespace. This -rolls out support for Docker. - -## Rollout Plan - -Sharing the PID namespace changes an implicit behavior of the Docker runtime -whereby the command run by the container image is always PID 1. This is a side -effect of isolated namespaces rather than intentional behavior, but users may -have built upon this assumption so we should change the default behavior over -the course of multiple releases. 
(The following release numbers are earliest -possible releases and may change based on implementation and community -feedback.) - - 1. Release 1.6: Enable the shared PID namespace for pods annotated with - `docker.kubernetes.io/shared-pid: true` (i.e. opt-in) when running with - Docker >= 1.12. Pods with this annotation will fail to start with older - Docker versions rather than failing to meet a user's expectation. - 2. Release 1.7: Enable the shared PID namespace for pods unless annotated - with `docker.kubernetes.io/shared-pid: false` (i.e. opt-out) when running - with Docker >= 1.12. - 3. Release 1.8: Remove the annotation. All pods receive a shared PID - namespace when running with Docker >= 1.12. - -With each step we will add a release note that clearly describes the change. -After each release we will poll kubernetes-users to determine what, if any, -applications were impacted by this change. If we discover a use case which -cannot be accommodated by a shared PID namespace, we will abort step 3 and -instead formalize a shared-pid field into the pod spec. - -## Alternatives Considered - -Changing this behavior over the course of 6 months is a bit conservative. We -could instead change the behavior in 2 releases by omitting the first step, but -the opt-in phase allows users to test the change with fewer surprises. - -[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ - - - -[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-pid-namespace.md?pixel)]() - diff --git a/contributors/design-proposals/pod-pid-namespace.md b/contributors/design-proposals/pod-pid-namespace.md new file mode 100644 index 00000000000..f5c48e3f20b --- /dev/null +++ b/contributors/design-proposals/pod-pid-namespace.md @@ -0,0 +1,78 @@ +# Shared PID Namespace + +Pods share namespaces where possible, but a requirement for sharing the PID +namespace has not been defined due to lack of support in Docker. 
Docker began +supporting a shared PID namespace in 1.12, and other Kubernetes runtimes (rkt, +cri-o, hyper) have already implemented a shared PID namespace. + +This proposal defines a shared PID namespace as a requirement of the Container +Runtime Interface and links its rollout in Docker to that of the CRI. + +## Motivation + +Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615), +and enables: + + 1. signaling between containers, which is useful for side cars (e.g. for + signaling a daemon process after rotating logs). + 2. easier troubleshooting of pods. + 3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the + infra container. + +## Goals and Non-Goals + +Goals include: + - Changing default behavior in the Docker runtime as implemented by the CRI + - Making Docker behavior compatible with the other Kubernetes runtimes + +Non-goals include: + - Creating an init solution that works for all runtimes + - Supporting isolated PID namespace indefinitely + +## Modification to the Docker Runtime + +We will modify the Docker implementation of the CRI to use a shared PID +namespace when running with a version of Docker >= 1.12. The legacy +`dockertools` implementation will not be changed. + +Linking this change to the CRI means that Kubernetes users who care to test such +changes can test the combined changes at once. Users who do not care to test +such changes will be insulated by Kubernetes not recommending Docker >= 1.12 +until after switching to the CRI. + +Other changes that must be made to support this change: + +1. Ensure all containers restart if the infra container responsible for the + PodSandbox dies. (Note: With Docker 1.12 if the source of the PID namespace + dies all containers sharing that namespace are killed as well.) +2. Modify the Infra container used by the Docker runtime to reap orphaned + zombies ([#36853](https://pr.k8s.io/36853)). 
+ +## Rollout Plan + +SIG Node is planning to switch to the CRI as a default in 1.6, at which point +users with Docker >= 1.12 will be able to test Shared namespaces. Switching +back to isolated PID namespaces will require disabling the CRI. + +At some point, say 1.7, SIG Node will remove support for disabling the CRI. +After this point users must roll back to a previous version of Kubernetes or +Docker to achieve PID namespace isolation. This is acceptable because: + +* No one has been able to identify a concrete use case requiring isolated PID + namespaces. +* The lack of use cases means we can't justify the complexity required to make + PID namespace type configurable. +* Users will already be looking for issues due to the major version upgrade and + prepared for a rollback to the previous release. + +Alternatively, we could create a flag in the kublet to disable shared PID +namespace, but this wouldn't be especially useful to users of a hosted +Kubernetes cluster. + + +[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-pid-namespace.md?pixel)]() + From f4fd0ffc88c634658cb324f9a95fb0ec6e2ebb7a Mon Sep 17 00:00:00 2001 From: Lee Verberne Date: Mon, 23 Jan 2017 17:09:46 -0800 Subject: [PATCH 4/4] Add rollback flag to shared PID rollout plan --- .../design-proposals/pod-pid-namespace.md | 43 +++++++++---------- 1 file changed, 21 insertions(+), 22 deletions(-) diff --git a/contributors/design-proposals/pod-pid-namespace.md b/contributors/design-proposals/pod-pid-namespace.md index f5c48e3f20b..43c38f22165 100644 --- a/contributors/design-proposals/pod-pid-namespace.md +++ b/contributors/design-proposals/pod-pid-namespace.md @@ -10,8 +10,8 @@ Runtime Interface and links its rollout in Docker to that of the CRI. 
## Motivation -Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615), -and enables: +Sharing a PID namespace between containers in a pod is discussed in +[#1615](https://issues.k8s.io/1615), and enables: 1. signaling between containers, which is useful for side cars (e.g. for signaling a daemon process after rotating logs). @@ -42,32 +42,31 @@ until after switching to the CRI. Other changes that must be made to support this change: -1. Ensure all containers restart if the infra container responsible for the - PodSandbox dies. (Note: With Docker 1.12 if the source of the PID namespace - dies all containers sharing that namespace are killed as well.) +1. Add a test to verify all containers restart if the infra container + responsible for the PodSandbox dies. (Note: With Docker 1.12 if the source + of the PID namespace dies all containers sharing that namespace are killed + as well.) 2. Modify the Infra container used by the Docker runtime to reap orphaned zombies ([#36853](https://pr.k8s.io/36853)). ## Rollout Plan SIG Node is planning to switch to the CRI as a default in 1.6, at which point -users with Docker >= 1.12 will be able to test Shared namespaces. Switching -back to isolated PID namespaces will require disabling the CRI. - -At some point, say 1.7, SIG Node will remove support for disabling the CRI. -After this point users must roll back to a previous version of Kubernetes or -Docker to achieve PID namespace isolation. This is acceptable because: - -* No one has been able to identify a concrete use case requiring isolated PID - namespaces. -* The lack of use cases means we can't justify the complexity required to make - PID namespace type configurable. -* Users will already be looking for issues due to the major version upgrade and - prepared for a rollback to the previous release. 
- -Alternatively, we could create a flag in the kublet to disable shared PID -namespace, but this wouldn't be especially useful to users of a hosted -Kubernetes cluster. +users with Docker >= 1.12 will receive a shared PID namespace by default. +Cluster administrators will be able to disable this behavior by providing a flag +to the kubelet which will cause the dockershim to revert to previous behavior. + +The ability to disable shared PID namespaces is intended as a way to roll back +to prior behavior in the event of unforeseen problems. It won't be possible to +configure the behavior per-pod. We believe this is acceptable because: + +* We have not identified a concrete use case requiring isolated PID namespaces. +* Making PID namespace configurable requires changing the CRI, which we would + like to avoid since there are no use cases. + +In a future release, SIG Node will recommend docker >= 1.12. Unless a compelling +use case for isolated PID namespaces is discovered, we will remove the ability +to disable the shared PID namespace in the subsequent release. [1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
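
The default described in the final rollout plan (shared PID namespace for Docker >= 1.12, unless the cluster administrator opts out through a kubelet flag) can be illustrated with a small decision function. This is only a sketch under stated assumptions, not the dockershim implementation: the names `parseMajorMinor`, `useSharedPIDNamespace`, and `disableSharedPID` are hypothetical, since the proposal does not name the flag or say how the Docker version is detected.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor components from a Docker
// version string such as "1.12.6" or "1.13.0-rc1". It is a hypothetical
// helper for this sketch and rejects strings without a numeric "X.Y" prefix.
func parseMajorMinor(version string) (major, minor int, err error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", version)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("bad major component in %q: %v", version, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("bad minor component in %q: %v", version, err)
	}
	return major, minor, nil
}

// useSharedPIDNamespace mirrors the rollout plan: the shared PID namespace is
// the default for Docker >= 1.12, and disableSharedPID (a hypothetical
// stand-in for the unnamed kubelet flag) reverts to isolated namespaces for
// every pod on the node. There is deliberately no per-pod setting.
func useSharedPIDNamespace(dockerVersion string, disableSharedPID bool) (bool, error) {
	if disableSharedPID {
		return false, nil
	}
	major, minor, err := parseMajorMinor(dockerVersion)
	if err != nil {
		return false, err
	}
	return major > 1 || (major == 1 && minor >= 12), nil
}

func main() {
	for _, v := range []string{"1.11.2", "1.12.6", "1.13.0"} {
		shared, err := useSharedPIDNamespace(v, false)
		fmt.Printf("docker %s: shared=%v err=%v\n", v, shared, err)
	}
}
```

Keeping the opt-out as a single node-wide boolean reflects the proposal's position that the setting is a rollback mechanism rather than a per-pod configuration option.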