| 1 | +# Shared PID Namespace |
| 2 | + |
| 3 | +Pods share namespaces where possible, but a requirement for sharing the PID |
| 4 | +namespace has not been defined due to lack of support in Docker. Docker began |
| 5 | +supporting a shared PID namespace in 1.12, and other Kubernetes runtimes (rkt, |
| 6 | +cri-o, hyper) have already implemented a shared PID namespace. |
| 7 | + |
| 8 | +This proposal defines a shared PID namespace as a requirement of the Container |
| 9 | +Runtime Interface and links its rollout in Docker to that of the CRI. |
| 10 | + |
| 11 | +## Motivation |
| 12 | + |
| 13 | +Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615), |
| 14 | +and enables: |
| 15 | + |
| 16 | + 1. signaling between containers, which is useful for sidecars (e.g. for |
| 17 | + signaling a daemon process after rotating logs). |
| 18 | + 2. easier troubleshooting of pods. |
| 19 | + 3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the |
| 20 | + infra container. |
| 21 | + |
| 22 | +## Goals and Non-Goals |
| 23 | + |
| 24 | +Goals include: |
| 25 | + - Changing default behavior in the Docker runtime as implemented by the CRI |
| 26 | + - Making Docker behavior compatible with the other Kubernetes runtimes |
| 27 | + |
| 28 | +Non-goals include: |
| 29 | + - Creating an init solution that works for all runtimes |
| 30 | + - Supporting isolated PID namespaces indefinitely |
| 31 | + |
| 32 | +## Modification to the Docker Runtime |
| 33 | + |
| 34 | +We will modify the Docker implementation of the CRI to use a shared PID |
| 35 | +namespace when running with a version of Docker >= 1.12. The legacy |
| 36 | +`dockertools` implementation will not be changed. |
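
The version gate described above can be sketched as follows. This is a minimal illustration, not the CRI implementation: `pid_mode_for_container`, `parse_version`, and `MIN_SHARED_PID_VERSION` are hypothetical names. The only Docker behavior it relies on is that a container can join another container's PID namespace through a `container:<id>` PID mode (the `--pid=container:<id>` form of `docker run`).

```python
from typing import Optional, Tuple

# Assumption: shared PID namespaces are only usable from Docker 1.12 onward,
# as stated in the proposal text.
MIN_SHARED_PID_VERSION = (1, 12)


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '1.12.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pid_mode_for_container(docker_version: str, sandbox_id: str) -> Optional[str]:
    """Hypothetical helper: pick the PID mode for a container in a pod.

    Returns 'container:<sandbox_id>' when the Docker version supports a
    shared PID namespace, or None to keep the container's own namespace.
    """
    if parse_version(docker_version) >= MIN_SHARED_PID_VERSION:
        return f"container:{sandbox_id}"
    return None
```

Comparing versions as integer tuples rather than strings matters here: as strings, `"1.9"` would sort after `"1.12"`.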
| 37 | + |
| 38 | +Linking this change to the CRI lets Kubernetes users who want to test both |
| 39 | +changes do so in a single step. Users who do not want to test them are |
| 40 | +insulated because Kubernetes will not recommend Docker >= 1.12 until after |
| 41 | +the switch to the CRI. |
| 42 | + |
| 43 | +Other changes that must be made to support this change: |
| 44 | + |
| 45 | +1. Ensure all containers restart if the infra container responsible for the |
| 46 | + PodSandbox dies. (Note: With Docker 1.12, if the source of the PID namespace |
| 47 | + dies, all containers sharing that namespace are killed as well.) |
| 48 | +2. Modify the Infra container used by the Docker runtime to reap orphaned |
| 49 | + zombies ([#36853](https://pr.k8s.io/36853)). |
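
To illustrate what reaping orphaned zombies means for a process running as PID 1 of the shared namespace, here is a minimal sketch. It is not the code from the pull request referenced above, and `reap_zombies` is a hypothetical name. It assumes a POSIX system and uses a non-blocking `waitpid` loop to collect every child that has already exited.

```python
import os


def reap_zombies() -> list:
    """Collect exit statuses of all children that have already exited.

    Returns the list of reaped PIDs. Never blocks: it stops as soon as no
    exited child is left, or when the process has no children at all.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately with pid 0
            # if children exist but none of them has exited yet.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children
        if pid == 0:
            break  # children are still running; nothing to reap now
        reaped.append(pid)
    return reaped
```

An init-style process would typically call such a function whenever it receives `SIGCHLD`, since orphaned processes in the namespace are re-parented to PID 1.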
| 50 | + |
| 51 | +## Rollout Plan |
| 52 | + |
| 53 | +SIG Node is planning to switch to the CRI as a default in 1.6, at which point |
| 54 | +users with Docker >= 1.12 will be able to test shared PID namespaces. Switching |
| 55 | +back to isolated PID namespaces will require disabling the CRI. |
| 56 | + |
| 57 | +In a later release, possibly 1.7, SIG Node will remove support for disabling the CRI. |
| 58 | +After this point users must roll back to a previous version of Kubernetes or |
| 59 | +Docker to achieve PID namespace isolation. This is acceptable because: |
| 60 | + |
| 61 | +* No one has been able to identify a concrete use case requiring isolated PID |
| 62 | + namespaces. |
| 63 | +* The lack of use cases means we can't justify the complexity required to make |
| 64 | + PID namespace type configurable. |
| 65 | +* Users will already be looking for issues due to the major version upgrade and |
| 66 | + prepared for a rollback to the previous release. |
| 67 | + |
| 68 | +Alternatively, we could create a flag in the kubelet to disable shared PID |
| 69 | +namespace, but this wouldn't be especially useful to users of a hosted |
| 70 | +Kubernetes cluster. |
| 71 | + |
| 72 | + |
| 73 | +[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ |
| 74 | + |
| 75 | + |