Commit 550db9f

config: Make capabilities, noNewPrivileges, and rlimits Linux-only (again)
Roll back the genericization from 718f9f3 (minor narrative cleanup regarding config compatibility, 2017-01-30, opencontainers#673). Lifting the restriction there seems to have been motivated by "Solaris supports capabilities", but that was before the split into a capabilities object which happened in eb114f0 (Add ambient and bounding capability support, 2017-02-02, opencontainers#675). It's not clear if Solaris supports ambient caps, or what Solaris API rlimits or noNewPrivileges were punting to [1]. And John Howard has recently confirmed that Windows does not support capabilities and is unlikely to do so in the future [2]. John's statement didn't directly address rlimits or noNewPrivileges, but we can always restore any of these properties to the Solaris/Windows platforms if/when we get docs about which API we're punting to on those platforms.

Also add some backticks, remove the hyphens in "OPTIONAL) - the", standardize lines I touch to use "the process" [3], and use four-space indents here to keep Pandoc happy (see 7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495)).

[1]: opencontainers#673 (comment)
[2]: opencontainers#810 (comment)
[3]: opencontainers#809 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
1 parent 3036273 commit 550db9f

1 file changed: +21 -18 lines changed

config.md

Lines changed: 21 additions & 18 deletions
@@ -130,35 +130,38 @@ For Solaris, the mount entry corresponds to the 'fs' resource in the [zonecfg(1M
 * **`env`** (array of strings, OPTIONAL) with the same semantics as [IEEE Std 1003.1-2001's `environ`][ieee-1003.1-2001-xbd-c8.1].
 * **`args`** (array of strings, REQUIRED) with similar semantics to [IEEE Std 1003.1-2001 `execvp`'s *argv*][ieee-1003.1-2001-xsh-exec].
 This specification extends the IEEE standard in that at least one entry is REQUIRED, and that entry is used with the same semantics as `execvp`'s *file*.
-* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process(es) inside the container. Valid values are platform-specific. For example, valid values for Linux are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`. Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
-capabilities contains the following properties:
-* **`effective`** (array of strings, OPTIONAL) - the `effective` field is an array of effective capabilities that are kept for the process.
-* **`bounding`** (array of strings, OPTIONAL) - the `bounding` field is an array of bounding capabilities that are kept for the process.
-* **`inheritable`** (array of strings, OPTIONAL) - the `inheritable` field is an array of inheritable capabilities that are kept for the process.
-* **`permitted`** (array of strings, OPTIONAL) - the `permitted` field is an array of permitted capabilities that are kept for the process.
-* **`ambient`** (array of strings, OPTIONAL) - the `ambient` field is an array of ambient capabilities that are kept for the process.
-* **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for a process inside the container.
-Each entry has the following structure:
-
-* **`type`** (string, REQUIRED) - the platform resource being limited, for example on Linux as defined in the [setrlimit(2)][setrlimit.2] man page.
-* **`soft`** (uint64, REQUIRED) - the value of the limit enforced for the corresponding resource.
-* **`hard`** (uint64, REQUIRED) - the ceiling for the soft limit that could be set by an unprivileged process. Only a privileged process (e.g. under Linux: one with the CAP_SYS_RESOURCE capability) can raise a hard limit.
-
-If `rlimits` contains duplicated entries with same `type`, the runtime MUST error out.
-
-* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the processes in the container from gaining additional privileges.
-As an example, the ['no_new_privs'][no-new-privs] article in the kernel documentation has information on how this is achieved using a prctl system call on Linux.
 
 For Linux-based systems the process structure supports the following process-specific fields.
 
 * **`apparmorProfile`** (string, OPTIONAL) specifies the name of the AppArmor profile to be applied to processes in the container.
 For more information about AppArmor, see [AppArmor documentation][apparmor].
+* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process.
+Valid values are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
+Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
+`capabilities` contains the following properties:
+
+* **`effective`** (array of strings, OPTIONAL) the `effective` field is an array of effective capabilities that are kept for the process.
+* **`bounding`** (array of strings, OPTIONAL) the `bounding` field is an array of bounding capabilities that are kept for the process.
+* **`inheritable`** (array of strings, OPTIONAL) the `inheritable` field is an array of inheritable capabilities that are kept for the process.
+* **`permitted`** (array of strings, OPTIONAL) the `permitted` field is an array of permitted capabilities that are kept for the process.
+* **`ambient`** (array of strings, OPTIONAL) the `ambient` field is an array of ambient capabilities that are kept for the process.
+* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the process from gaining additional privileges.
+As an example, the [`no_new_privs`][no-new-privs] article in the kernel documentation has information on how this is achieved using a `prctl` system call on Linux.
 * **`oomScoreAdj`** *(int, OPTIONAL)* adjusts the oom-killer score in `[pid]/oom_score_adj` for the container process's `[pid]` in a [proc pseudo-filesystem][procfs].
 If `oomScoreAdj` is set, the runtime MUST set `oom_score_adj` to the given value.
 If `oomScoreAdj` is not set, the runtime MUST NOT change the value of `oom_score_adj`.
 
 This is a per-process setting, where as [`disableOOMKiller`](config-linux.md#disable-out-of-memory-killer) is scoped for a memory cgroup.
 For more information on how these two settings work together, see [the memory cgroup documentation section 10. OOM Contol][cgroup-v1-memory_2].
+* **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for the process.
+Each entry has the following structure:
+
+* **`type`** (string, REQUIRED) the platform resource being limited as defined in the [`setrlimit(2)`][setrlimit.2] man page.
+* **`soft`** (uint64, REQUIRED) the value of the limit enforced for the corresponding resource.
+* **`hard`** (uint64, REQUIRED) the ceiling for the soft limit that could be set by an unprivileged process.
+Only a privileged process (e.g. one with the `CAP_SYS_RESOURCE` capability) can raise a hard limit.
+
+If `rlimits` contains duplicated entries with same `type`, the runtime MUST error out.
 * **`selinuxLabel`** (string, OPTIONAL) specifies the SELinux label to be applied to the processes in the container.
 For more information about SELinux, see [SELinux documentation][selinux].
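
The `rlimits` rule in the diff above (duplicated entries with the same `type` MUST make the runtime error out) can be illustrated with a minimal Python sketch. The helper name `check_rlimits` and the sample values are assumptions made for illustration; they are not taken from the specification or from any runtime's code.

```python
from collections import Counter


def check_rlimits(rlimits):
    """Raise ValueError if two rlimit entries share the same `type`.

    Hypothetical helper for illustration; not part of any runtime's API.
    """
    counts = Counter(entry["type"] for entry in rlimits)
    duplicates = sorted(t for t, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError("duplicate rlimit type(s): " + ", ".join(duplicates))


# Example value for the `rlimits` property, written as a Python literal.
# The numbers are arbitrary sample values.
rlimits = [
    {"type": "RLIMIT_NOFILE", "soft": 1024, "hard": 4096},
    {"type": "RLIMIT_NPROC", "soft": 512, "hard": 1024},
]
check_rlimits(rlimits)  # distinct types, so no error is raised
```

Adding a second `RLIMIT_NOFILE` entry to the list would make `check_rlimits` raise `ValueError`. A real runtime would report the failure through its own error path.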
