diff --git a/config.md b/config.md
index ec17fab94..fb33ff566 100644
--- a/config.md
+++ b/config.md
@@ -156,33 +156,48 @@ For POSIX platforms the `mounts` structure has the following fields:
 * **`env`** (array of strings, OPTIONAL) with the same semantics as [IEEE Std 1003.1-2008's `environ`][ieee-1003.1-2008-xbd-c8.1].
 * **`args`** (array of strings, REQUIRED) with similar semantics to [IEEE Std 1003.1-2008 `execvp`'s *argv*][ieee-1003.1-2008-xsh-exec].
     This specification extends the IEEE standard in that at least one entry is REQUIRED, and that entry is used with the same semantics as `execvp`'s *file*.
-* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process.
-    Valid values are platform-specific.
-    For example, valid values for Linux are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
-    Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
-    `capabilities` contains the following properties:
-    * **`effective`** (array of strings, OPTIONAL) - the `effective` field is an array of effective capabilities that are kept for the process.
-    * **`bounding`** (array of strings, OPTIONAL) - the `bounding` field is an array of bounding capabilities that are kept for the process.
-    * **`inheritable`** (array of strings, OPTIONAL) - the `inheritable` field is an array of inheritable capabilities that are kept for the process.
-    * **`permitted`** (array of strings, OPTIONAL) - the `permitted` field is an array of permitted capabilities that are kept for the process.
-    * **`ambient`** (array of strings, OPTIONAL) - the `ambient` field is an array of ambient capabilities that are kept for the process.
+
+### POSIX process
+
+For systems that support POSIX rlimits (for example Linux and Solaris), the `process` object supports the following process-specific properties:
+
 * **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for the process.
     Each entry has the following structure:
 
-    * **`type`** (string, REQUIRED) - the platform resource being limited, for example on Linux as defined in the [setrlimit(2)][setrlimit.2] man page.
-    * **`soft`** (uint64, REQUIRED) - the value of the limit enforced for the corresponding resource.
-    * **`hard`** (uint64, REQUIRED) - the ceiling for the soft limit that could be set by an unprivileged process.
-        Only a privileged process (e.g. under Linux: one with the CAP_SYS_RESOURCE capability) can raise a hard limit.
+    * **`type`** (string, REQUIRED) the platform resource being limited.
+        * Linux: valid values are defined in the [`getrlimit(2)`][getrlimit.2] man page, such as `RLIMIT_MSGQUEUE`.
+        * Solaris: valid values are defined in the [`getrlimit(3)`][getrlimit.3] man page, such as `RLIMIT_CORE`.
 
-    If `rlimits` contains duplicated entries with same `type`, the runtime MUST error out.
+        The runtime MUST [generate an error](runtime.md#errors) for any values which cannot be mapped to a relevant kernel interface
+        For each entry in `rlimits`, a [`getrlimit(3)`][getrlimit.3] on `type` MUST succeed.
+        For the following properties, `rlim` refers to the status returned by the `getrlimit(3)` call.
 
-* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the process from gaining additional privileges.
-    As an example, the ['no_new_privs'][no-new-privs] article in the kernel documentation has information on how this is achieved using a prctl system call on Linux.
+    * **`soft`** (uint64, REQUIRED) the value of the limit enforced for the corresponding resource.
+        `rlim.rlim_cur` MUST match the configured value.
+    * **`hard`** (uint64, REQUIRED) the ceiling for the soft limit that could be set by an unprivileged process.
+        `rlim.rlim_max` MUST match the configured value.
+        Only a privileged process (e.g. one with the `CAP_SYS_RESOURCE` capability) can raise a hard limit.
 
-For Linux-based systems the process structure supports the following process-specific fields.
+    If `rlimits` contains duplicated entries with same `type`, the runtime MUST [generate an error](runtime.md#errors).
+
+### Linux Process
+
+For Linux-based systems, the `process` object supports the following process-specific properties.
 
 * **`apparmorProfile`** (string, OPTIONAL) specifies the name of the AppArmor profile for the process.
     For more information about AppArmor, see [AppArmor documentation][apparmor].
+* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process.
+    Valid values are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
+    Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
+    `capabilities` contains the following properties:
+
+    * **`effective`** (array of strings, OPTIONAL) the `effective` field is an array of effective capabilities that are kept for the process.
+    * **`bounding`** (array of strings, OPTIONAL) the `bounding` field is an array of bounding capabilities that are kept for the process.
+    * **`inheritable`** (array of strings, OPTIONAL) the `inheritable` field is an array of inheritable capabilities that are kept for the process.
+    * **`permitted`** (array of strings, OPTIONAL) the `permitted` field is an array of permitted capabilities that are kept for the process.
+    * **`ambient`** (array of strings, OPTIONAL) the `ambient` field is an array of ambient capabilities that are kept for the process.
+* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the process from gaining additional privileges.
+    As an example, the [`no_new_privs`][no-new-privs] article in the kernel documentation has information on how this is achieved using a `prctl` system call on Linux.
 * **`oomScoreAdj`** *(int, OPTIONAL)* adjusts the oom-killer score in `[pid]/oom_score_adj` for the process's `[pid]` in a [proc pseudo-filesystem][procfs].
     If `oomScoreAdj` is set, the runtime MUST set `oom_score_adj` to the given value.
     If `oomScoreAdj` is not set, the runtime MUST NOT change the value of `oom_score_adj`.
@@ -838,7 +853,8 @@ Here is a full example `config.json` for reference.
 [mount.8]: http://man7.org/linux/man-pages/man8/mount.8.html
 [mount.8-filesystem-independent]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-INDEPENDENT_MOUNT%20OPTIONS
 [mount.8-filesystem-specific]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-SPECIFIC_MOUNT%20OPTIONS
-[setrlimit.2]: http://man7.org/linux/man-pages/man2/setrlimit.2.html
+[getrlimit.2]: http://man7.org/linux/man-pages/man2/getrlimit.2.html
+[getrlimit.3]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getrlimit.html
 [stdin.3]: http://man7.org/linux/man-pages/man3/stdin.3.html
 [uts-namespace.7]: http://man7.org/linux/man-pages/man7/namespaces.7.html
 [zonecfg.1m]: http://docs.oracle.com/cd/E86824_01/html/E54764/zonecfg-1m.html
diff --git a/specs-go/config.go b/specs-go/config.go
index 413d46d07..c00f96ebc 100644
--- a/specs-go/config.go
+++ b/specs-go/config.go
@@ -45,7 +45,7 @@ type Process struct {
 	// Capabilities are Linux capabilities that are kept for the process.
 	Capabilities *LinuxCapabilities `json:"capabilities,omitempty" platform:"linux"`
 	// Rlimits specifies rlimit options to apply to the process.
-	Rlimits []LinuxRlimit `json:"rlimits,omitempty" platform:"linux"`
+	Rlimits []POSIXRlimit `json:"rlimits,omitempty" platform:"linux,solaris"`
 	// NoNewPrivileges controls whether additional privileges could be gained by processes in the container.
 	NoNewPrivileges bool `json:"noNewPrivileges,omitempty" platform:"linux"`
 	// ApparmorProfile specifies the apparmor profile for the container.
@@ -202,8 +202,8 @@ type LinuxIDMapping struct {
 	Size uint32 `json:"size"`
 }
 
-// LinuxRlimit type and restrictions
-type LinuxRlimit struct {
+// POSIXRlimit type and restrictions
+type POSIXRlimit struct {
 	// Type of the rlimit to set
 	Type string `json:"type"`
 	// Hard is the hard limit for the specified type