Commit 7232e4b

Author: Brandon Philips
specs: introduce the concept of a runtime.json
Based on our discussion in-person yesterday it seems necessary to separate the concept of runtime configuration from application configuration. There are a few motivators:

- To support runtime updates of things like cgroups, rlimits, etc. we should separate things that are inherently runtime specific from things that are static to the application running in the container.
- To support the goal of being able to move a bundle between hosts we should make it clear what parts of the spec are and are not portable between hosts so that upon landing on a new host the non-portable options may be rewritten or removed.
- In order to attach a cryptographic identity to a bundle we must not include details in the bundle that are host specific.
1 parent 9ad789f commit 7232e4b

9 files changed: +356 -326 lines changed

bundle.md

Lines changed: 10 additions & 10 deletions
@@ -12,19 +12,19 @@ A standard container bundle is made of the following 3 parts:
1212

1313
# Directory layout
1414

15-
A Standard Container bundle is a directory containing all the content needed to load and run a container. This includes its configuration file (`config.json`) and content directories. The main property of this directory layout is that it can be moved as a unit to another machine and run the same container.
15+
A Standard Container bundle is a directory containing all the content needed to load and run a container.
16+
This includes two configuration files `config.json` and `runtime.json`, and a rootfs directory.
17+
The `config.json` file contains settings that are host independent and application specific such as security permissions, environment variables and arguments.
18+
The `runtime.json` file contains settings that are host specific such as memory limits, local device access and mount points.
19+
The goal is that the bundle can be moved as a unit to another machine and run the same application if `runtime.json` is removed or reconfigured.
1620

1721
The syntax and semantics for `config.json` are described in [this specification](config.md).
1822

19-
One or more *content directories* may be adjacent to the configuration file. This must include at least the root filesystem (referenced in the configuration file by the *root* field) and may include other related content (signatures, other configs, etc.). The interpretation of these resources is specified in the configuration. The names of the directories may be arbitrary, but users should consider using conventional names as in the example below.
23+
A single `rootfs` directory MUST be in the same directory as the `config.json`.
24+
The names of the directories may be arbitrary, but users should consider using conventional names as in the example below.
2025

2126
```
22-
/
23-
!
24-
--- config.json
25-
!
26-
--- rootfs
27-
!
28-
--- signatures
27+
config.json
28+
runtime.json
29+
rootfs/
2930
```
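
The new layout can be checked mechanically. The following is a minimal sketch, not part of this commit: the helper name `checkBundle` is hypothetical, and it assumes the three conventional entry names shown above. Note that the commit text allows `runtime.json` to be removed when a bundle is moved, so a missing `runtime.json` may only matter at the point where the container is actually run.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkBundle returns the expected bundle entries that are missing from dir.
// The name and return shape are illustrative assumptions, not spec API.
func checkBundle(dir string) []string {
	var missing []string
	// Both configuration files are expected to be regular files.
	for _, name := range []string{"config.json", "runtime.json"} {
		info, err := os.Stat(filepath.Join(dir, name))
		if err != nil || info.IsDir() {
			missing = append(missing, name)
		}
	}
	// The root filesystem is expected to be a directory next to config.json.
	info, err := os.Stat(filepath.Join(dir, "rootfs"))
	if err != nil || !info.IsDir() {
		missing = append(missing, "rootfs/")
	}
	return missing
}

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	if missing := checkBundle(dir); len(missing) > 0 {
		fmt.Println("missing bundle entries:", missing)
		return
	}
	fmt.Println("bundle layout looks complete")
}
```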
30-

config-linux.md

Lines changed: 9 additions & 203 deletions
@@ -5,142 +5,7 @@ cgroups, capabilities, LSM, and file system jails to fulfill the spec.
55
Additional information is needed for Linux over the [default spec configuration](config.md)
66
in order to configure these various kernel features.
77

8-
## Linux namespaces
9-
10-
A namespace wraps a global system resource in an abstraction that makes it
11-
appear to the processes within the namespace that they have their own isolated
12-
instance of the global resource. Changes to the global resource are visible to
13-
other processes that are members of the namespace, but are invisible to other
14-
processes. For more information, see [the man page](http://man7.org/linux/man-pages/man7/namespaces.7.html)
15-
16-
Namespaces are specified in the spec as an array of entries. Each entry has a
17-
type field with possible values described below and an optional path element.
18-
If a path is specified, that particular file is used to join that type of namespace.
19-
20-
```json
21-
"namespaces": [
22-
{
23-
"type": "pid",
24-
"path": "/proc/1234/ns/pid"
25-
},
26-
{
27-
"type": "net",
28-
"path": "/var/run/netns/neta"
29-
},
30-
{
31-
"type": "mnt",
32-
},
33-
{
34-
"type": "ipc",
35-
},
36-
{
37-
"type": "uts",
38-
},
39-
{
40-
"type": "user",
41-
},
42-
]
43-
```
44-
45-
#### Namespace types
46-
47-
* **pid** processes inside the container will only be able to see other processes inside the same container.
48-
* **network** the container will have its own network stack.
49-
* **mnt** the container will have an isolated mount table.
50-
* **ipc** processes inside the container will only be able to communicate to other processes inside the same
51-
container via system level IPC.
52-
* **uts** the container will be able to have its own hostname and domain name.
53-
* **user** the container will be able to remap user and group IDs from the host to local users and groups
54-
within the container.
55-
56-
### Access to devices
57-
58-
Devices is an array specifying the list of devices to be created in the container.
59-
Next parameters can be specified:
60-
61-
* type - type of device: 'c', 'b', 'u' or 'p'. More info in `man mknod`
62-
* path - full path to device inside container
63-
* major, minor - major, minor numbers for device. More info in `man mknod`.
64-
There is special value: `-1`, which means `*` for `device`
65-
cgroup setup.
66-
* permissions - cgroup permissions for device. A composition of 'r'
67-
(read), 'w' (write), and 'm' (mknod).
68-
* fileMode - file mode for device file
69-
* uid - uid of device owner
70-
* gid - gid of device owner
71-
72-
```json
73-
"devices": [
74-
{
75-
"path": "/dev/random",
76-
"type": "c",
77-
"major": 1,
78-
"minor": 8,
79-
"permissions": "rwm",
80-
"fileMode": 0666,
81-
"uid": 0,
82-
"gid": 0
83-
},
84-
{
85-
"path": "/dev/urandom",
86-
"type": "c",
87-
"major": 1,
88-
"minor": 9,
89-
"permissions": "rwm",
90-
"fileMode": 0666,
91-
"uid": 0,
92-
"gid": 0
93-
},
94-
{
95-
"path": "/dev/null",
96-
"type": "c",
97-
"major": 1,
98-
"minor": 3,
99-
"permissions": "rwm",
100-
"fileMode": 0666,
101-
"uid": 0,
102-
"gid": 0
103-
},
104-
{
105-
"path": "/dev/zero",
106-
"type": "c",
107-
"major": 1,
108-
"minor": 5,
109-
"permissions": "rwm",
110-
"fileMode": 0666,
111-
"uid": 0,
112-
"gid": 0
113-
},
114-
{
115-
"path": "/dev/tty",
116-
"type": "c",
117-
"major": 5,
118-
"minor": 0,
119-
"permissions": "rwm",
120-
"fileMode": 0666,
121-
"uid": 0,
122-
"gid": 0
123-
},
124-
{
125-
"path": "/dev/full",
126-
"type": "c",
127-
"major": 1,
128-
"minor": 7,
129-
"permissions": "rwm",
130-
"fileMode": 0666,
131-
"uid": 0,
132-
"gid": 0
133-
}
134-
]
135-
```
136-
137-
## Linux control groups
138-
139-
Also known as cgroups, they are used to restrict resource usage for a container and handle
140-
device access. cgroups provide controls to restrict cpu, memory, IO, and network for
141-
the container. For more information, see the [kernel cgroups documentation](https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt)
142-
143-
## Linux capabilities
8+
## Capabilities
1449

14510
Capabilities is an array that specifies Linux capabilities that can be provided to the process
14611
inside the container. Valid values are the string after `CAP_` for capabilities defined
@@ -154,33 +19,15 @@ in [the man page](http://man7.org/linux/man-pages/man7/capabilities.7.html)
15419
]
15520
```
15621

157-
## Linux sysctl
158-
159-
sysctl allows kernel parameters to be modified at runtime for the container.
160-
For more information, see [the man page](http://man7.org/linux/man-pages/man8/sysctl.8.html)
161-
162-
```json
163-
"sysctl": {
164-
"net.ipv4.ip_forward": "1",
165-
"net.core.somaxconn": "256"
166-
}
167-
```
22+
## Rootfs Mount Propagation
16823

169-
## Linux rlimits
24+
rootfsPropagation sets the rootfs's mount propagation. Its value is either slave, private, or shared. [The kernel doc](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt) has more information about mount propagation.
17025

17126
```json
172-
"rlimits": [
173-
{
174-
"type": "RLIMIT_NPROC",
175-
"soft": 1024,
176-
"hard": 102400
177-
}
178-
]
27+
"rootfsPropagation": "slave",
17928
```
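
As a rough illustration of the constraint on `rootfsPropagation`, a runtime could reject any value other than the three listed. This is a sketch under assumptions: the function name `validPropagation` is hypothetical, and the text does not say whether an empty or absent value is acceptable, so this version rejects it.

```go
package main

import "fmt"

// validPropagation reports whether v is one of the three propagation
// modes named in the text. Treating "" as invalid is an assumption.
func validPropagation(v string) bool {
	switch v {
	case "slave", "private", "shared":
		return true
	}
	return false
}

func main() {
	for _, v := range []string{"slave", "private", "shared", "bogus"} {
		fmt.Printf("%q valid: %v\n", v, validPropagation(v))
	}
}
```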
18029

181-
rlimits allow setting resource limits. The type is from the values defined in [the man page](http://man7.org/linux/man-pages/man2/setrlimit.2.html). The kernel enforces the soft limit for a resource while the hard limit acts as a ceiling for that value that could be set by an unprivileged process.
182-
183-
## Linux user namespace mappings
30+
## User namespace mappings
18431

18532
```json
18633
"uidMappings": [
@@ -199,48 +46,7 @@ rlimits allow setting resource limits. The type is from the values defined in [t
19946
]
20047
```
20148

202-
uid/gid mappings describe the user namespace mappings from the host to the container. *hostID* is the starting uid/gid on the host to be mapped to *containerID* which is the starting uid/gid in the container and *size* refers to the number of ids to be mapped. The Linux kernel has a limit of 5 such mappings that can be specified.
203-
204-
## Rootfs Mount Propagation
205-
rootfsPropagation sets the rootfs's mount propagation. Its value is either slave, private, or shared. [The kernel doc](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt) has more information about mount propagation.
206-
207-
```json
208-
"rootfsPropagation": "slave",
209-
```
210-
211-
## Selinux process label
212-
213-
Selinux process label specifies the label with which the processes in a container are run.
214-
For more information about SELinux, see [Selinux documentation](http://selinuxproject.org/page/Main_Page)
215-
```json
216-
"selinuxProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c124,c675"
217-
```
218-
219-
## Apparmor profile
220-
221-
Apparmor profile specifies the name of the apparmor profile that will be used for the container.
222-
For more information about Apparmor, see [Apparmor documentation](https://wiki.ubuntu.com/AppArmor)
223-
224-
```json
225-
"apparmorProfile": "acme_secure_profile"
226-
```
227-
228-
## Seccomp
229-
230-
Seccomp provides application sandboxing mechanism in the Linux kernel.
231-
Seccomp configuration allows one to configure actions to take for matched syscalls and furthermore also allows
232-
matching on values passed as arguments to syscalls.
233-
For more information about Seccomp, see [Seccomp kernel documentation](https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt)
234-
The actions and operators are strings that match the definitions in seccomp.h from [libseccomp](https://github.com/seccomp/libseccomp) and are translated to corresponding values.
235-
236-
```json
237-
"seccomp": {
238-
"defaultAction": "SCMP_ACT_ALLOW",
239-
"syscalls": [
240-
{
241-
"name": "getcwd",
242-
"action": "SCMP_ACT_ERRNO"
243-
}
244-
]
245-
}
246-
```
49+
uid/gid mappings describe the user namespace mappings from the host to the container.
50+
The mappings represent how the bundle `rootfs` expects the user namespace to be set up and the runtime SHOULD NOT modify the permissions on the rootfs to realize the mapping.
51+
*hostID* is the starting uid/gid on the host to be mapped to *containerID* which is the starting uid/gid in the container and *size* refers to the number of ids to be mapped.
52+
There is a limit of 5 mappings which is the Linux kernel hard limit.
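
The arithmetic behind a mapping can be sketched as follows. This is an illustration only: the type `idMapping` and the helpers `validateMappings` and `toHostID` are hypothetical names, and the field names simply mirror *hostID*, *containerID*, and *size* from the text. A container ID inside a mapped range translates to the host ID at the same offset from the start of that range.

```go
package main

import (
	"errors"
	"fmt"
)

// maxMappings is the limit of 5 mappings stated in the text.
const maxMappings = 5

// idMapping is a hypothetical type mirroring the fields described above.
type idMapping struct {
	HostID      uint32 `json:"hostID"`
	ContainerID uint32 `json:"containerID"`
	Size        uint32 `json:"size"`
}

// validateMappings rejects lists longer than the stated limit.
func validateMappings(ms []idMapping) error {
	if len(ms) > maxMappings {
		return fmt.Errorf("got %d mappings, limit is %d", len(ms), maxMappings)
	}
	return nil
}

// toHostID translates a container uid/gid to the host uid/gid by finding
// the range that contains it and applying the same offset on the host side.
func toHostID(ms []idMapping, containerID uint32) (uint32, error) {
	for _, m := range ms {
		if containerID >= m.ContainerID && containerID-m.ContainerID < m.Size {
			return m.HostID + (containerID - m.ContainerID), nil
		}
	}
	return 0, errors.New("container ID is not covered by any mapping")
}

func main() {
	ms := []idMapping{{HostID: 1000, ContainerID: 0, Size: 10}}
	if err := validateMappings(ms); err != nil {
		fmt.Println("invalid mappings:", err)
		return
	}
	host, err := toHostID(ms, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(host) // prints 1003
}
```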

spec.go renamed to config.go

Lines changed: 7 additions & 29 deletions
@@ -14,30 +14,7 @@ type Spec struct {
1414
// Hostname is the container's host name.
1515
Hostname string `json:"hostname"`
1616
// Mounts profile configuration for adding mounts to the container's filesystem.
17-
Mounts []Mount `json:"mounts"`
18-
// Hooks are the commands run at various lifecycle events of the container.
19-
Hooks Hooks `json:"hooks"`
20-
}
21-
22-
type Hooks struct {
23-
// Prestart is a list of hooks to be run before the container process is executed.
24-
// On Linux, they are run after the container namespaces are created.
25-
Prestart []Hook `json:"prestart"`
26-
// Poststop is a list of hooks to be run after the container process exits.
27-
Poststop []Hook `json:"poststop"`
28-
}
29-
30-
// Mount specifies a mount for a container.
31-
type Mount struct {
32-
// Type specifies the mount kind.
33-
Type string `json:"type"`
34-
// Source specifies the source path of the mount. In the case of bind mounts on
35-
// linux based systems this would be the file on the host.
36-
Source string `json:"source"`
37-
// Destination is the path where the mount will be placed relative to the container's root.
38-
Destination string `json:"destination"`
39-
// Options are fstab style mount options.
40-
Options string `json:"options"`
17+
MountPoints []MountPoint `json:"mounts"`
4118
}
4219

4320
// Process contains information to start a specific application inside the container.
@@ -72,9 +49,10 @@ type Platform struct {
7249
Arch string `json:"arch"`
7350
}
7451

75-
// Hook specifies a command that is run at a particular event in the lifecycle of a container.
76-
type Hook struct {
77-
Path string `json:"path"`
78-
Args []string `json:"args"`
79-
Env []string `json:"env"`
52+
// MountPoint describes a directory that may be fulfilled by a mount in the runtime.json.
53+
type MountPoint struct {
54+
// Name is a unique descriptive identifier for this mount point.
55+
Name string `json:"name"`
56+
// Path specifies the path of the mount. The path and child directories MUST exist; a runtime MUST NOT automatically create directories for a mount point.
57+
Path string `json:"path"`
8058
}
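
To show how the new `MountPoint` type might be consumed, here is a minimal sketch that decodes the `mounts` array of a `config.json` fragment and enforces the uniqueness of `name` that the comment above calls for. The wrapper type `spec`, the helper `parseMountPoints`, and the sample JSON are assumptions made for illustration; only the `MountPoint` fields and JSON tags come from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MountPoint repeats the type introduced in config.go.
type MountPoint struct {
	Name string `json:"name"`
	Path string `json:"path"`
}

// spec is a hypothetical, trimmed-down stand-in for the full Spec struct.
type spec struct {
	MountPoints []MountPoint `json:"mounts"`
}

// parseMountPoints decodes the mounts array and rejects duplicate names.
func parseMountPoints(data []byte) ([]MountPoint, error) {
	var s spec
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	for _, mp := range s.MountPoints {
		if seen[mp.Name] {
			return nil, fmt.Errorf("duplicate mount point name %q", mp.Name)
		}
		seen[mp.Name] = true
	}
	return s.MountPoints, nil
}

func main() {
	// Sample input invented for this sketch.
	data := []byte(`{"mounts": [{"name": "data", "path": "/data"}]}`)
	mps, err := parseMountPoints(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, mp := range mps {
		fmt.Printf("%s -> %s\n", mp.Name, mp.Path)
	}
}
```

Keeping only a name and a path in `config.json` fits the commit's stated goal: the host-specific details of what gets mounted there stay in `runtime.json`.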

config.md

Lines changed: 1 addition & 56 deletions
@@ -1,6 +1,6 @@
11
# Configuration file
22

3-
The containers top-level directory MUST contain a configuration file called `config.json`.
3+
The container's top-level directory MUST contain a configuration file called `config.json`.
44
For now the canonical schema is defined in [spec.go](spec.go) and [spec_linux.go](spec_linux.go), but this will be moved to a formal JSON schema over time.
55

66
The configuration file contains metadata necessary to implement standard operations against the container.
@@ -34,61 +34,6 @@ Each container has exactly one *root filesystem*, specified in the *root* object
3434
}
3535
```
3636

37-
## Mount Configuration
38-
39-
Additional filesystems can be declared as "mounts", specified in the *mounts* array. The parameters are similar to the ones in Linux mount system call. [http://linux.die.net/man/2/mount](http://linux.die.net/man/2/mount)
40-
41-
* **type** (string, required) Linux, *filesystemtype* argument supported by the kernel are listed in */proc/filesystems* (e.g., "minix", "ext2", "ext3", "jfs", "xfs", "reiserfs", "msdos", "proc", "nfs", "iso9660"). Windows: ntfs
42-
* **source** (string, required) a device name, but can also be a directory name or a dummy. Windows, the volume name that is the target of the mount point. \\?\Volume\{GUID}\ (on Windows source is called target)
43-
* **destination** (string, required) where the source filesystem is mounted relative to the container rootfs.
44-
* **options** (string, optional) in the fstab format [https://wiki.archlinux.org/index.php/Fstab](https://wiki.archlinux.org/index.php/Fstab).
45-
46-
*Example (Linux)*
47-
48-
```json
49-
"mounts": [
50-
{
51-
"type": "proc",
52-
"source": "proc",
53-
"destination": "/proc",
54-
"options": ""
55-
},
56-
{
57-
"type": "tmpfs",
58-
"source": "tmpfs",
59-
"destination": "/dev",
60-
"options": "nosuid,strictatime,mode=755,size=65536k"
61-
},
62-
{
63-
"type": "devpts",
64-
"source": "devpts",
65-
"destination": "/dev/pts",
66-
"options": "nosuid,noexec,newinstance,ptmxmode=0666,mode=0620,gid=5"
67-
},
68-
{
69-
"type": "bind",
70-
"source": "/volumes/testing",
71-
"destination": "/data",
72-
"options": "rbind,rw"
73-
}
74-
]
75-
```
76-
77-
*Example (Windows)*
78-
79-
```json
80-
"mounts": [
81-
{
82-
"type": "ntfs",
83-
"source": "\\\\?\\Volume\\{2eca078d-5cbc-43d3-aff8-7e8511f60d0e}\\",
84-
"destination": "C:\\Users\\crosbymichael\\My Fancy Mount Point\\",
85-
"options": ""
86-
}
87-
]
88-
```
89-
90-
See links for details about [mountvol](http://ss64.com/nt/mountvol.html) and [SetVolumeMountPoint](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365561(v=vs.85).aspx) in Windows.
91-
9237
## Process configuration
9338

9439
* **terminal** (bool, optional) specifies whether you want a terminal attached to that process. Defaults to false.
