Commit 7dfb614

Doc change: improve user-guide section on Sysbox limitations. [skip ci]
Signed-off-by: Cesar Talledo <ctalledo@nestybox.com>
1 parent d09ef0c commit 7dfb614

1 file changed

+52
-108
lines changed

docs/user-guide/limitations.md

Lines changed: 52 additions & 108 deletions
Original file line number | Diff line number | Diff line change
@@ -1,142 +1,86 @@
11
# Sysbox User Guide: Functional Limitations
22

33
This document describes functional restrictions and limitations of Sysbox and
4-
system containers.
4+
the containers created by it.
55

66
## Contents
77

8+
- [Sysbox Container Limitations](#sysbox-container-limitations)
89
- [Docker Restrictions](#docker-restrictions)
910
- [Kubernetes Restrictions](#kubernetes-restrictions)
10-
- [System Container Limitations](#system-container-limitations)
1111
- [Sysbox Functional Limitations](#sysbox-functional-limitations)
1212

13-
## Docker Restrictions
14-
15-
This section describes restrictions when launching containers with Docker +
16-
Sysbox.
13+
## Sysbox Container Limitations
1714

18-
### Support for Docker's `--privileged` Option
15+
Sysbox enables containers to run applications or system software such as
16+
systemd, Docker, Kubernetes, K3s, etc., seamlessly & securely (e.g., no
17+
privileged containers, no complex setups).
1918

20-
Sysbox system containers are incompatible with the Docker `--privileged` flag.
19+
While our goal is for Sysbox containers to run **any software that runs on
20+
bare-metal or VMs**, this is still a work-in-progress.
2121

22-
The raison d'être for Sysbox is to avoid the use of (very insecure) privileged
23-
containers yet enable users to run any type of software inside the container.
22+
Thus, there are some limitations at this time. The table below describes these.
2423

25-
Using the Docker `--privileged` + Sysbox will fail:
24+
| Limitation | Description | Affected Software | Planned Fix |
25+
| -------- | --------------- | ----- | :--: |
26+
| mknod | Fails with "operation not permitted". | Software that creates devices such as /dev/tun, /dev/tap, /dev/fuse, etc. | WIP |
27+
| binfmt-misc | Fails with "permission denied". | Software that uses /proc/sys/fs/binfmt_misc inside the container (e.g., buildx+QEMU for multi-arch builds). | WIP |
28+
| Nested user-namespace | `unshare -U --mount-proc` fails with "invalid argument". | Software that uses the Linux user-namespace (e.g., Docker + userns-remap). Note that the Sysbox container is rootless already, so this implies nesting Linux user-namespaces. | Yes |
29+
| Host device access | Host devices exposed to the container (e.g., `docker run --device ...`) show up with "nobody:nogroup" ownership. Thus, access to them will fail with "permission denied" unless the device grants read/write permissions to "others". | Software that needs access to hardware accelerators. | Yes |
30+
| rpc-pipefs | Mounting rpc-pipefs fails with "permission denied". | Running an NFS server inside the Sysbox container. | Yes |
31+
| Host sockets | Host sockets exposed to the container (e.g., `docker run -v /host/socket:/mount/point ...`) don't always work because Sysbox mounts shiftfs on them. | Software that requires host socket access (e.g., VNC). | Yes |
32+
| insmod | Fails with "operation not permitted". | Can't load kernel modules from inside containers. | TBD |
2633

27-
```console
28-
$ docker run --runtime=sysbox-runc --privileged -it alpine
29-
docker: Error response from daemon: OCI runtime create failed: container_linux.go:364: starting container process caused "process_linux.go:533: container init caused \"rootfs_linux.go:67: setting up ptmx caused \\\"remove dev/ptmx: device or resource busy\\\"\"": unknown.
30-
ERRO[0000] error waiting for container: context canceled
31-
```
3234

33-
### Support for Docker's `--userns=host` Option
35+
**NOTES:**
3436

35-
When Docker is configured in userns-remap mode, Docker offers the ability
36-
to disable that mode on a per container basis via the `--userns=host`
37-
option in the `docker run` and `docker create` commands.
37+
* "WIP" means the fix is being worked on right now. "TBD" means a
38+
decision is yet to be made.
3839

39-
This option **does not work** with Sysbox (i.e., don't use
40-
`docker run --runtime=sysbox-runc --userns=host ...`).
40+
* If you find other software that fails inside the Sysbox container, please open
41+
a GitHub issue so we can add it to the list and work on a fix.
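
The "Host device access" row above says that access fails unless the device grants read/write permissions to "others". A minimal sketch of how one might check that condition from inside a container follows. The function name `others_can_read_write` is hypothetical and not part of Sysbox; the check uses only the file mode bits and ignores ACLs and other access controls.

```python
import os
import stat


def others_can_read_write(path: str) -> bool:
    """Return True if the file mode grants both read and write to "others".

    Hypothetical helper for illustration only. It inspects the classic
    permission bits and does not account for ACLs or security modules.
    """
    mode = os.stat(path).st_mode
    needed = stat.S_IROTH | stat.S_IWOTH
    return (mode & needed) == needed


if __name__ == "__main__":
    # /dev/null is used here only as a device node that exists on most
    # Linux systems; substitute the device exposed to the container.
    print(others_can_read_write("/dev/null"))
```

If the function returns False for a device that shows up as "nobody:nogroup", processes in the container would be expected to hit the "permission denied" error described in the table.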
4142

42-
Note that usage of this option is rare as it can lead to the problems as
43-
described [in this Docker article](https://docs.docker.com/engine/security/userns-remap/#disable-namespace-remapping-for-a-container).
43+
## Docker Restrictions
4444

45-
### Support for Docker's `--pid=host` and `--network=host` Options
45+
This section describes restrictions when using Docker + Sysbox.
4646

47-
System containers do not support sharing the pid or network namespaces
48-
with the host (as this is not secure and it's incompatible with the
49-
system container's user namespace).
47+
These restrictions are in place because they reduce or break container-to-host
48+
isolation, which is one of the key features of Sysbox.
5049

51-
For example, when using Docker to launch system containers, the
52-
`docker run --pid=host` and `docker run --network=host` options
53-
do not work with system containers.
50+
Note that some of these options (e.g., --privileged) are typically needed when
51+
running complex workloads in containers. With Sysbox, they are no longer needed.
5452

55-
### Support for Exposing Host Devices inside System Containers
53+
| Limitation | Description | Comment |
54+
| ---------- | --------------- | ------- |
55+
| docker --privileged | Does not work with Sysbox | Breaks container-to-host isolation. |
56+
| docker --userns=host | Does not work with Sysbox | Breaks container-to-host isolation. |
57+
| docker --pid=host | Does not work with Sysbox | Breaks container-to-host isolation. |
58+
| docker --net=host | Does not work with Sysbox | Breaks container-to-host isolation. |
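
As a rough illustration of the table above, the sketch below scans a `docker run` argument list for the listed options before the command is launched. The function name `find_unsupported_flags` is hypothetical, and `--network=host` is included on the assumption that it should be treated the same as `--net=host`, since Docker accepts both spellings. The scan is naive: it does not stop at the image name, so arguments meant for the container's own command could be flagged too.

```python
# Options listed in the table above. A value of None marks a boolean flag.
UNSUPPORTED = {
    "--privileged": None,
    "--userns": "host",
    "--pid": "host",
    "--net": "host",
    "--network": "host",  # assumption: long spelling of --net
}


def find_unsupported_flags(argv):
    """Return the options in argv that the table lists as unsupported."""
    found = []
    for i, arg in enumerate(argv):
        name, sep, value = arg.partition("=")
        if name not in UNSUPPORTED:
            continue
        expected = UNSUPPORTED[name]
        if expected is None:
            # Matches "--privileged" and "--privileged=true".
            if not sep or value.lower() == "true":
                found.append(name)
            continue
        if not sep and i + 1 < len(argv):
            # Space-separated form, e.g. "--pid host".
            value = argv[i + 1]
        if value == expected:
            found.append(f"{name}={expected}")
    return found


if __name__ == "__main__":
    cmd = ["docker", "run", "--runtime=sysbox-runc", "--privileged", "-it", "alpine"]
    print(find_unsupported_flags(cmd))
```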
5659

57-
Sysbox does not currently support exposing host devices inside system
58-
containers (e.g., via the `docker run --device` option).
5960

6061
## Kubernetes Restrictions
6162

62-
This section describes restrictions when launching containers with Kubernetes +
63-
Sysbox.
64-
65-
### Pods limited to 16 per-node on Sysbox-CE
66-
67-
Pods launched with the Sysbox Community Edition are limited to **16 pods per worker node**.
68-
69-
Once this limit is reached, new pods scheduled on the node will remain in the
70-
"ContainerCreating" state. Such pods need to be terminated and re-created once
71-
there is sufficient capacity on the node.
72-
73-
#### ** --- Sysbox-EE Feature Highlight --- **
74-
75-
With Sysbox Enterprise (Sysbox-EE) this limitation is removed, as it's designed
76-
for greater scalability. Thus, you can launch as many pods as will fit on the
77-
Kubernetes node, allowing you to get the best utilization of the hardware.
78-
79-
Note that the number of pods that can be deployed on a node depends on many
80-
factors such as the number of CPUs on the node, the memory size on the node, the
81-
amount of storage, the type of workloads running in the pods, resource
82-
limits on the pod, etc.
83-
84-
### Privileged pods are not allowed
85-
86-
The pod's security context must not have the `privileged: true` attribute.
87-
88-
The raison d'être for Sysbox is to avoid the use of (very insecure) privileged
89-
containers yet enable users to run any type of software inside the container.
90-
91-
### Sharing Linux Namespaces with the Host is not allowed
63+
This section describes restrictions when using Kubernetes + Sysbox.
9264

93-
The pod's spec must not share Linux namespaces with the host, as this breaks
94-
container isolation. Thus avoid setting these in the pod's spec:
65+
Some of these restrictions are in place because they reduce or break
66+
container-to-host isolation, which is one of the key features of Sysbox.
9567

96-
```yaml
97-
hostNetwork: true
98-
hostIPC: true
99-
hostPID: true
100-
```
68+
Note that some of these options (e.g., privileged: true) are typically needed when
69+
running complex workloads in pods. With Sysbox, they are no longer needed.
10170

102-
## System Container Limitations
103-
104-
This section describes limitations for software running inside a system
105-
container.
106-
107-
### Creating User Namespaces inside a System Container
108-
109-
System containers do not currently support creating a user-namespace
110-
inside the system container and mounting procfs in it.
111-
112-
That is, executing the following instruction inside a system container
113-
is not supported:
114-
115-
unshare -U -i -m -n -p -u -f --mount-proc -r bash
116-
117-
The reason this is not yet supported is that Sysbox is not currently
118-
capable of ensuring that the procfs mounted inside the unshared
119-
namespace is the proper one. We expect to fix this soon.
71+
| Limitation | Description | Comment |
72+
| ---------- | --------------- | ------- |
73+
| 16 pods /node | Sysbox-CE is limited to 16 pods per node | Sysbox Enterprise removes this limitation. |
74+
| privileged: true | Not supported in pod security context | Breaks container-to-host isolation. |
75+
| hostNetwork: true | Not supported in pod spec | Breaks container-to-host isolation. |
76+
| hostIPC: true | Not supported in pod spec | Breaks container-to-host isolation. |
77+
| hostPID: true | Not supported in pod spec | Breaks container-to-host isolation. |
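
The settings in the table above can be checked before a pod is submitted. The sketch below takes a pod spec that has already been parsed into a Python dict (for example, from a manifest) and reports the restricted settings it finds. The function name `find_unsupported_pod_settings` is hypothetical. It looks for `privileged` in each container's `securityContext`, which is where Kubernetes defines that field, and it does not cover the 16-pod limit, which depends on node state rather than on the spec.

```python
HOST_NAMESPACE_FIELDS = ("hostNetwork", "hostIPC", "hostPID")


def find_unsupported_pod_settings(pod_spec: dict) -> list:
    """Return the restricted settings found in a parsed pod spec."""
    problems = []

    # Host namespace sharing is configured at the pod spec level.
    for field in HOST_NAMESPACE_FIELDS:
        if pod_spec.get(field) is True:
            problems.append(f"{field}: true")

    # "privileged" is set per container, under its securityContext.
    containers = (pod_spec.get("containers") or []) + (
        pod_spec.get("initContainers") or []
    )
    for container in containers:
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged") is True:
            name = container.get("name", "?")
            problems.append(f"privileged: true (container {name})")

    return problems


if __name__ == "__main__":
    spec = {
        "hostPID": True,
        "containers": [
            {"name": "c1", "securityContext": {"privileged": True}},
        ],
    }
    for problem in find_unsupported_pod_settings(spec):
        print(problem)
```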
12078

12179
## Sysbox Functional Limitations
12280

123-
### Sysbox must run as root on the host
124-
125-
Sysbox must run with root privileges on the host system. It won't
126-
work if executed without root privileges.
127-
128-
Root privileges are necessary in order for Sysbox to interact with the Linux
129-
kernel in order to create the containers and perform many of the advanced
130-
functions it provides (e.g., procfs virtualization, sysfs virtualization, etc.)
131-
132-
### Checkpoint and Restore Support
133-
134-
Sysbox does not currently support checkpoint and restore of system containers.
135-
136-
### Sysbox Nesting
137-
138-
Sysbox must run at the host level (or within a privileged container if you must).
81+
| Limitation | Description | Planned Fix |
82+
| ---------- | --------------- | ------- |
83+
| Sysbox must run as root | Sysbox needs root privileges on the host to perform the advanced OS virtualization it provides (e.g., procfs/sysfs emulation, syscall trapping, etc.). | TBD |
84+
| Container Checkpoint/Restore | Not yet supported | Yes |
85+
| Sysbox Nesting | Running Sysbox inside a Sysbox container is not supported | TBD |
13986

140-
Sysbox does not work when running inside a system container. This implies that
141-
we don't support running a system container inside a system container at this
142-
time.
