Cannot create user namespaced container without network namespaces #799

Open
cyphar opened this issue May 3, 2016 · 13 comments

@cyphar
Member

cyphar commented May 3, 2016

I discovered this while working on rootless containers. It looks like there are some issues with setups that don't use a network namespace. This is also blocking rootless containers from having networking (since we need to just use host networking).

% sudo runc start test
rootfs_linux.go:53: mounting "/sys" to rootfs "/home/cyphar/src/runc/rootfs" caused "operation not permitted"

Here's the config, but the important thing to note is that I've added some dummy user namespace setup and removed the network section from namespaces.

{
    "ociVersion": "0.6.0-dev",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "process": {
        "terminal": true,
        "user": {},
        "args": [
            "sh"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": [
            "CAP_AUDIT_WRITE",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "noNewPrivileges": true
    },
    "root": {
        "path": "rootfs",
        "readonly": true
    },
    "hostname": "runc",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620",
                "gid=5"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "hooks": {},
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ]
        },
        "uidMappings": [
            {
                "hostID": 1000,
                "containerID": 0,
                "size": 100
            }
        ],
        "gidMappings": [
            {
                "hostID": 1000,
                "containerID": 0,
                "size": 100
            }
        ],
        "namespaces": [
            {
                "type": "user"
            },
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            }
        ],
        "maskedPaths": [
            "/proc/kcore",
            "/proc/latency_stats",
            "/proc/timer_stats",
            "/proc/sched_debug"
        ],
        "readonlyPaths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
        ]
    }
}

Blocking #774.

@cyphar cyphar mentioned this issue May 3, 2016
46 tasks
@dqminh
Contributor

dqminh commented May 3, 2016

This is a currently known restriction in the kernel: you can't mount sysfs without CAP_SYS_ADMIN rights. Removing the sysfs mount should allow you to start the container.

I think the patch note is here:

Also discussed a bit in moby/moby#21800
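
A minimal sketch of the "remove the sysfs mount" suggestion, written as a filter over a parsed config.json. This is only an illustration: `drop_sysfs_mounts` is a hypothetical helper (not part of runc), and it assumes the config uses the `mounts` layout shown in the issue.

```python
import copy


def drop_sysfs_mounts(config):
    """Return a copy of an OCI config dict without mounts of type "sysfs".

    Hypothetical helper for illustration only; not part of runc.
    """
    result = copy.deepcopy(config)  # leave the caller's dict untouched
    result["mounts"] = [
        m for m in result.get("mounts", []) if m.get("type") != "sysfs"
    ]
    return result


if __name__ == "__main__":
    example = {
        "mounts": [
            {"destination": "/proc", "type": "proc", "source": "proc"},
            {"destination": "/sys", "type": "sysfs", "source": "sysfs"},
        ]
    }
    # Print the mount destinations that remain after filtering.
    print([m["destination"] for m in drop_sysfs_mounts(example)["mounts"]])
```

In practice you would load config.json with `json.load`, pass the dict through the helper, and write the result back with `json.dump`.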

@cyphar
Member Author

cyphar commented May 3, 2016

@dqminh But we're using user namespaces, so we have CAP_SYS_ADMIN in the namespace. If you add the network namespace to the config, it works perfectly fine. I think it's a more nuanced problem (possibly how we're messing around with mount options in rootfs_linux).

@dqminh
Contributor

dqminh commented May 3, 2016

But we're using user namespaces, so we have CAP_SYS_ADMIN in the namespace

That's not quite true, I think. You only have CAP_SYS_ADMIN in a net namespace created by the user, not when you join the net namespace of the host.

@cyphar
Member Author

cyphar commented May 3, 2016

Ah, you meant the user namespace that "owns" the net namespace. Okay, if that's the requirement for mounting all of /sys (which seems odd), we'll have to not mount sysfs. We should probably add this to the validator, so people don't run into this by accident.

I've removed sysfs from my config and that appears to work now. Unfortunately, it looks like I still don't have network access for some reason ...

/cc @davidlt

@dqminh
Contributor

dqminh commented May 3, 2016

Unfortunately, it looks like I still don't have network access for some reason ...

Hmm, it should work (at least it did when I tested this a few weeks ago :p). What did you use to test network access? ping, or anything else that uses CAP_NET_*, will not work though.

@cyphar
Member Author

cyphar commented May 3, 2016

I was just using netcat. I've had enough bad experiences with capabilities to know better than to trust ping in containers. ;)

@davidlt

davidlt commented May 3, 2016

Seems to work, at least yum makecache worked, but I am facing issues trying to install anything useful in the container, e.g.

Running transaction
  Installing : fipscheck-lib-1.4.1-5.el7.x86_64    1/3
Error unpacking rpm package fipscheck-lib-1.4.1-5.el7.x86_64
error: unpacking of archive failed on file /usr/lib64/libfipscheck.so.1;5728b733: cpio: symlink
  Installing : fipscheck-1.4.1-5.el7.x86_64    2/3
Error unpacking rpm package fipscheck-1.4.1-5.el7.x86_64
error: fipscheck-lib-1.4.1-5.el7.x86_64: install failed
error: unpacking of archive failed on file /usr/bin/fipscheck;5728b733: cpio: open
error: fipscheck-1.4.1-5.el7.x86_64: install failed
groupadd: cannot open /etc/gshadow
  Installing : openssh-6.6.1p1-25.el7_2.x86_64    3/3
Error unpacking rpm package openssh-6.6.1p1-25.el7_2.x86_64
error: unpacking of archive failed on file /usr/bin/ssh-keygen;5728b733: cpio: open

I guess I have to build an image with e.g. Docker and include the packages I want.

@davidlt

davidlt commented May 3, 2016

Here is a better proof that it works. Is there a way to map /etc/resolv.conf from the host to the container?

[davidlt@pccms205 test2]$ cat /etc/redhat-release
Fedora release 24 (Twenty Four)
[davidlt@pccms205 test2]$ runc --root $PWD start test_cont
sh-4.2# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
sh-4.2# dig google.com

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55412
;; flags: qr rd ra; QUERY: 1, ANSWER: 15, AUTHORITY: 4, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             195     IN      A       195.112.88.178
google.com.             195     IN      A       195.112.88.177
google.com.             195     IN      A       195.112.88.185
google.com.             195     IN      A       195.112.88.179
google.com.             195     IN      A       195.112.88.184
google.com.             195     IN      A       195.112.88.180
google.com.             195     IN      A       195.112.88.187
google.com.             195     IN      A       195.112.88.181
google.com.             195     IN      A       195.112.88.188
google.com.             195     IN      A       195.112.88.189
google.com.             195     IN      A       195.112.88.183
google.com.             195     IN      A       195.112.88.175
google.com.             195     IN      A       195.112.88.176
google.com.             195     IN      A       195.112.88.182
google.com.             195     IN      A       195.112.88.186

;; AUTHORITY SECTION:
google.com.             59409   IN      NS      ns2.google.com.
google.com.             59409   IN      NS      ns4.google.com.
google.com.             59409   IN      NS      ns1.google.com.
google.com.             59409   IN      NS      ns3.google.com.

;; ADDITIONAL SECTION:
ns1.google.com.         37866   IN      A       216.239.32.10
ns2.google.com.         72394   IN      A       216.239.34.10
ns3.google.com.         35936   IN      A       216.239.36.10
ns4.google.com.         56592   IN      A       216.239.38.10

;; Query time: 1 msec
;; SERVER: 137.138.17.5#53(137.138.17.5)
;; WHEN: Tue May 03 15:27:22 UTC 2016
;; MSG SIZE  rcvd: 415

@cyphar
Member Author

cyphar commented May 3, 2016

You can try bind-mounting the file. You'd have to create the file in the rootfs of your container (manually), then add a bind mount for it in config.json. You could also use pre-start hooks if you really wanted to just copy the file (but then the copy would go out of sync with the host).
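
As a rough sketch of the bind-mount approach, the snippet below appends a mount entry for the host's resolv.conf to a parsed config.json. `add_resolv_conf_bind` is a hypothetical helper, and the `["rbind", "ro"]` option list is an assumption; adjust it to whatever your setup permits. It does not create the destination file in the rootfs.

```python
import copy
import json


def add_resolv_conf_bind(config, host_path="/etc/resolv.conf"):
    """Return a copy of an OCI config dict with a read-only bind mount of
    the host's resolv.conf appended to "mounts".

    Hypothetical helper for illustration; the option list is an assumption.
    """
    result = copy.deepcopy(config)  # leave the caller's dict untouched
    result.setdefault("mounts", []).append(
        {
            "destination": "/etc/resolv.conf",
            "type": "bind",
            "source": host_path,
            "options": ["rbind", "ro"],
        }
    )
    return result


if __name__ == "__main__":
    base = {"mounts": [{"destination": "/proc", "type": "proc", "source": "proc"}]}
    print(json.dumps(add_resolv_conf_bind(base), indent=4))
```

Because the entry is a bind mount rather than a copy, the container sees the host file's current contents and does not go out of sync.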

@wking
Contributor

wking commented May 3, 2016

The difficulty with unprivileged net namespaces is with connecting
them to the outside world:

$ unshare -nUfr sh
sh-4.3# ip route
sh-4.3# ip addr
1: lo: mtu 65536 qdisc noop state DOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0@NONE: mtu 1480 qdisc noop state DOWN group default
link/sit 0.0.0.0 brd 0.0.0.0

To set up that connection, you need someone with privileged access in
the runtime namespace [1] to set up a bridge and throw one half of a
veth connection over the wall (e.g. [2]), or set up iptables rules,
etc., to connect the runtime net namespace with the container net
namespace.

In the absence of such a cooperative privileged user, you can still
use unprivileged net namespaces for isolated network tests (and you
can probably set up subcontainers and have the unprivileged user set up
bridging between those subcontainers).
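
To make the "throw one half of a veth connection over the wall" step concrete, here is a small sketch that only builds the `ip link` commands a privileged helper might run on the host; it executes nothing. The function name, the interface names, and the assumption that the bridge already exists are all hypothetical, and address configuration inside the container is not covered.

```python
import shlex


def veth_setup_commands(container_pid, bridge,
                        host_if="veth-host", cont_if="veth-cont"):
    """Build (but do not run) the `ip` commands for a veth pair.

    Hypothetical sketch: assumes `bridge` already exists on the host and
    `container_pid` is a process living in the container's net namespace.
    """
    return [
        # Create the veth pair in the host (runtime) net namespace.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", cont_if],
        # Move one end into the container's net namespace.
        ["ip", "link", "set", cont_if, "netns", str(container_pid)],
        # Attach the host end to the bridge and bring it up.
        ["ip", "link", "set", host_if, "master", bridge],
        ["ip", "link", "set", host_if, "up"],
    ]


if __name__ == "__main__":
    for argv in veth_setup_commands(12345, "br0"):
        print(shlex.join(argv))
```

Returning argument lists keeps the sketch side-effect free; a real helper would run them with the required privileges (for example via `subprocess.run(argv, check=True)`).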

@mrunalp
Contributor

mrunalp commented May 4, 2016

Yeah, you need a privileged helper to set up the veth pair to the host bridge. lxc also uses a privileged helper, called lxc-user-nic, to set up networking for unprivileged containers.

@cyphar
Member Author

cyphar commented May 8, 2016

#807 adds a check to the validator to make sure that a user doesn't end up in this case.
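
For illustration, a check of this kind could look roughly like the sketch below. This is not the code from #807: `check_sysfs_mount` and its error message are hypothetical, and it only covers the case from the original report (user namespace present, no network namespace entry, sysfs mount requested).

```python
def check_sysfs_mount(config):
    """Return an error string if the config mounts sysfs inside a user
    namespace without also listing a network namespace, else None.

    Hypothetical sketch; not runc's actual validator.
    """
    namespaces = {
        ns.get("type")
        for ns in config.get("linux", {}).get("namespaces", [])
    }
    has_sysfs = any(m.get("type") == "sysfs" for m in config.get("mounts", []))
    if has_sysfs and "user" in namespaces and "network" not in namespaces:
        return ("sysfs mount requested with a user namespace "
                "but without a network namespace")
    return None


if __name__ == "__main__":
    bad = {
        "mounts": [{"destination": "/sys", "type": "sysfs", "source": "sysfs"}],
        "linux": {"namespaces": [{"type": "user"}, {"type": "mount"}]},
    }
    print(check_sysfs_mount(bad))
```

The sketch does not inspect a namespace `path`, so a config that joins an existing network namespace passes this particular check.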

stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017
*: Use inline links for remaining internal references
@nzhang-zh

nzhang-zh commented Jan 18, 2019

Ran into a similar issue when runc is given a network namespace file.

However, it runs fine if either the namespace file path or the user namespace is removed from config.json.

Is there a workaround for using a network namespace created in the host namespace?

$ sudo runc run hello
container_linux.go:344: starting container process caused "process_linux.go:424: container init caused "rootfs_linux.go:58: mounting "sysfs" to rootfs "/tmp/hello-world/rootfs" at "/sys" caused "operation not permitted"""
$ jq '.' config.json
{
  "ociVersion": "1.0.1-dev",
  "process": {
    "terminal": false,
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": [
      "/hello"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "effective": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "inheritable": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "permitted": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "ambient": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ]
    },
    "rlimits": [
      {
        "type": "RLIMIT_NOFILE",
        "hard": 1024,
        "soft": 1024
      }
    ],
    "noNewPrivileges": true
  },
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "hostname": "runc",
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": [
        "nosuid",
        "strictatime",
        "mode=755",
        "size=65536k"
      ]
    },
    {
      "destination": "/dev/pts",
      "type": "devpts",
      "source": "devpts",
      "options": [
        "nosuid",
        "noexec",
        "newinstance",
        "ptmxmode=0666",
        "mode=0620",
        "gid=5"
      ]
    },
    {
      "destination": "/dev/shm",
      "type": "tmpfs",
      "source": "shm",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "mode=1777",
        "size=65536k"
      ]
    },
    {
      "destination": "/dev/mqueue",
      "type": "mqueue",
      "source": "mqueue",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    },
    {
      "destination": "/sys/fs/cgroup",
      "type": "cgroup",
      "source": "cgroup",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "relatime",
        "ro"
      ]
    }
  ],
  "linux": {
    "uidMappings": [
      {
        "containerID": 0,
        "hostID": 1000,
        "size": 32000
      }
    ],
    "gidMappings": [
      {
        "containerID": 0,
        "hostID": 1000,
        "size": 32000
      }
    ],
    "resources": {
      "devices": [
        {
          "allow": false,
          "access": "rwm"
        }
      ]
    },
    "namespaces": [
      {
        "type": "pid"
      },
      {
        "type": "network",
        "path": "/var/run/netns/ns1"
      },
      {
        "type": "ipc"
      },
      {
        "type": "uts"
      },
      {
        "type": "mount"
      },
      {
        "type": "user"
      }
    ],
    "maskedPaths": [
      "/proc/kcore",
      "/proc/latency_stats",
      "/proc/timer_list",
      "/proc/timer_stats",
      "/proc/sched_debug",
      "/sys/firmware",
      "/proc/scsi"
    ],
    "readonlyPaths": [
      "/proc/asound",
      "/proc/bus",
      "/proc/fs",
      "/proc/irq",
      "/proc/sys",
      "/proc/sysrq-trigger"
    ]
  }
}
