
src/main/tools/linux-sandbox-pid1.cc:393: "mount": Operation not permitted #1972

Closed
brian-peloton opened this issue Oct 20, 2016 · 20 comments
Labels: category: sandboxing · P2 (We'll consider working on this in future. Assignee optional) · type: bug

@brian-peloton
Contributor

When trying to build anything with the new sandbox and Debian Jessie's amd64 default 3.16.0-4 kernel, it fails with src/main/tools/linux-sandbox-pid1.cc:393: "mount": Operation not permitted. @philsc and I have previously looked for ways to make /proc show the right PIDs in a PID namespace on that kernel without root permission and not come up with anything.

I don't have any good answers in the way of solutions. asan definitely does not do well with a broken /proc (that's what @philsc and I were working on previously, although we ran into other, more fundamental issues and gave up), and from what I've seen of Java, it won't either. However, having a PID namespace is really nice for preventing runaway processes (with the old sandbox, I periodically have to use pgrep and manually kill runaway test processes).

These commands show the same issue with that kernel:

brian[907] dev-builder ~:
$ unshare --mount --map-root-user --pid --fork
root[857] dev-builder ~:
# mount -t proc proc /proc
mount: permission denied
root[857] dev-builder ~:

Those same commands succeed with 4.3.0-0 kernel from jessie-backports, so I'm pretty sure Bazel's sandbox will too (haven't checked though):

brian[17107] brian-debian ~:
$ unshare --mount --map-root-user --pid --fork
root[501] brian-debian ~:
# mount -t proc proc /proc
root[501] brian-debian ~:
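
The two transcripts above can be folded into a small probe. This is only a sketch: `probe_proc_mount` and the `UNSHARE_BIN` override are names made up for this example (the override exists purely so the function can be exercised without touching real namespaces), and it assumes the util-linux `unshare` and `mount` binaries are on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: try to mount a fresh proc inside new user, PID and mount
# namespaces, mirroring the manual commands in the transcripts above.
# UNSHARE_BIN is a hypothetical override hook, not a util-linux feature.
UNSHARE_BIN="${UNSHARE_BIN:-unshare}"

probe_proc_mount() {
  # The mount happens inside a private mount namespace, so the host's
  # /proc is left untouched whether or not the call succeeds.
  if "$UNSHARE_BIN" --mount --map-root-user --pid --fork \
       mount -t proc proc /proc 2>/dev/null; then
    echo "proc mount OK"
  else
    echo "proc mount FAILED"
    return 1
  fi
}

# Example: probe_proc_mount || echo "sandbox will likely fail here"
```

A failure here does not say why the mount was refused; it only reproduces the symptom without involving Bazel.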

/cc @philwo

@philwo
Member

philwo commented Oct 26, 2016

Interesting! I'll try to reproduce this and see if I can come up with a solution somehow, but I probably won't have time for it this week (and I'm on vacation next week). :(

We have also noticed reliability issues with the default kernel of Ubuntu 14.04 LTS, which I think is 3.13. It is probably not the same issue, as we could never reproduce it on demand (it seemed like the system somehow got stuck in a state where sandboxing would fail from then on, and only a reboot would make it work again). But with the newer 4.x kernel available from the official Ubuntu repo, I never saw these or other issues with the sandbox.

@philwo philwo self-assigned this Oct 26, 2016
@philwo philwo added type: bug P2 We'll consider working on this in future. (Assignee optional) category: sandboxing labels Oct 26, 2016
@davido
Contributor

davido commented Nov 5, 2016

I'm seeing the same issue on this Docker image: https://hub.docker.com/r/gerritforge/gerrit-ci-slave-bazel. I'm using openSUSE 42.

To reproduce:

$ docker run -ti --entrypoint=/bin/bash gerritforge/gerrit-ci-slave-bazel
$ su - jenkins
$ git clone --recursive https://gerrit.googlesource.com/gerrit
$ bazel build gerrit
INFO: Found 1 target...
ERROR: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/external/jsonevent_layout/jar/BUILD:2:1: Extracting interface @jsonevent_layout//jar:jar failed: linux-sandbox failed: error executing command /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/execroot/gerrit/_bin/linux-sandbox ... (remaining 5 argument(s) skipped).
src/main/tools/linux-sandbox-pid1.cc:393: "mount": Operation not permitted
Target //:gerrit failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 23.554s, Critical Path: 0.88s

@davido
Contributor

davido commented Nov 5, 2016

Upgrading to Bazel 0.4.0 didn't help either. Here is the log with the sandbox debug option enabled: [1].

Environment:

$ bazel info       
bazel-bin: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/execroot/gerrit/bazel-out/local-fastbuild/bin
bazel-genfiles: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/execroot/gerrit/bazel-out/local-fastbuild/genfiles
bazel-testlogs: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/execroot/gerrit/bazel-out/local-fastbuild/testlogs
command_log: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/command.log
committed-heap-size: 990MB
execution_root: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/execroot/gerrit
gc-count: 9
gc-time: 259ms
install_base: /home/jenkins/.cache/bazel/_bazel_jenkins/install/0cc4b236e213b245b1e75e931bb2c011
max-heap-size: 7398MB
message_log: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/message.log
output_base: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982
output_path: /home/jenkins/.cache/bazel/_bazel_jenkins/4e8644684552b40c50dc624b79e09982/execroot/gerrit/bazel-out
package_path: %workspace%
release: release 0.4.0
server_pid: 1270
used-heap-size: 606MB
workspace: /home/jenkins/projects/gerrit

jenkins@68fab8fdcf00:~/projects/gerrit$ uname -a
Linux 68fab8fdcf00 4.1.34-33-default #1 SMP PREEMPT Thu Oct 20 08:03:29 UTC 2016 (fe18aba) x86_64 x86_64 x86_64 GNU/Linux

@philwo
Member

philwo commented Dec 9, 2016

I'll try to reproduce & fix this, but currently I have no idea what could cause the mounting of /proc to fail. :(

@davido
Contributor

davido commented Dec 10, 2016

We were able to fix the problem by starting Docker vm with some options.

@faithseed

faithseed commented Dec 13, 2016

I ran into the same problem, and it looks like a kernel compatibility issue. Running apt-get dist-upgrade (on Ubuntu 14.04) fixed the problem.

3.16.0-77-generic #99~14.04.1-Ubuntu failed
4.4.0-53-generic #74~14.04.1-Ubuntu works

@brian-peloton
Contributor Author

I'm pretty sure it's a kernel version-related issue too.

@davido: What options made it work? Also, what kernel are you using?

@davido
Contributor

davido commented Dec 14, 2016

It was --privileged: [1].

Kernel here is:

$ uname -a
Linux linux-ucwl.site 4.1.34-33-default #1 SMP PREEMPT Thu Oct 20 08:03:29 UTC 2016 (fe18aba) x86_64 x86_64 x86_64 GNU/Linux

@mratsim

mratsim commented Jan 24, 2017

Seeing the same error in an Arch Linux LXC container running on Proxmox (Debian Jessie kernel):

$ uname -a
Linux machinelearning 4.4.35-2-pve #1 SMP Mon Jan 9 10:21:44 CET 2017 x86_64 GNU/Linux

--- Build logs
Build successful! Binary is here: /pkg/makepkg/bazel/src/output/bazel
Extracting Bazel installation...
......
INFO: Found 1 target...
ERROR: /pkg/makepkg/bazel/src/src/main/native/BUILD:1:1: Executing genrule //src/main/native:copy_link_jni_md_header failed: linux-sandbox failed: error executing command /home/ml/.cache/bazel/_bazel_ml/6ae2aecfa6ff1003adffee270b604ad9/execroot/src/_bin/linux-sandbox ... (remaining 5 argument(s) skipped).
src/main/tools/linux-sandbox-pid1.cc:88: "mount": Permission denied
Target //scripts:bazel-complete.bash failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2.972s, Critical Path: 0.14s

Edit: Proxmox ISOs are here: https://www.proxmox.com/en/downloads

@nornagon

Also seeing this error under CircleCI's docker containers:

Within the CircleCI container:

(venv-3.4.3) ubuntu@box260:~/code$ uname -a
Linux box260.localdomain 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Console output:

(venv-3.4.3) ubuntu@box260:~/code$ bazel test test/... --verbose_failures --sandbox_debug
INFO: Found 3 test targets...
ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/185255daeeca84642f8709521495e24f/external/org_jooq_jool/jar/BUILD:2:1: Extracting interface @org_jooq_jool//jar:jar failed: linux-sandbox failed: error executing command
  (cd /home/ubuntu/.cache/bazel/_bazel_ubuntu/185255daeeca84642f8709521495e24f/bazel-sandbox/60d55d3c-50a2-4bb9-a03e-8fb9ffa83e6b-1/execroot/code && \
  exec env - \
    PATH=/home/ubuntu/.yarn/bin:/opt/circleci/nodejs/v6.5.0/bin:/opt/google-cloud-sdk/bin:/home/ubuntu/virtualenvs/venv-3.4.3/bin:/opt/ghc/8.0.1/bin:/opt/cabal/1.24/bin:/opt/alex/3.1.7/bin:/opt/happy/1.19.5/bin:/home/ubuntu/.composer/vendor/bin:/opt/circleci/.phpenv/shims:/opt/circleci/.phpenv/bin:/opt/circleci/.rvm/gems/ruby-2.2.6/bin:/opt/circleci/.rvm/gems/ruby-2.2.6@global/bin:/opt/circleci/.rvm/rubies/ruby-2.2.6/bin:/home/ubuntu/.go_workspace/bin:/usr/local/go/bin:/opt/circleci/nodejs/v6.5.0/bin:/opt/circleci/.pyenv/shims:/opt/circleci/.pyenv/bin:/usr/local/android-sdk-linux/platform-tools:/usr/local/android-sdk-linux/tools:/usr/local/apache-maven/bin:/home/ubuntu/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/gradle-1.10/bin:/opt/circleci/.rvm/bin:/opt/circleci/.rvm/bin \
  /home/ubuntu/.cache/bazel/_bazel_ubuntu/185255daeeca84642f8709521495e24f/execroot/code/_bin/linux-sandbox @/home/ubuntu/.cache/bazel/_bazel_ubuntu/185255daeeca84642f8709521495e24f/bazel-sandbox/60d55d3c-50a2-4bb9-a03e-8fb9ffa83e6b-1/linux-sandbox.params -- external/bazel_tools/tools/jdk/ijar/ijar external/org_jooq_jool/jar/jool-0.9.12.jar bazel-out/local-fastbuild/genfiles/external/org_jooq_jool/jar/_ijar/jar/external/org_jooq_jool/jar/jool-0.9.12-ijar.jar).
src/main/tools/linux-sandbox.cc:183: linux-sandbox-pid1 has PID 45135
src/main/tools/linux-sandbox-pid1.cc:88: "mount": Permission denied
src/main/tools/linux-sandbox.cc:223: child exited normally with exitcode 1
ERROR: /home/ubuntu/code/BUILD:1:1 Extracting interface @org_jooq_jool//jar:jar failed: linux-sandbox failed: error executing command
  (cd /home/ubuntu/.cache/bazel/_bazel_ubuntu/185255daeeca84642f8709521495e24f/bazel-sandbox/60d55d3c-50a2-4bb9-a03e-8fb9ffa83e6b-1/execroot/code && \
  exec env - \
    PATH=/home/ubuntu/.yarn/bin:/opt/circleci/nodejs/v6.5.0/bin:/opt/google-cloud-sdk/bin:/home/ubuntu/virtualenvs/venv-3.4.3/bin:/opt/ghc/8.0.1/bin:/opt/cabal/1.24/bin:/opt/alex/3.1.7/bin:/opt/happy/1.19.5/bin:/home/ubuntu/.composer/vendor/bin:/opt/circleci/.phpenv/shims:/opt/circleci/.phpenv/bin:/opt/circleci/.rvm/gems/ruby-2.2.6/bin:/opt/circleci/.rvm/gems/ruby-2.2.6@global/bin:/opt/circleci/.rvm/rubies/ruby-2.2.6/bin:/home/ubuntu/.go_workspace/bin:/usr/local/go/bin:/opt/circleci/nodejs/v6.5.0/bin:/opt/circleci/.pyenv/shims:/opt/circleci/.pyenv/bin:/usr/local/android-sdk-linux/platform-tools:/usr/local/android-sdk-linux/tools:/usr/local/apache-maven/bin:/home/ubuntu/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/gradle-1.10/bin:/opt/circleci/.rvm/bin:/opt/circleci/.rvm/bin \
  /home/ubuntu/.cache/bazel/_bazel_ubuntu/185255daeeca84642f8709521495e24f/execroot/code/_bin/linux-sandbox @/home/ubuntu/.cache/bazel/_bazel_ubuntu/185255daeeca84642f8709521495e24f/bazel-sandbox/60d55d3c-50a2-4bb9-a03e-8fb9ffa83e6b-1/linux-sandbox.params -- external/bazel_tools/tools/jdk/ijar/ijar external/org_jooq_jool/jar/jool-0.9.12.jar bazel-out/local-fastbuild/genfiles/external/org_jooq_jool/jar/_ijar/jar/external/org_jooq_jool/jar/jool-0.9.12-ijar.jar).
INFO: Elapsed time: 7.713s, Critical Path: 2.15s

Executed 0 out of 3 tests: 1 fails to build and 2 were skipped.

@mratsim

mratsim commented Feb 4, 2017

Small update. I tried deactivating AppArmor for my LXC container with
lxc.aa_profile = unconfined

I still get the Operation not permitted issue while building Bazel

@mratsim

mratsim commented Feb 15, 2017

I managed to build Bazel itself in an LXC container by deactivating the sandboxing altogether with:
--strategy=Genrule=standalone --spawn_strategy=standalone added to the bazel build line
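
For scripted builds, the two flags quoted above can be kept in one place. A minimal sketch, assuming a Bazel version that still accepts these flag values; `standalone_flags` and the target label in the usage comment are placeholders invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: emit the two flags from the comment above, one per line.
# They switch off sandboxing entirely, so actions run without isolation.
standalone_flags() {
  printf '%s\n' "--strategy=Genrule=standalone" "--spawn_strategy=standalone"
}

# Example (//some:target is a placeholder label):
#   bazel build $(standalone_flags) //some:target
```

This trades away the isolation the sandbox provides, so it is better suited to getting a build unblocked than to a permanent setup.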

@tsuri

tsuri commented Feb 17, 2017

On my Debian system I had to modify Bazel as follows:
@@ -402,8 +404,9 @@ static void MakeFilesystemMostlyReadOnly() {
 static void MountProc() {
   // Mount a new proc on top of the old one, because the old one still refers to
   // our parent PID namespace.
-  if (mount("proc", "proc", "proc", MS_NODEV | MS_NOEXEC | MS_NOSUID, NULL) <
+  if (mount("proc", "proc", "proc", MS_REC | MS_BIND | MS_NODEV | MS_NOEXEC | MS_NOSUID, NULL) <

but I don't know what the general implications of this are, nor how to check that it is not breaking anything.
I'd appreciate if somebody familiar with sandboxing would take this and check, otherwise I'll try a PR over the weekend.

bazel-io pushed a commit that referenced this issue Mar 27, 2017
Try to run /bin/true as a test of whether the Linux sandbox works,
instead of just trying to create a bunch of namespaces as a proxy.

This helps resolve issues on Linux distros where the earlier check
worked, but then the sandbox ultimately failed due to other operations
being unsupported.

As an example, Debian Jessie and certain Docker versions seem to allow
the creation of PID namespaces, but forbid mounting a new proc on top of
/proc (see #1972). This resulted in Bazel thinking that sandboxing works
fine, when it actually didn't. The improved check correctly catches this
situation and disables sandboxing.

--
PiperOrigin-RevId: 151116894
MOS_MIGRATED_REVID=151116894
@brian-peloton
Contributor Author

I went to write up a patch doing that, and it turns out it doesn't actually work... You still end up with the wrong PIDs on /proc.

Turns out the root cause isn't the kernel version; it's actually what you have mounted in /proc. In my case, it's /proc/xen. containers/bubblewrap#134 and opencontainers/runc#252 both reference the same issue.

However, you can work around it by unmounting /proc/xen in a privileged mount namespace first:

brian[16259] dev-builder ~
$ sudo unshare --mount --propagation private
root[875] dev-builder /home2/brian
# umount /proc/xen
root[876] dev-builder /home2/brian
# su brian
brian[16264] dev-builder ~
$ unshare --fork --pid --mount --map-root-user
root[16264] dev-builder ~
# mount -t proc proc /proc

That workaround does require privileges, but you could in theory do it before spawning the login shell or something. I think I'm going to just unmount /proc/xen system-wide because it's for compatibility and it looks like my systems don't have anything, but there are options.
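
To see whether a machine has anything mounted below /proc in the first place, the mount table can be filtered. This is a rough sketch: `list_proc_submounts` is a name made up here, and the optional file argument exists only so the function can be tried against a saved copy of the table. An entry in the output is a candidate for the problem described above, not proof that it blocks the proc mount.

```shell
#!/usr/bin/env bash
# Sketch: print mount points nested below /proc.
# Input uses the /proc/self/mounts layout:
#   device  mount-point  fstype  options  dump  pass
list_proc_submounts() {
  local mounts_file="${1:-/proc/self/mounts}"
  # Field 2 is the mount point; keep only paths strictly below /proc.
  awk '$2 ~ /^\/proc\// { print $2 }' "$mounts_file"
}

# Example: list_proc_submounts
```

On the system described in this comment, /proc/xen would be expected to show up in the output.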

Given that it looks like this is a kernel/system issue and not really a Bazel issue, and c2d773e made it fail gracefully, I'm going to close this now. I'll send out the test case I wrote to catch /proc being wrong with @tsuri's idea to make it more obvious that it doesn't work if anybody else tries it in the future.

bazel-io pushed a commit that referenced this issue May 9, 2017
While investigating #1972, I wrote this test to evaluate a potential
solution. This test caught the fact that the solution didn't work, which
makes it valuable for future changes to the sandbox.

Change-Id: I435e9b9543374554c09d8d7c0918c24d9dc8f19d
PiperOrigin-RevId: 155500491
@alexeagle
Contributor

Has anyone applied the workaround successfully? Say I start with
https://hub.docker.com/r/insready/bazel/
docker run -it --rm insready/bazel
I haven't been able to fix the /proc mountpoint so that bazel sandboxing works.

(It would be extra cool if the Bazel team maintained a docker image so it would be easy to run bazel builds on CI like Circle)

@davido
Contributor

davido commented May 25, 2017

Yes, see my comment from Dec 14, 2016: the workaround is to pass the --privileged option to the docker command.
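
As a concrete illustration, the reproduction command from earlier in the thread with the flag added might look like the sketch below. `privileged_run_cmd` is a helper name invented for this example; it only prints the command so it can be inspected before being run, and the default image is the one from that reproduction.

```shell
#!/usr/bin/env bash
# Sketch: compose the docker invocation with --privileged added.
# The image name and --entrypoint come from the earlier reproduction steps.
privileged_run_cmd() {
  local image="${1:-gerritforge/gerrit-ci-slave-bazel}"
  echo docker run -ti --privileged --entrypoint=/bin/bash "$image"
}

# Example: eval "$(privileged_run_cmd)"
```

Note that --privileged lifts most of the container's isolation from the host, so it is only an option where you control how the container is started.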

@alexeagle
Contributor

I don't think that works in CI environments where you don't run the container yourself. See https://discuss.circleci.com/t/option-to-run-docker-with-privileged-on-circle-2-0/12377

@mattmoor

@alexeagle The container builder team has gcr.io/cloud-builders/bazel.


@Ryang20718

Ryang20718 commented Apr 13, 2023

Sorry to comment on this stale thread, but we hit the same issue of linux-sandbox being unavailable when running Bazel inside a Docker container. In our case, though, the root of the problem is the Nvidia runtime.

Problem: Because the Nvidia runtime mounts proc, running Bazel within a Docker container fails with:

src/main/tools/linux-sandbox-pid1.cc:441: "mount": Operation not permitted

We see that there's a nested proc mount

unshare --mount --map-root-user --pid --fork
# mount | grep proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /proc/driver/nvidia type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=555,inode64)
proc on /proc/driver/nvidia/gpus/0000:b3:00.0 type proc (ro,nosuid,nodev,noexec,relatime)

Whilst I know this is an Nvidia problem and limited to local execution, it would be nice to be able to use linux-sandbox within a Docker container with access to the Nvidia runtime.

Proposal:
Applying the recursive bind option from @tsuri fixes this issue for us: #18069. I'm wondering if we can get someone to review this small patch 😅. It would save us the complexity of maintaining our own patch.
