"failed to call BPF_PROG_ATTACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI): can't attach program: invalid argument: unknown" (master, kernel 5.4, cgroup2) #3008

Closed
AkihiroSuda opened this issue Jun 8, 2021 · 8 comments · Fixed by #3009 or #4548

@AkihiroSuda
Member

docker run does not work at all with runc 4d6b929, cgroup v2, and kernel 5.4. It fails with the error "failed to call BPF_PROG_ATTACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI): can't attach program: invalid argument: unknown".

$ docker run hello-world
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to call BPF_PROG_ATTACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI): can't attach program: invalid argument: unknown.
ERRO[0000] error waiting for container: context canceled 
$ docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:38 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:50 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95+dev
  GitCommit:        v1.0.0-rc95-89-g4d6b9297
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 8
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: v1.0.0-rc95-89-g4d6b9297
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.4.0-74-generic
 Operating System: Ubuntu 20.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.941GiB
 Name: ubuntu-focal
 ID: IXOP:CWWS:M7ET:LOFO:QI2Q:7KE6:BEMB:3SYU:WFTP:GHJF:RUID:YA53
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Found in moby/moby#42450, discussed in #2986.

@AkihiroSuda
Member Author

Probably haveBpfProgReplace() is not robust enough.
It should consistently return false for kernel < 5.6.
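
To illustrate the 5.6 cutoff mentioned above, here is a minimal sketch of a version gate. It is not runc's code: kernelAtLeast and leadingInt are hypothetical helpers, and the release strings are taken from this thread plus one made-up 5.15 value. A real check would more likely probe the syscall than parse version strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// leadingInt parses the leading decimal digits of s (hypothetical helper).
// It returns an error when s does not start with a digit.
func leadingInt(s string) (int, error) {
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	return strconv.Atoi(s[:end])
}

// kernelAtLeast reports whether a kernel release string such as
// "5.4.0-74-generic" is at least major.minor (hypothetical helper).
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := leadingInt(parts[0])
	if err != nil {
		return false, fmt.Errorf("parsing major version of %q: %w", release, err)
	}
	mnr, err := leadingInt(parts[1])
	if err != nil {
		return false, fmt.Errorf("parsing minor version of %q: %w", release, err)
	}
	if maj != major {
		return maj > major, nil
	}
	return mnr >= minor, nil
}

func main() {
	// The first two release strings appear in this issue; the third is illustrative.
	for _, rel := range []string{"5.4.0-74-generic", "5.4.275-434", "5.15.0"} {
		ok, err := kernelAtLeast(rel, 5, 6)
		fmt.Printf("%s: at least 5.6? %v (err: %v)\n", rel, ok, err)
	}
}
```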

@AkihiroSuda
Member Author

cilium/ebpf doesn't wrap the error: https://github.com/cilium/ebpf/blob/a4ee356536f31c2c050d2348e256154295d54862/link/program.go#L72

So errors.Is(err, EINVAL) does not work as expected. I'll prepare a PR.

@cyphar
Member

cyphar commented Jun 8, 2021

Yeah that explains it. I went through the source to figure out how they handle BPF_F_REPLACE not being supported, but somehow I missed that. We might need to use the syscall directly.

@cyphar
Member

cyphar commented Jun 8, 2021

Turns out they used %w for most errors; they just missed a handful. I've sent a PR to fix that on their side.

cyphar added a commit to cyphar/ebpf that referenced this issue Jun 9, 2021
This allows callers to detect the underlying syscall error, which is
necessary for being able to implement safe fallbacks based on the error
during program loading and unloading. runc in particular needs this to
be able to implement BPF_F_REPLACE fallbacks correctly.

Ref: opencontainers/runc#3008
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
lmb pushed a commit to cilium/ebpf that referenced this issue Jun 9, 2021
@jmason

jmason commented Dec 5, 2024

This issue has just bitten me on a system with a 5.4 kernel, using the released version of runc 1.2.2:

docker run hello-world
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: failed to call BPF_PROG_ATTACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI): attach program: invalid argument: unknown.

sudo dpkg-query --listfiles containerd.io | grep runc
/usr/bin/containerd-shim-runc-v1
/usr/bin/containerd-shim-runc-v2
/usr/bin/runc

dpkg-query --show containerd.io
containerd.io   1.7.24-1

zless /usr/share/doc/containerd.io/changelog.Debian.gz

containerd.io (1.7.24-1) release; urgency=medium

  * Update containerd binary to v1.7.24
  * Update systemd unit to start containerd service after dbus.service
  * Update runc binary to v1.2.2

 -- Sebastiaan van Stijn <thajeztah@docker.com>  Thu, 21 Nov 2024 16:37:21 +0000

Downgrading to the prior version of containerd.io (therefore runc) fixes it again:

sudo apt-get reinstall containerd.io=1.7.23-1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be DOWNGRADED:
  containerd.io
0 upgraded, 0 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
Need to get 0 B/21.8 MB of archives.
After this operation, 126 kB disk space will be freed.
Do you want to continue? [Y/n] y
dpkg: warning: downgrading containerd.io from 1.7.24-1 to 1.7.23-1
(Reading database ... 50604 files and directories currently installed.)
Preparing to unpack .../containerd.io_1.7.23-1_armhf.deb ...
Unpacking containerd.io (1.7.23-1) over (1.7.24-1) ...
Setting up containerd.io (1.7.23-1) ...
Processing triggers for man-db (2.10.2-1) ...

runc --version
runc version 1.1.14
commit: v1.1.14-0-g2c9f560
spec: 1.0.2-dev
go: go1.22.9
libseccomp: 2.5.3

docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm32v7)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

@cyphar
Member

cyphar commented Dec 6, 2024

Okay, it seems that when #4397 updated the code to use Anchors (cilium/ebpf@417f8a2), it didn't update the usage in haveBpfProgReplace. As a result we have been passing an invalid flag and getting a different error, but that was masked by how we checked the error value. So we've ended up with a regression.

If you ran runc in debug mode, you'd probably see "checking for BPF_F_REPLACE: got unexpected (not EBADF or EINVAL) error".

@kolyshkin
Contributor

@jmason what distro are you using? Asking because we don't hit this issue in CI.

It seems the only way to hit this is on Ubuntu 20.04 LTS with:

  • cgroup v2 enabled explicitly;
  • the kernel not upgraded to the HWE one (5.15).

@jmason

jmason commented Dec 6, 2024

@kolyshkin it's an Ubuntu 22.04 LTS running on an Odroid SBC, using their vendor kernel (which is still 5.4.275-434), unfortunately.
